WHY IS IT BECOMING DIFFICULT FOR OTHER COMPANIES TO COMPETE WITH CHATGPT?

REASONS WHY IT IS BECOMING DIFFICULT FOR OTHER COMPANIES TO COMPETE WITH CHATGPT:

After reading this article, you will understand how much effort has gone into creating ChatGPT, why it is becoming difficult for other companies to compete with it, and how much effort they will have to put in if they want to compete in the future. Friends, this is the RLHF method: reinforcement learning from human feedback.

There are three main steps in this method.

1. Supervised fine-tuning model.

2. Reward model.

3. Reinforcement learning model.

Let's understand these 3 steps one by one.

In the last article [What is AI chatbots and How chatgpt was trained], I told you about GPT-3 and how 570 GB of data was fed to it for training, including thousands of books, Wikipedia pages, and web pages. OpenAI knew that this dataset was already very large. To improve the model, they did not need to feed it more data; they decided to use the existing data to improve its responses. In other words, they needed to fine-tune the model. 570 GB of text data is hard to replace: there is so much variety in it that hardly anything is left unmentioned.

Supervised fine-tuning model

So OpenAI set out to fine-tune the GPT-3 model, and GPT-3.5 was the result of that fine-tuning. The training data is the same in GPT-3 and GPT-3.5; only the tuning of GPT-3.5 is different. For this fine-tuning, they hired about 40 contractors. Their job was to create a supervised training dataset of higher-quality inputs and outputs. They looked at GPT-3's performance: when someone asked it a question, how did it answer? And they looked for ways to make those answers better.

Their job was to look at each prompt individually and then manually write out a sample of what the correct answer should look like. In this way, known outputs were created by hand for many inputs and paired with them. The entire process was time-consuming, tediously slow, and quite expensive. Approximately 13,000 input-output pairs were created and then fed back to the model, to show the computer how it should respond to those types of inputs.
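To make this step concrete, here is a minimal sketch of what supervised fine-tuning on such input-output pairs looks like. This is not OpenAI's actual code: it uses the publicly available GPT-2 model as a stand-in for GPT-3, a tiny invented dataset instead of the real 13,000 pairs, and it skips details such as masking the prompt tokens in the loss.

```python
# Minimal sketch of supervised fine-tuning (SFT) on prompt-response pairs.
# NOT OpenAI's code: GPT-2 stands in for GPT-3, and the dataset is invented.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in for GPT-3
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# A couple of hand-written (prompt, ideal answer) pairs (in reality, around 13,000).
sft_pairs = [
    ("Explain photosynthesis in one sentence.",
     "Photosynthesis is the process by which plants use sunlight to turn "
     "water and carbon dioxide into food and oxygen."),
    ("What is the capital of France?",
     "The capital of France is Paris."),
]

optimizer = AdamW(model.parameters(), lr=1e-5)
model.train()

for prompt, answer in sft_pairs:
    text = prompt + "\n" + answer + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt")
    # labels = input_ids: the model learns to reproduce the demonstrated answer
    # (a real pipeline would usually exclude the prompt tokens from the loss).
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```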


The AI had to recognize the pattern in this new data and maintain that style of language. Thanks to the hard work of so many people, a human-like chatting interface was created. But this was just step one. After this came the reward model, to improve the answers generated by ChatGPT.

Reward model

Next, a reward model was created by the programmers. When ChatGPT answered a question, that same question was asked again and again, and multiple answers were generated each time. So, the programmers collected multiple answers for the same question. Then the people they had hired were put to the tedious task of ranking each distinct answer. For every question, each person had to rank the answers generated by the model: the best answer, the second best, the third best, and so on down to the worst.

This ranking was done manually, and through it every answer was given a score. That score is the reward. This created a new ranked dataset in which there were multiple model answers for every question, and among those answers the best one was identified.
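To see what such a ranked dataset looks like in practice, here is a small illustration of how one human ranking can be expanded into pairwise preferences. The question, the answers, and the ranking are all invented for this example; expanding a ranking into pairs is a common way to use ranked data, not a claim about OpenAI's exact pipeline.

```python
# Sketch: turning one human ranking into pairwise preference data.
# The question, answers, and ranking below are invented for illustration.
from itertools import combinations

question = "What is the capital of France?"
ranked_answers = [                       # index 0 = best, last = worst
    "The capital of France is Paris.",
    "Paris.",
    "France is a country in Europe.",
    "I don't know.",
]

# Every pair (higher-ranked, lower-ranked) becomes one training comparison.
comparisons = [
    (question, ranked_answers[i], ranked_answers[j])
    for i, j in combinations(range(len(ranked_answers)), 2)
]

for q, better, worse in comparisons:
    print(f"Preferred: {better!r}  over  {worse!r}")
```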

A new model, the reward model, was trained on this data. Now the computer was taught to judge answers by their scores: the AI had to prefer the answer with the highest score, the one with the highest reward. By doing this, the quality of the answers provided by ChatGPT improved.
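Here is a minimal sketch of how a reward model can be trained on such preference pairs using a pairwise ranking loss. Again, this is not OpenAI's code: GPT-2 with a one-number scoring head stands in for the real model, and the two comparisons are invented.

```python
# Minimal sketch of reward-model training from human preference pairs.
# NOT OpenAI's code: GPT-2 stands in for the real model; the data is invented.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2", num_labels=1)                 # outputs one scalar score per text
reward_model.config.pad_token_id = tokenizer.pad_token_id

# Each item: (prompt, answer ranked higher by humans, answer ranked lower).
comparisons = [
    ("What is the capital of France?",
     "The capital of France is Paris.",
     "France is a country in Europe."),
    ("Explain gravity simply.",
     "Gravity is the force that pulls objects toward each other.",
     "Gravity. It exists."),
]

optimizer = AdamW(reward_model.parameters(), lr=1e-5)
reward_model.train()

for prompt, chosen, rejected in comparisons:
    better = tokenizer(prompt + "\n" + chosen, return_tensors="pt")
    worse = tokenizer(prompt + "\n" + rejected, return_tensors="pt")
    r_better = reward_model(**better).logits.squeeze()
    r_worse = reward_model(**worse).logits.squeeze()
    # Pairwise ranking loss: push the preferred answer's score above the other's.
    loss = -torch.nn.functional.logsigmoid(r_better - r_worse)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The pairwise loss only cares about which answer scores higher, which matches the fact that the human labelers provided rankings rather than absolute marks.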

Reinforcement learning model

Then came the third step, the reinforcement learning model. In this step, the computer was taught to reward itself: it had to score the answers it was creating based on the pattern provided to it, and based on this reward, it had to generate new answers that would deserve a higher reward. Here a specific algorithm was used, Proximal Policy Optimization, also known as PPO.
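Real PPO involves clipping, a value function, and a penalty that keeps the model close to the fine-tuned version from step one, so the sketch below is a heavily simplified stand-in. It only shows the core loop: generate an answer, score it with a reward function, and nudge the model toward answers that score higher. GPT-2 again plays the role of the real model, and the reward function here is a toy placeholder for the reward model trained in step two.

```python
# Heavily simplified sketch of the reinforcement-learning step.
# NOT real PPO and NOT OpenAI's code: no clipping, no value function, no KL
# penalty. It only illustrates generate -> score -> nudge the policy.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
policy = AutoModelForCausalLM.from_pretrained("gpt2")   # model being improved
optimizer = AdamW(policy.parameters(), lr=1e-6)

def reward_fn(prompt: str, answer: str) -> float:
    """Toy placeholder for the trained reward model from the previous step."""
    return float(len(answer.split()) > 3)   # reward non-trivial answers

prompt = "What is the capital of France?"
enc = tokenizer(prompt, return_tensors="pt")

for _ in range(3):                          # a few RL iterations
    # 1. Sample an answer from the current policy.
    generated = policy.generate(**enc, do_sample=True, max_new_tokens=20,
                                pad_token_id=tokenizer.eos_token_id)
    answer_ids = generated[0, enc["input_ids"].shape[1]:]
    answer = tokenizer.decode(answer_ids, skip_special_tokens=True)

    # 2. Score the answer.
    reward = reward_fn(prompt, answer)

    # 3. Policy-gradient style update: raise the log-probability of the
    #    sampled answer in proportion to the reward it earned.
    outputs = policy(generated, labels=generated)   # outputs.loss = mean NLL
    loss = reward * outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```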

The equations involved here are quite complicated, but I am simplifying and explaining the principle for you. Broadly speaking, the model that has been created works very much like the interaction between a real-life student and teacher.

In the first step, a teacher explains to the student how to study a lesson. In the second step, the teacher tests the student and grades the test, to teach the student which answers were right and which were wrong. And in the third step, the student becomes intelligent enough to grade his own tests and improve himself.

So, using human guidance, the computer was gradually taught what is right and what is wrong and how to reply like a human while chatting. That is where the name comes from, friends: reinforcement learning from human feedback. And this is why it is not easy for other companies to beat ChatGPT; they would need to do all this hard work too. People would have to sit down and analyze each output one by one, rank them, and grade them. And only slowly, because it is such a slow and time-consuming process, would their AI improve.

Then, in March 2023, OpenAI introduced its latest model, GPT-4. The number of parameters used in it has not been revealed publicly, but it is said to use trillions of parameters. We do not know how much training data was fed to it, but a drastic improvement can be seen between GPT-3.5 and version 4.0.

I have seen it personally, after testing both models many times. Even though this is a revolutionary technology, it is very important to keep its limitations in mind. I will talk about this in a chapter at the end. But for now, having seen the innovative way this technology was crafted, you should understand what it means: ChatGPT is not capable of providing a 100% perfect answer to everything, every time. There are some shortcomings, and these shortcomings were caused by this very process.

Because so many people were tasked with training it, and undoubtedly each person had their own biases. Ranking one output as better than another, or giving an output the best rank, depends on each person. In someone's opinion, one output might be the best; in someone else's opinion, a different output might be.


So, the method of creating the reward model cannot be inherently perfect, because it was created by humans, and the biases and opinions of those people are reflected in ChatGPT as well. The RLHF method makes a major assumption: that all the people in the world share the same opinion on certain things. Which doesn't actually happen.


In the end, this model was evaluated on three criteria: helpfulness, truthfulness, and harmlessness. In my opinion, 99% of the time the 4.0 version of ChatGPT gives objective answers, without any evident biases. But in some 1% of cases, these minor biases are reflected in its answers. So it is a difficult task to feed such a large amount of data, take feedback from so many different people, and hire the programmers who can implement all of it.
