OpenAI introduced a long-form question-answering AI called ChatGPT that answers complex questions conversationally.
It’s a revolutionary technology because it’s trained to learn what humans mean when they ask a question.
Many users are awed by its ability to provide human-quality responses, inspiring speculation that it may eventually disrupt how humans interact with computers and change how information is retrieved.
What Is ChatGPT?
ChatGPT is a large language model chatbot developed by OpenAI based on GPT-3.5. It has a remarkable ability to interact in conversational dialogue form and provide responses that can appear surprisingly human.
Large language models perform the task of predicting the next word in a series of words.
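To make that concrete, here is a minimal, purely illustrative sketch of next-word prediction using simple word-pair counts (ChatGPT itself learns these probabilities with a neural network trained on enormous amounts of text, not raw counts):

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: count which word follows each
# word in a tiny corpus, then predict the most frequent successor. Real
# large language models learn these probabilities with neural networks,
# but the objective (predict the next token) is the same idea.
corpus = "the cat sat on the mat and the cat ate the fish".split()

successors = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    successors[word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    counts = successors[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> "cat" (follows "the" twice; "mat" and "fish" once each)
```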
Reinforcement Learning from Human Feedback (RLHF) is an additional layer of training that uses human feedback to help ChatGPT learn to follow directions and generate responses that humans find satisfactory.
Who Built ChatGPT?
ChatGPT was created by San Francisco-based artificial intelligence company OpenAI. OpenAI Inc. is the non-profit parent company of the for-profit OpenAI LP.
OpenAI is also known for DALL·E, a deep-learning model that generates images from text instructions called prompts.
The CEO is Sam Altman, who previously was president of Y Combinator.
Microsoft is a partner and has invested $1 billion in the company. The two companies jointly developed the Azure AI Platform.
Large Language Models
ChatGPT is a large language model (LLM). LLMs are trained with massive amounts of data to accurately predict what word comes next in a sentence.
Researchers discovered that increasing the amount of training data expanded what the language models were able to do.
How Was ChatGPT Trained?
GPT-3.5 was trained on massive amounts of code and information from the internet, including sources like Reddit discussions, to help ChatGPT learn dialogue and attain a human style of responding.
ChatGPT was also trained using human feedback (a technique called Reinforcement Learning from Human Feedback) so that the AI learned what humans expected when they asked a question. Training the LLM this way is revolutionary because it goes beyond simply teaching the model to predict the next word.
A March 2022 research paper titled Training Language Models to Follow Instructions with Human Feedback explains why this is a breakthrough approach:
“This work is motivated by our aim to increase the positive impact of large language models by training them to do what a given set of humans want them to do. By default, language models optimize the next word prediction objective, which is only a proxy for what we want these models to do. Our results indicate that our techniques hold promise for making language models more helpful, truthful, and harmless. Making language models bigger does not inherently make them better at following a user’s intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users.”
What sets ChatGPT apart from a simple chatbot is that it was specifically trained to understand the human intent in a question and provide helpful, truthful, and harmless answers.
Because of that training, ChatGPT may challenge certain questions and discard parts of the question that don’t make sense.
Another research paper related to ChatGPT shows how the researchers trained the AI to predict which responses humans preferred.
The researchers noticed that the metrics used to rate the outputs of natural language processing AI produced machines that scored well on those metrics, but whose output didn't align with what humans expected.
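A minimal sketch of that preference-prediction idea, in PyTorch, is a pairwise ranking loss: a reward model is trained to score a human-preferred response above a rejected one. The toy RewardModel class, the embedding dimension, and the random tensors below are illustrative assumptions, not OpenAI's actual implementation:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy stand-in for a reward model: maps a response embedding to a score."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.scorer = nn.Linear(embed_dim, 1)

    def forward(self, response_embedding):
        return self.scorer(response_embedding).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-ins for embeddings of two responses a human labeler compared:
preferred = torch.randn(8, 128)  # batch of 8 preferred responses
rejected = torch.randn(8, 128)   # batch of 8 rejected responses

# Pairwise ranking loss: push the preferred score above the rejected score.
# Equivalent to -log(sigmoid(score_preferred - score_rejected)).
loss = -nn.functional.logsigmoid(model(preferred) - model(rejected)).mean()
loss.backward()
optimizer.step()
print(f"reward-model loss: {loss.item():.4f}")
```

In the research OpenAI published, a reward model trained this way supplies the signal for a reinforcement learning step (PPO) that fine-tunes the language model itself.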
What Are the Limitations of ChatGPT?
Limitations on Toxic Response
ChatGPT is specifically programmed not to provide toxic or harmful responses, so it will avoid answering those kinds of questions.
Quality of Answers Depends on Quality of Directions
An important limitation of ChatGPT is that the quality of the output depends on the quality of the input. In other words, expert directions (prompts) generate better answers.
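As a rough illustration, the difference is easy to see by sending a vague prompt and an expert prompt to the same model. The sketch below assumes access to OpenAI's API; the endpoint and payload shape follow OpenAI's published chat completions API, while the model name and prompts are placeholder assumptions:

```python
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's reply."""
    payload = {
        "model": "gpt-3.5-turbo",  # assumed model name; adjust as needed
        "messages": [{"role": "user", "content": prompt}],
    }
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    return response.json()["choices"][0]["message"]["content"]

# A vague prompt tends to produce a generic answer...
print(ask("Tell me about sorting."))

# ...while an expert-level prompt constrains the model toward a useful one.
print(ask(
    "Explain when quicksort degrades to O(n^2), and name one pivot-selection "
    "strategy that avoids that worst case. Answer in two sentences."
))
```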
The moderators at the coding Q&A website Stack Overflow may have discovered an unintended consequence of answers that feel right to humans.
Stack Overflow was flooded with user-submitted answers generated by ChatGPT that appeared to be correct, but many of them were wrong.
The thousands of answers overwhelmed the volunteer moderator team, prompting the administrators to enact a ban against any users who post answers generated from ChatGPT.
Answers Are Not Always Correct
Another limitation is that because ChatGPT is trained to provide answers that feel right to humans, those answers can trick people into believing the output is correct.
Many users discovered that ChatGPT can provide incorrect answers, including some that are wildly incorrect.