Ever wondered how ChatGPT works? The short answer to this complex question lies in Large Language Models, or LLMs: foundational models trained on vast amounts of text. These models do not process words the way humans do. Instead, they represent each word as a long series of numbers, and it is in this numerical form that text is fed to the computer.
These sequences of numbers are known as word vectors, and each can be imagined as a single point in an abstract space, with words of similar meaning placed closer to each other. The scale of these models is almost impossible to envision, but for reference, GPT-4 has a staggering 1.76 trillion parameters and millions of unique word vectors, according to a June 28, 2023 report by SemiAnalysis, a US-based independent AI research and analysis company. Processing such a huge number of vectors across trillions of parameters has become possible thanks to the dramatic advances in computing power over the last few years. Most recently, on June 19, Nvidia became the world's largest public company by market capitalisation, surpassing Microsoft and Apple, on the back of surging demand for its AI-capable chips.
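The idea that similar words sit closer together can be made concrete with a toy sketch. The three-dimensional vectors and values below are entirely illustrative (real models use hundreds or thousands of dimensions); the point is only how "closeness" between word vectors is measured.

```python
import math

# Hypothetical 3-dimensional word vectors; the words and numbers
# are made up purely for illustration.
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.88, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Words with related meanings end up with a higher similarity score.
print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1.0
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much lower
```

In a trained model, these coordinates are not hand-picked as above; they are learned from data so that words used in similar contexts drift toward each other in the space.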
ChatGPT, Google Gemini and Meta AI are all built on LLMs that work by predicting the next word, using word vectors. The word vectors of the text a user feeds in as a "prompt" are transformed into these predictions using Transformers.
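The final step of "predicting the next word" can be sketched as follows. The candidate words and their raw scores (logits) are invented for illustration; in a real LLM those scores come out of the transformer layers, and there is one score for every word in a vocabulary of tens of thousands.

```python
import math

# Hypothetical raw scores the model might assign to candidate next words
# after the prompt "The cat sat on the" -- the numbers are made up.
logits = {"mat": 4.0, "roof": 2.5, "moon": 0.5}

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exp = {word: math.exp(s) for word, s in scores.items()}
    total = sum(exp.values())
    return {word: v / total for word, v in exp.items()}

probs = softmax(logits)
# The model predicts by picking (or sampling from) the probability distribution.
next_word = max(probs, key=probs.get)
print(next_word)  # "mat"
```

Chatbots generate whole paragraphs by repeating this single step: predict a word, append it to the text, and predict again.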
How Is Text Prediction Done In LLMs?
LLMs are multi-layered. Each layer consists of a neural network architecture (imagine artificial neurons) known as a transformer. These transformers process the input text, with each word vector "looking around" at the other words in the sequence and gathering relevant information from them. This process is repeated layer after layer, and again each time a user feeds a new prompt into the LLM, progressively refining the model's prediction of "the next word" in the sequence.
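The "looking around and gathering relevant information" described above is the attention mechanism at the heart of the transformer. A minimal sketch for a single word, using tiny two-dimensional vectors with made-up values (real models use large learned matrices and many attention heads in parallel):

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one word's query vector.

    The query is compared against every word's key; the resulting
    weights decide how much of each word's value flows into the output.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is a weighted blend of all the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# Toy keys and values for a three-word sequence (illustrative numbers only).
keys   = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

out = attention([1.0, 0.0], keys, values)
print(out)  # a blend of the value vectors, weighted toward the most similar keys
```

Stacking dozens of such layers, each updating every word's vector in parallel, is what lets the model relate words across a whole prompt.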
How Are LLMs Trained?
LLMs are trained using unsupervised learning, which eliminates the need for humans to label the data. Before going public, LLMs are fed data from web pages, books and other textual sources. This data has also courted controversy, as in some cases it reflected human biases. Most notably, Microsoft's Twitter chatbot Tay, Google's Gemini and OpenAI's Sora (a text-to-video generator) have drawn criticism over the years for producing bigoted, racist and gender-discriminatory responses. To its credit, the industry has responded to the challenge and is constantly working to remove such biases from LLMs.