What Do We Mean By “Large Language Models”?

For decades, artificial intelligence has been used to perform specific functions, like self-driving a Tesla or auto-processing an insurance claim. These single-purpose AI and machine learning technologies sit silently behind the scenes, just doing their thing, with little awareness from the likes of you and me that they even exist.

Then along came ChatGPT.

And with it came a new generation of artificial intelligence: a different and disruptive form of AI, set apart from everything that went before it for two fundamental reasons.

First, it’s a more general, multipurpose form of AI; and second, anyone can use it however they want, for whatever they want.

Known as “generative AI,” this new breed of artificial intelligence is all about the power of words. Even a picture gets turned into words. The superpower of generative AI is that its use of language is on a par with human levels of communication (which is not to be confused with human levels of comprehension and understanding).

Quite simply, generative AI is a statistical prediction machine known as an “LLM,” or Large Language Model, because that’s exactly what it is! Every LLM has read millions of pieces of human writing from across the internet and broken every sentence down into its smallest units, called tokens, which are often fractions of words. For example, “football” might become “foot” and “ball.”
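To make that concrete, here’s a minimal sketch of the idea in Python. The tiny vocabulary and the greedy longest-match rule are my own simplifications for illustration; real tokenizers (such as byte-pair encoding) learn their vocabularies of tens of thousands of subword pieces from the training data itself.

```python
# Toy subword tokenizer: split text into the longest pieces found in a
# (hypothetical) vocabulary. Real LLM tokenizers work on the same principle
# but learn their subword pieces from the training data.
VOCAB = {"foot", "ball", "er", "the", " "}

def tokenize(text: str) -> list[str]:
    """Greedily match the longest known subword at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest match first
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # unknown character: keep as-is
            i += 1
    return tokens

print(tokenize("footballer"))  # ['foot', 'ball', 'er']
```

Run on “footballer,” this splits the word into “foot,” “ball” and “er,” exactly the kind of fractions-of-words the model actually sees.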

The AI then creates a massive multi-dimensional map of the way that words are combined to form many, many different meanings. Think of how many ways you can use the words “foot” and “ball” in different contexts.
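That “map” is built from vectors: every word gets a list of numbers, and words that are used in similar ways end up pointing in similar directions. Here’s a rough sketch of the idea; the three dimensions and every coordinate below are invented purely for illustration, whereas real models learn embeddings with thousands of dimensions.

```python
# Toy word "map": each word is a vector, and the angle between two vectors
# measures how related their usage is. All coordinates here are invented;
# real LLMs learn embeddings with thousands of dimensions.
import math

embeddings = {              # hypothetical axes: [sport, body-part, finance]
    "foot":     [0.3, 0.9, 0.0],
    "ball":     [0.9, 0.1, 0.0],
    "football": [0.9, 0.3, 0.0],
    "tax":      [0.0, 0.0, 1.0],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(embeddings["football"], embeddings["ball"]))  # ~0.98, related
print(cosine(embeddings["football"], embeddings["tax"]))   # 0.0, unrelated
```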

But there’s more to generative AI than just building a mega spreadsheet of word combinations. What unlocked its ability to write as fluently as it does today is a revolutionary AI architecture called the Transformer.

The Transformer model was first published by a group of eight AI researchers at Google in June 2017, in a paper titled “Attention Is All You Need.” Before the Transformer, language AI relied on neural networks that processed text one word at a time, which made it hard for them to keep track of meaning across a whole sentence.

But the Transformer model changed everything.

➜ Here’s The Thing: Google’s 11-page research paper in 2017 marked the start of the generative AI era. Without it, there would be no ChatGPT!

Transformers Changed Everything

The Transformer gives an AI system the ability to process an entire sequence of words at once, analysing all its parts, breaking it down and “understanding” the meaning of the words. By that I mean that the Transformer is able to work out which parts of a sentence or article are the most important in defining its meaning.

A key concept of the Transformer architecture is called “self-attention.” This is what allows LLMs to understand the relationships between words. Self-attention looks at each word, or token, in a body of text and decides how each one relates to all the others and which matter most to the meaning of the whole.
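For the technically curious, here’s a bare-bones sketch of that scoring step in Python with NumPy. The tiny embedding size and the random projection matrices are placeholders of my own; in a real LLM they are learned during training, and many attention “heads” run in parallel.

```python
# Minimal single-head self-attention over a four-token sentence.
# Weights are random placeholders; a real model learns them from data.
import numpy as np

rng = np.random.default_rng(0)
tokens = ["the", "quick", "brown", "fox"]
d = 8                                    # tiny embedding size for the demo
x = rng.normal(size=(len(tokens), d))    # one vector per token

# Query, key and value projections (learned in a real Transformer).
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Every token scores every other token, then scores become weights (softmax).
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

# Each token's new representation is a weighted blend of all value vectors.
out = weights @ V
print(np.round(weights, 2))  # row i = how much token i attends to each token
```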

I asked Anthropic’s Claude to explain how this works using a simple example: “the quick brown fox jumps over the lazy dog.” Claude wrote:

This sentence would first be broken down into individual tokens, or words. So the model would see:

[“The”, “quick”, “brown”, “fox”, “jumps”, “over”, “the”, “lazy”, “dog”]

The model then uses self-attention to understand how these words relate to each other. It creates numerical vectors for each word, and maps out their relationships based on context clues. So it would map that “fox” and “jumps” go together, as do “dog” and “lazy”.

With this vector map of word relationships, the model can then start generating new sentences based on its understanding. It might take the core of “fox jumps over dog” and create a new sentence like: “The playful kitten leapfrogged over the napping puppy.” The model is able to mix words around contextually and correctly due to building up its knowledge from the vector mappings created by self-attention. It can continuously improve at natural language generation the more examples it has to learn from.

I then asked Claude to write a new sentence based on this example. Claude wrote:

Here is a new sentence generated using the same example:

“The speedy silver fox bounded across the dozing hound.”

The model used its vector map created from the first sentence to understand relationships between words like “fox” and action verbs, as well as animals and descriptors.

It was able to mix and match words contextually correctly to form a new plausible sentence, by drawing on its understanding of how certain words and concepts are likely to be used together based on the examples in its training data. This demonstrates how large language models are able to take simple word associations and build up a complex internal representation of language.

The quoted text above was written by the chatbot Claude, running Anthropic’s Claude 2 LLM.

With no editing by me.
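Under the hood, the “generating” that Claude describes above is next-word prediction repeated over and over, which is why I called an LLM a statistical prediction machine earlier. Here’s a toy sketch of that loop; the probability table is invented for illustration, whereas a real LLM derives these probabilities from billions of learned parameters.

```python
# Toy text generator: repeatedly sample the next word from a probability
# table. The table below is hand-made; a real LLM computes equivalent
# probabilities over its entire vocabulary at every step.
import random

NEXT = {  # hypothetical next-word probabilities "learned" from training text
    "the":    [("quick", 0.4), ("lazy", 0.3), ("fox", 0.3)],
    "quick":  [("brown", 0.7), ("silver", 0.3)],
    "brown":  [("fox", 1.0)],
    "silver": [("fox", 1.0)],
    "lazy":   [("dog", 1.0)],
    "fox":    [("jumps", 1.0)],
    "jumps":  [("over", 1.0)],
    "over":   [("the", 1.0)],
}

def generate(word: str, max_words: int = 10) -> str:
    """Start from a word and keep sampling plausible continuations."""
    out = [word]
    for _ in range(max_words):
        choices = NEXT.get(out[-1])
        if not choices:                   # no known continuation: stop
            break
        words, probs = zip(*choices)
        out.append(random.choices(words, weights=probs)[0])
    return " ".join(out)

print(generate("the"))  # e.g. "the quick brown fox jumps over the lazy dog"
```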

About The Author

Rick Huckstep has worked in technology his entire career, as a corporate sales leader, investor in tech startups and keynote speaker. From his home in Spain, Rick is a thought leader in artificial intelligence, emerging technologies and the future of work.

🕸️ Check out Rick’s website

🤔 Join The Mailing List and Get Wiser! every week (and your free eBook)

📽️ Follow Rick on YouTube

🎙️ Listen to the Big Tech Little Tech podcast
