Large language model

A language model built with very large amounts of text. From Wikipedia, the free encyclopedia


A large language model (LLM) is a type of artificial intelligence that can understand and create human language. These models learn by studying huge amounts of text from books, websites, and other sources.[1]

How they work

LLMs work by finding patterns in language. They learn grammar, facts, and how words relate to each other by looking at billions of examples. The most powerful LLMs use a design called the "transformer," which lets them process large amounts of text quickly.[2]
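The idea of "finding patterns in language" can be sketched with a much simpler kind of language model than a transformer: a bigram model that counts which word tends to follow which. Real LLMs learn far richer patterns with billions of parameters, but this toy example (with a made-up corpus) shows the basic prediction step.

```python
from collections import Counter, defaultdict

# A language model predicts the next word from patterns in its
# training text. This tiny bigram counter only illustrates the idea;
# real LLMs use transformer networks with billions of parameters.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word in the corpus.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" — it follows "the" twice, more than any other word
```

A transformer does the same job (predicting likely next words) but looks at the whole preceding context rather than just one word, which is what makes modern LLMs so much more capable.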

Limitations

While LLMs are powerful, they can make mistakes. They sometimes repeat biases from their training data, and they can produce incorrect information, often called "hallucinations". They learn from existing text rather than having true understanding like humans do.[3]


History

Before 2017, language models were much simpler. The big change came in 2017, when researchers at Google introduced the "transformer" design, which made language models much more powerful.[4]

Important developments include:

  • 2018: BERT was released, which helped computers better understand language[5]
  • 2019: GPT-2 was created but was considered so powerful that its creators worried about misuse[6]
  • 2022: ChatGPT was released and became very popular with the public[7]
  • 2023: GPT-4 came out and could understand both text and images[8]

Modern developments

Today, there are many different LLMs available. Some are private, like GPT-4, while others are open for anyone to use, like LLaMA and Mistral. As of 2024, GPT-4 was considered one of the most capable language models.[9]


References
