Language model

Statistical model of the structure of language / From Wikipedia, the free encyclopedia

A language model is a probability distribution over sequences of words.[1] Given any sequence of words of length m, a language model assigns a probability $P(w_1,\ldots,w_m)$ to the whole sequence. Language models generate probabilities by training on text corpora in one or many languages. Given that languages can be used to express an infinite variety of valid sentences (the property of digital infinity), language modeling faces the problem of assigning non-zero probabilities to linguistically valid sequences that may never be encountered in the training data. Several modelling approaches have been designed to surmount this problem, such as applying the Markov assumption or using neural architectures such as recurrent neural networks or transformers.
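
To make the Markov assumption concrete, here is a minimal sketch of a bigram model: each word's probability is conditioned only on the preceding word, and add-one (Laplace) smoothing, one simple choice among many, gives word pairs never seen in training a non-zero probability. The toy corpus and function names are illustrative, not drawn from the article.

```python
import math
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams over a tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sequence_log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under the bigram (Markov) model,
    with add-one smoothing so unseen bigrams get non-zero probability."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence + ["</s>"]
    log_p = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # P(word | prev) ~ (count(prev, word) + 1) / (count(prev) + V)
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_p += math.log(p)
    return log_p

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
unigrams, bigrams = train_bigram_model(corpus)
# Assigns a (small but non-zero) probability even to an unseen sequence.
print(sequence_log_prob(["the", "cat", "ran"], unigrams, bigrams))
```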

Language models are useful for a variety of problems in computational linguistics: from initial applications in speech recognition[2] (ensuring that nonsensical, i.e. low-probability, word sequences are not predicted) to wider use in machine translation[3] (e.g. scoring candidate translations), natural language generation (producing more human-like text), part-of-speech tagging, parsing,[3] optical character recognition, handwriting recognition,[4] grammar induction,[5] information retrieval,[6][7] and other applications.

Language models are used in information retrieval in the query likelihood model. There, a separate language model $M_d$ is associated with each document $d$ in a collection, and documents are ranked by the probability of the query $Q$ under the document's language model, $P(Q \mid M_d)$. Commonly, the unigram language model is used for this purpose.
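
As a sketch of query likelihood ranking under a unigram model, the snippet below scores each document by $\log P(Q \mid M_d)$, mixing the document model with a collection-wide model (Jelinek-Mercer smoothing, an assumed choice not specified by the article) so that query terms absent from a document do not zero out the score. The documents, tokenization, and mixing weight are illustrative.

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_terms, collection_counts, total_terms, lam=0.5):
    """log P(Q | M_d) under a unigram model, smoothed by mixing the
    document model with the collection model (Jelinek-Mercer)."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len
        p_coll = collection_counts[term] / total_terms  # assumes term occurs in collection
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog chased the cat".split(),
}
collection_counts = Counter(t for terms in docs.values() for t in terms)
total_terms = sum(collection_counts.values())

query = "cat mat".split()
ranking = sorted(
    docs,
    key=lambda d: query_likelihood(query, docs[d], collection_counts, total_terms),
    reverse=True,
)
print(ranking)  # documents ordered by log P(Q | M_d)
```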

Since 2018, large language models (LLMs) consisting of deep neural networks with billions of trainable parameters, trained on massive datasets of unlabelled text, have demonstrated impressive results on a wide variety of natural language processing tasks. This development has led to a shift in research focus toward the use of general-purpose LLMs.[8]
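
Despite their scale, LLMs remain probability distributions over word sequences in the sense defined above. As an illustration (assuming the Hugging Face transformers library and the publicly released GPT-2 checkpoint, neither of which the article specifies), such a model can be queried for the log-probability it assigns to a sequence:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed setup: GPT-2 via the Hugging Face transformers library.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing input_ids as labels makes the model return the mean
    # cross-entropy (negative log-likelihood) over predicted positions.
    outputs = model(**inputs, labels=inputs["input_ids"])

n_tokens = inputs["input_ids"].shape[1]
# The loss averages over n_tokens - 1 next-token predictions, so the
# total log-probability of the sequence is approximately:
log_prob = -outputs.loss.item() * (n_tokens - 1)
print(f"log P(sequence) = {log_prob:.2f}")
```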