Brief Introduction to Language Modeling
According to [@ponte1998lm], a language model is a probability distribution that captures the statistical regularities of the generation of language.
In other words, a language model assigns a probability to a sequence of words. A good language model assigns higher probability to word sequences that are likely to occur in practice. For example, in the context of Java source code:
$$ P(\texttt{public static void main}) > P(\texttt{public static void =}) $$
Since estimating the probabilities of long word sequences is expensive and requires large amounts of data, the N-gram approach is normally used to approximate them [@langmodels99].
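Concretely, by the chain rule the probability of a sequence decomposes exactly into conditionals on ever-longer histories, and it is these long histories that are hard to estimate reliably:

$$ P(w_1, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) $$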
The N-gram approach assumes that the occurrence of any word in a sequence depends only on the preceding N-1 tokens.
Thus, for N=1 this assumption means that all words occur independently. Clearly, such a model is weak, because it ignores the context of the words entirely. For the context to be taken into account, N must be 2 or greater.
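Under the bigram assumption (N=2), for instance, each conditional in the chain rule collapses to depend only on the immediately preceding word:

$$ P(w_1, \ldots, w_n) \approx P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1}) $$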
For example, for a 3-gram (trigram) model the probability of the occurrence of the n-th word can be written as follows:

$$ P(w_n \mid w_1, \ldots, w_{n-1}) \approx P(w_n \mid w_{n-2}, w_{n-1}) $$
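To make the estimation concrete, here is a minimal sketch (the class and method names are hypothetical, not from the original text) of a trigram model that estimates these probabilities by maximum likelihood, i.e. as the count of a trigram divided by the count of its bigram prefix:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal trigram model: estimates P(w_n | w_{n-2}, w_{n-1}) by maximum likelihood. */
public class TrigramModel {
    // Counts of trigrams (w1 w2 w3) and of their bigram prefixes (w1 w2).
    private final Map<String, Integer> trigramCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Accumulate counts from a token sequence, e.g. a tokenized Java file. */
    public void train(List<String> tokens) {
        for (int i = 2; i < tokens.size(); i++) {
            String bigram = tokens.get(i - 2) + " " + tokens.get(i - 1);
            String trigram = bigram + " " + tokens.get(i);
            bigramCounts.merge(bigram, 1, Integer::sum);
            trigramCounts.merge(trigram, 1, Integer::sum);
        }
    }

    /** P(word | prev2, prev1) = count(prev2 prev1 word) / count(prev2 prev1). */
    public double probability(String prev2, String prev1, String word) {
        String bigram = prev2 + " " + prev1;
        int contextCount = bigramCounts.getOrDefault(bigram, 0);
        if (contextCount == 0) return 0.0; // unseen context; real models smooth here
        return (double) trigramCounts.getOrDefault(bigram + " " + word, 0) / contextCount;
    }

    public static void main(String[] args) {
        TrigramModel model = new TrigramModel();
        model.train(List.of("public", "static", "void", "main",
                "(", "String", "[", "]", "args", ")"));
        // After "public static", the only observed continuation is "void".
        System.out.println(model.probability("public", "static", "void")); // 1.0
    }
}
```

Note that raw maximum-likelihood counts assign zero probability to any trigram not seen during training; practical N-gram models therefore apply smoothing to redistribute some probability mass to unseen sequences.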