$\begingroup$

I hope to get help from someone experienced with implementing language models. I am trying to implement an n-gram model based on Kneser-Ney smoothing, but I have run into the following issue.

Here is the simplified equation for computing 4-gram Kneser-Ney smoothing.
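In the usual textbook notation (assumed here: $c(\cdot)$ is a training count, $d$ is the discount, and $w_{i-3}^{i-1}$ denotes the three-word context), the highest-order step of interpolated Kneser-Ney for a 4-gram is commonly written as:

$$P_{KN}(w_i \mid w_{i-3}^{i-1}) = \frac{\max\bigl(c(w_{i-3}^{i}) - d,\ 0\bigr)}{c(w_{i-3}^{i-1})} + \lambda(w_{i-3}^{i-1})\, P_{KN}(w_i \mid w_{i-2}^{i-1})$$

$$\lambda(w_{i-3}^{i-1}) = \frac{d}{c(w_{i-3}^{i-1})}\,\bigl|\{w : c(w_{i-3}^{i-1}\, w) > 0\}\bigr|$$

Both the first term and $\lambda$ divide by the context count $c(w_{i-3}^{i-1})$, which is the quantity that becomes zero for a context never seen in training.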

The main purpose of smoothing and interpolation here is to bring the lower-order n-grams into the calculation of a word's probability. One part of the equation is the weight given to the lower-order n-gram estimate.

However, with higher-order n-grams (trigrams and above), it is quite possible that a word sequence in the test data never occurs in the training data. That makes the weight equal to zero, so all the calculations with the lower-order n-grams become pointless. Moreover, the model returns a zero probability even when every word in the given n-gram is present in the vocabulary.
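To make the situation concrete, here is a minimal Python sketch of interpolated Kneser-Ney on a toy corpus. The class name `InterpolatedKN`, the helper `ngrams`, the corpus, and the discount of 0.75 are all illustrative assumptions, and the sketch assumes `order >= 2`. When a context has never been seen, the context count is zero, so both the discounted term and the weight are 0/0 rather than a well-defined zero. One common convention, used below, is to skip that level and return the lower-order estimate unchanged; this is not necessarily what any particular toolkit does.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class InterpolatedKN:
    """Toy interpolated Kneser-Ney model (hypothetical helper, order >= 2)."""

    def __init__(self, tokens, order=3, discount=0.75):
        self.order, self.d = order, discount
        self.vocab = sorted(set(tokens))
        # counts[n]: raw counts at the highest order, continuation counts
        # (number of distinct words seen to the left) at every lower order.
        self.counts = {order: Counter(ngrams(tokens, order))}
        for n in range(order - 1, 0, -1):
            types = set(ngrams(tokens, n + 1))
            self.counts[n] = Counter(g[1:] for g in types)
        # Per-context denominators and numbers of distinct following words.
        self.total = {n: Counter() for n in self.counts}
        self.followers = {n: Counter() for n in self.counts}
        for n, table in self.counts.items():
            for gram, c in table.items():
                self.total[n][gram[:-1]] += c
                self.followers[n][gram[:-1]] += 1

    def prob(self, word, context):
        # Keep at most the last (order - 1) context words.
        context = tuple(context)[-(self.order - 1):]
        n = len(context) + 1
        if n == 1:
            # Unigram level: continuation-count distribution.
            return self.counts[1][(word,)] / self.total[1][()]
        total = self.total[n][context]
        if total == 0:
            # Unseen context: the formula would be 0/0, so use the
            # lower-order estimate unchanged (assumed convention).
            return self.prob(word, context[1:])
        discounted = max(self.counts[n][context + (word,)] - self.d, 0.0) / total
        lam = self.d * self.followers[n][context] / total
        return discounted + lam * self.prob(word, context[1:])


tokens = "the cat sat on the mat the cat ate the fish".split()
lm = InterpolatedKN(tokens, order=3)

# ("mat", "on") never occurs as a context in the toy corpus,
# yet every word in it is in the vocabulary.
p_unseen = lm.prob("the", ("mat", "on"))
print(p_unseen)
print(sum(lm.prob(w, ("mat", "on")) for w in lm.vocab))
```

With this fallback, the unseen context still yields a non-zero probability, and the probabilities over the vocabulary still sum to one, because the lower-order distribution is itself normalised.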

So my question is whether I understand the model and Kneser-Ney smoothing correctly, and how zero counts for particular word sequences can be handled in this case.

Thank you so much!