Unsmoothed Maximum Likelihood Character Level Language Model

The name is quite long, but the idea is very simple. We want a model whose job is to guess the next character based on the previous $n$ letters. For example, having seen ello, the next character is likely to be either a comma or a space (if we assume it is the end of the word "hello"), or the letter w if we believe we are in the middle of the word "mellow". Humans are quite good at this, and of course seeing a longer history makes things easier (if we were to see 5 letters instead of 4, the choice between space and w would have been much easier).

We will call $n$, the number of letters our guess is based on, the order of the language model.

RNNs and LSTMs can potentially learn an infinite-order language model (they guess the next character based on a "state" which supposedly encodes all of the previous history). Here we will restrict ourselves to a fixed-order language model.

So, we see $n$ letters and need to guess the $(n+1)$th one. We are also given a large-ish amount of text (say, all of Shakespeare's works) that we can use. How would we go about solving this task?

Mathematically, we would like to learn a function $P(c|h)$. Here, $c$ is a character, $h$ is an $n$-letter history, and $P(c|h)$ stands for how likely it is to see $c$ after we've seen $h$.

Perhaps the simplest approach would be to just count and divide (a.k.a. maximum likelihood estimation). We will count the number of times each letter $c'$ appeared after $h$, and divide by the total number of letters appearing after $h$. The "unsmoothed" part means that if we did not see a given letter following $h$, we will just give it a probability of zero.
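Written out, the count-and-divide estimate is simply:

$$P(c|h) = \frac{\text{count}(h, c)}{\sum_{c'} \text{count}(h, c')}$$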

And that's all there is to it.
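In Python, a minimal sketch of this count-and-divide training could look like the following (the function name train_char_lm and the "~" padding symbol are just illustrative choices, not anything fixed by the method):

```python
from collections import Counter, defaultdict

def train_char_lm(text, order=4):
    """Unsmoothed maximum-likelihood character-level LM of a given order."""
    # Pad the start so the first characters also have a full history;
    # "~" is an arbitrary padding symbol assumed not to occur in the text.
    padded = "~" * order + text
    counts = defaultdict(Counter)
    for i in range(len(padded) - order):
        history, char = padded[i:i + order], padded[i + order]
        counts[history][char] += 1

    def normalize(counter):
        # Count and divide: raw counts become probabilities. Letters
        # never seen after this history get no entry at all, i.e. an
        # unsmoothed probability of zero.
        total = sum(counter.values())
        return [(c, n / total) for c, n in counter.items()]

    return {history: normalize(counter) for history, counter in counts.items()}
```

For example, on a toy corpus the history "ello" from the discussion above gets exactly the kind of distribution we described:

```python
lm = train_char_lm("hello world and mellow fellows", order=4)
print(lm["ello"])  # something like [(' ', 0.33...), ('w', 0.66...)]
```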