Back in 2007, I created a rhyming engine based on the public domain Moby pronouncing dictionary. It simply reads the dictionary and looks for rhyming words by comparing the suffix of the words' pronunciations. Since that time, I have made some improvements.
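As a rough illustration of the suffix comparison described above (a minimal sketch, not the engine's actual code), the snippet below counts how many trailing phonemes two pronunciations share. The tiny `PRONUNCIATIONS` table, its ARPAbet-like notation, and the `min_match` threshold are all assumptions made for the example; the Moby dictionary uses its own format.

```python
# Toy pronunciations in an ARPAbet-like notation (an assumption for this
# sketch; the Moby pronouncing dictionary has its own format).
PRONUNCIATIONS = {
    "cat": ["K", "AE", "T"],
    "hat": ["HH", "AE", "T"],
    "bat": ["B", "AE", "T"],
    "dog": ["D", "AO", "G"],
    "log": ["L", "AO", "G"],
    "cot": ["K", "AA", "T"],
}


def common_suffix_length(a, b):
    """Count how many phonemes two pronunciations share at the end."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


def find_rhymes(word, min_match=2):
    """Return words whose pronunciation shares at least min_match trailing phonemes."""
    target = PRONUNCIATIONS[word]
    return sorted(
        other
        for other, phones in PRONUNCIATIONS.items()
        if other != word and common_suffix_length(target, phones) >= min_match
    )


print(find_rhymes("cat"))
print(find_rhymes("dog"))
```

With this toy table, "cot" shares only its final consonant with "cat", so it falls below the threshold and is not reported as a rhyme.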

Using a combination of techniques from artificial intelligence, math, and linguistics, the rhyming engine can now work out how to pronounce any word that you enter. That means if you enter a word that is not in the dictionary, it will still be able to find some rhymes.

Rather than looking for technically perfect rhymes, it suggests words that would sound good together in song or poetry. For example, it sometimes ignores consonants, as suggested by this 1985 paper. That way, "fervently" will rhyme with "urgently" despite the v/g mismatch.
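One simple way to picture consonant-insensitive matching is sketched below. It is an assumption-heavy illustration rather than the engine's method: it treats every consonant as interchangeable, which is looser than ignoring them only sometimes. The pronunciations are hand-written in an ARPAbet-like notation where vowels carry a stress digit (1 for primary stress), and the function names are hypothetical.

```python
# Hand-written, ARPAbet-like pronunciations (assumed for this sketch).
# Vowels end in a stress digit; "1" marks primary stress.
FERVENTLY = ["F", "ER1", "V", "AH0", "N", "T", "L", "IY0"]
URGENTLY = ["ER1", "JH", "AH0", "N", "T", "L", "IY0"]
SUDDENLY = ["S", "AH1", "D", "AH0", "N", "L", "IY0"]


def is_vowel(phoneme):
    """In this notation only vowels end with a stress digit."""
    return phoneme[-1].isdigit()


def rhyme_tail(phones):
    """Return the phonemes from the last primary-stressed vowel onward."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return phones[i:]
    return phones  # no primary stress marked: fall back to the whole word


def strict_key(phones):
    """Key for a perfect rhyme: the tail must match exactly."""
    return tuple(rhyme_tail(phones))


def loose_key(phones):
    """Key for a loose rhyme: keep vowels, replace each consonant with '*'."""
    return tuple(p if is_vowel(p) else "*" for p in rhyme_tail(phones))


print(strict_key(FERVENTLY) == strict_key(URGENTLY))
print(loose_key(FERVENTLY) == loose_key(URGENTLY))
print(loose_key(FERVENTLY) == loose_key(SUDDENLY))
```

Under the strict key the two words differ at the V versus JH position; under the loose key both reduce to the same vowel-and-placeholder pattern, while "suddenly" still does not match because its stressed vowel differs.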

There is a legal advantage to this technique as well. Many of the standard word lists used by natural language processing researchers include words from an old edition of the Oxford dictionary, and so cannot be used for "commercial purposes". That's why both Rhymezone and Write Express have a relatively limited dictionary size. My rhyming engine can sidestep this issue: it only needs to be seeded with a small number of words from unrestricted sources. It can then import words in bulk and guess their pronunciations without using any restricted content.

I couldn't resist doing some premature optimization. The engine uses one of my favourite data structures: the trie. On my netbook web server, the program starts, reads the entire 260,000-word database, and completes the request in 60 ms. It takes about 8 MB of memory. I guess that equates to about 0.48 megabyte-seconds per request (8 MB x 0.06 s).
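For readers who have not met a trie before, here is a minimal sketch of how one could serve rhyme lookups. Keying the trie on reversed phoneme sequences is an assumption made for this example (the post does not say how the engine's trie is organized), and the class and method names are hypothetical. Words that share a pronunciation suffix end up under the same branch, so finding them is a short walk followed by collecting everything below that node.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # phoneme -> TrieNode
        self.words = []     # words whose reversed pronunciation ends here


class SuffixTrie:
    """Trie keyed on reversed pronunciations, so shared endings share a path."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, phones):
        node = self.root
        for phoneme in reversed(phones):
            node = node.children.setdefault(phoneme, TrieNode())
        node.words.append(word)

    def words_ending_with(self, suffix):
        # Walk down the trie following the suffix from its last phoneme.
        node = self.root
        for phoneme in reversed(suffix):
            node = node.children.get(phoneme)
            if node is None:
                return []
        # Collect every word stored at or below the node we reached.
        found = []
        stack = [node]
        while stack:
            current = stack.pop()
            found.extend(current.words)
            stack.extend(current.children.values())
        return sorted(found)


trie = SuffixTrie()
trie.insert("cat", ["K", "AE", "T"])
trie.insert("hat", ["HH", "AE", "T"])
trie.insert("bat", ["B", "AE", "T"])
trie.insert("dog", ["D", "AO", "G"])

print(trie.words_ending_with(["AE", "T"]))
```

The lookup cost depends on the length of the suffix and the number of matches, not on the size of the whole dictionary, which is the usual reason to reach for a trie here.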

Why is this hard?

Text to speech for English is still a hard problem to solve, and it is an active area of research. Consider the words rough, through, bough, thought, dough, and cough; or photOgraph and photOgraphy; or physics, lymphatic, and loophole. In the 1980s, and in many cases still today, text to speech was done by hiring specially trained linguists to develop the thousands of rules necessary to create pronunciations. It is only in the last 10 years or so that this task has been automated. My system has over 200,000 hints on how to interpret each part of a word given its context. With further refinements, this could probably be reduced to tens of thousands, which is still a lot.
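To give a feel for what a context-dependent hint might look like, here is a toy sketch covering only the letters "ough" in the example words above. The hint format (left context, letters, right context, sound), the ARPAbet-like sounds, and the `lookup_hint` function are all assumptions for illustration; the real hints are learned automatically and their format is not described here.

```python
# Each hint: (left context, letters, right context, sound).
# An empty context matches anything. More specific hints come first,
# so "thr" + "ough" is tried before the shorter "r" + "ough".
HINTS = [
    ("thr", "ough", "", ["UW"]),       # through
    ("th", "ough", "t", ["AO"]),       # thought
    ("r", "ough", "", ["AH", "F"]),    # rough
    ("d", "ough", "", ["OW"]),         # dough
    ("c", "ough", "", ["AO", "F"]),    # cough
    ("b", "ough", "", ["AW"]),         # bough
]


def lookup_hint(word, letters="ough"):
    """Return the sound of `letters` in `word` according to the first matching hint."""
    start = word.find(letters)
    if start == -1:
        return None
    before = word[:start]
    after = word[start + len(letters):]
    for left, chunk, right, sound in HINTS:
        if chunk == letters and before.endswith(left) and after.startswith(right):
            return sound
    return None  # no hint covers this context


for example in ["rough", "through", "bough", "thought", "dough", "cough"]:
    print(example, lookup_hint(example))
```

Even this one letter group needs six hints, and a word outside the table (such as "plough") gets no answer at all, which hints at why the full set runs to hundreds of thousands of entries.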