Spell checking is one of those problems that is already solved... sorta.

Like all problems, it really depends on context. Take Jon Bentley's Programming Pearls column "A Spelling Checker", where he examines the problem space and the differences between a spell checker and a spelling corrector. I started by searching for the keyword 'spell' across all of CPAN.

wget http://www.cpan.org/modules/01modules.index.html

ack -i spell 01modules.index.html



The above covered all 22,442 distribution names but not the submodule names. A few MetaCPAN searches later, I was able to compile the following list.

Direct checkers - modules that actually do the spell checking





Lingua::Ispell A module encapsulating access to the Ispell program via IPC::Open2



Meta::Tool::Aspell Runs aspell for you. Meta is a class library of about 250 classes and is abandonware.



Text::Aspell Perl interface to the GNU Aspell library



Text::Hunspell Perl interface to the Hunspell library



Text::Ispell A wrapper module for Ispell. The ispell cli is called via IPC::Open2.



Indirect - relies on another module to do the actual checking





Search::Tools::SpellCheck Uses Text::Aspell to offer spelling suggestions



Text::SpellChecker OO interface for spell-checking a block of text. Uses either Text::Aspell or Text::Hunspell



POD only checkers





Pod::Spell::CommonMistakes Catches common typos in POD by using Pod::Spell to format the text and then comparing it against a custom wordlist from Pod::Spell::CommonMistakes::WordList. No system spell checker is required.



Pod::Spelling Send POD to a spelling checker using either Lingua::Ispell or Text::Aspell. A test library is provided via Test::Pod::Spelling



Test::Spelling Checks for spelling errors in POD files. Pod::Spell is used for parsing and an open3 call is made to either 'spell', 'aspell', 'ispell', or 'hunspell' for spell checking.



XML





Apache::AxKit::Language::SpellCheck is an XML Text Spell Checker for the Apache AxKit. Checking is done via Text::Aspell



xml_spellcheck is a cli application for spell checking XML files. It makes a system call to 'aspell -c' directly.



Spell checking as a test





Dist::Zilla::Plugin::PodSpellingTests is DEPRECATED! The old name of the PodSpelling plugin



Dist::Zilla::Plugin::SpellingCommonMistakesTests Generates a Test::Pod::Spelling::CommonMistakes release test



Test::Pod::Spelling::CommonMistakes Checks POD for common spelling mistakes using Pod::Spell::CommonMistakes.



Dist::Zilla::Plugin::Test::PodSpelling Generates a Test::Spelling author test



Perl::Critic::Policy::Documentation::PodSpelling Spell check the POD. Aspell is used via an open command.



Checks spelling via remote service/application





Bing::Search::Source::Spell uses Bing to spell check text.



Lingua::AtD Provides an OO wrapper for After the Deadline grammar and spelling service.



Lingua::MSWordSpell Uses Microsoft Word's Spellchecker over OLE automation instead of something like ispell



Net::Google::Spelling simple OOP-ish interface to the Google SOAP API for spelling suggestions. This appears abandoned based on last update date, number of open bugs and the fact it has more failed test reports than passes.



WebService::KoreanSpeller A Korean spell checker



Everything else





Gtk2::Spell Perl bindings to GtkSpell, used in concert with Gtk2::TextView.



Lingua::Jspell Perl interface to the Jspell morphological analyzer.



Lingua::Spelling::Alternative Use affix files generated by the ispell tools to return alternative spellings of a given word



Pod::Spell a formatter for spell checking Pod, no actual checking capabilities built in.



Text::SpellChecker::GUI Implements a user interface to Text::SpellChecker



Tie::Ispell Ties a hash with an Ispell dictionary



tkispell Perl/Tk user interface for Ispell



While many of these modules are actively developed and useful, many do not fit my requirements for this project. I want to spell check any kind of UTF-8 encoded text without needing an Internet connection or a closed source program. The first two groups, the direct and indirect spell checkers, appear to meet these requirements.

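So which one to use? Let's take a look under the hood. GNU Ispell gives spelling suggestions if a word is not found in its dictionary. When searching for possible corrections to present, it uses a Damerau–Levenshtein distance of 1, meaning a candidate differs from the misspelled word by a single insertion, deletion, substitution, or swap of two adjacent letters.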
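As a rough illustration of what a distance of 1 means in practice, here is a minimal pure-Perl sketch. It is not Ispell's actual implementation: the function names `edits1` and `suggest`, the lowercase a-z alphabet, and the five-word dictionary are all assumptions made up for this example.

```perl
use strict;
use warnings;

# Return every string exactly one edit away from $word: one deletion,
# one swap of adjacent letters, one substitution, or one insertion.
# Assumes a lowercase a-z alphabet (an illustration-only simplification).
sub edits1 {
    my ($word) = @_;
    my @letters = ('a' .. 'z');
    my %seen;
    for my $i (0 .. length($word)) {
        my $head = substr($word, 0, $i);
        my $tail = substr($word, $i);

        if (length($tail) >= 1) {
            # Deletion: drop the first character of the tail.
            $seen{ $head . substr($tail, 1) } = 1;
        }
        if (length($tail) >= 2) {
            # Transposition: swap the first two characters of the tail.
            $seen{ $head . substr($tail, 1, 1) . substr($tail, 0, 1) . substr($tail, 2) } = 1;
        }
        for my $c (@letters) {
            # Substitution: replace the first character of the tail.
            $seen{ $head . $c . substr($tail, 1) } = 1 if length($tail) >= 1;
            # Insertion: add a letter between head and tail.
            $seen{ $head . $c . $tail } = 1;
        }
    }
    delete $seen{$word};    # some edits reproduce the original word
    return keys %seen;
}

# Keep only the candidates that appear in the dictionary (a hash ref).
sub suggest {
    my ($word, $dictionary) = @_;
    my @hits = sort grep { $dictionary->{$_} } edits1($word);
    return @hits;
}

# Hypothetical five-word dictionary, just for the demonstration.
my %dictionary = map { $_ => 1 } qw(spell spelt spill shell smell);
print join(', ', suggest('spel', \%dictionary)), "\n";    # prints "spell, spelt"
```

Words such as 'spill' or 'shell' are two edits away from 'spel', so they are never offered. That is the trade-off of a distance of 1: the suggestion list stays short, but a word with two typos gets no suggestions at all.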

GNU Aspell is an Ispell replacement that can handle utf-8 by default, has 70 supported language dictionaries, and supports using multiple dictionaries at once.

Hunspell is an advanced spell checker based on MySpell that supports both dictionaries and rules. It is currently used by LibreOffice/OpenOffice and has dictionaries for 99 languages.

It looks like Aspell and Hunspell are the front runners. I rejected Meta::Tool::Aspell because it has not been updated since 2002 and the distribution it is a part of has tons of other modules that are not needed for this problem. That leaves Text::Aspell, Text::Hunspell, Search::Tools::SpellCheck, and Text::SpellChecker. Before trying these modules out, I decided to run the command line versions of aspell and hunspell against some test data and compare the output.

Long story short, they both generate too much noise due to how they function. If a word is in the dictionary, or can otherwise be matched, then everything works out. Anything else is reported as a misspelling. Since I am dealing with text of any kind, from source code files to memos and email, that adds up quickly. Things like company names, places, people's names, animals, plants, email addresses, and technical terms can easily be flagged as incorrect.
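To make the noise problem concrete, here is a toy model of dictionary-only checking in pure Perl. It is only a sketch under stated assumptions and does not reflect how Aspell or Hunspell tokenise or match input: the `flag_unknown` function, the ASCII-only tokenizer, the tiny word list, and the sample sentence are all invented for this illustration.

```perl
use strict;
use warnings;

# Split text on anything that is not an ASCII letter or apostrophe and
# return, in order of first appearance, each lowercased token that is
# missing from the dictionary (a hash ref). ASCII-only is a simplification.
sub flag_unknown {
    my ($text, $dictionary) = @_;
    my (@flagged, %seen);
    for my $token (split /[^A-Za-z']+/, $text) {
        next unless length $token;       # skip empty leading field
        my $word = lc $token;
        next if $dictionary->{$word};    # known word: nothing to report
        next if $seen{$word}++;          # report each unknown word once
        push @flagged, $word;
    }
    return @flagged;
}

# Hypothetical stand-in for a real dictionary.
my %known = map { $_ => 1 } qw(please send the report to or call in);

my $text = 'Please send the report to jsmith@example.com or call Acme in Reykjavik.';
print join(' ', flag_unknown($text, \%known)), "\n";
# prints "jsmith example com acme reykjavik"
```

Nothing in that sentence is misspelled, yet five tokens come back flagged: three fragments of an email address, a company name, and a place name.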

I wrote a few sample programs to try and work around this problem. I will cover the results of my research in a future post.