New Soundex support in NinjaNye.SearchExtensions

I have recently released a new version of the NinjaNye.SearchExtensions NuGet package. The main feature of this release is Soundex search support.

PM> Install-Package NinjaNye.SearchExtensions

SearchExtensions is a library of IQueryable and IEnumerable extension methods to help simplify string searching.

What is Soundex

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. [Source: Wikipedia]

As of release 1.1, NinjaNye.SearchExtensions supports converting words to Soundex codes and searching for words based on the Soundex algorithm.
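To make the encoding concrete, here is a minimal sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent letters that share a digit, and pad the result to four characters. This is an illustration only and is not the implementation used inside NinjaNye.SearchExtensions; the class name SoundexSketch and the method name Encode are hypothetical.

```csharp
using System;
using System.Text;

public static class SoundexSketch
{
    // Digit for each letter A..Z. '0' marks letters that are not coded
    // (vowels, H, W and Y).
    private const string Codes = "01230120022455012623010202";

    public static string Encode(string word)
    {
        if (string.IsNullOrEmpty(word)) return string.Empty;

        var result = new StringBuilder(4);
        char previousCode = '0';

        foreach (char raw in word)
        {
            char letter = char.ToUpperInvariant(raw);
            if (letter < 'A' || letter > 'Z') continue; // ignore anything that is not A-Z

            char code = Codes[letter - 'A'];

            if (result.Length == 0)
            {
                // The first letter is kept as-is, but its digit still
                // suppresses an immediately following letter with the same digit.
                result.Append(letter);
                previousCode = code;
                continue;
            }

            // H and W are skipped without resetting previousCode, so letters
            // with the same digit on either side of them are coded once.
            if (letter == 'H' || letter == 'W') continue;

            if (code != '0' && code != previousCode)
            {
                result.Append(code);
                if (result.Length == 4) break;
            }

            // Vowels set previousCode to '0', so a repeated digit after a
            // vowel is coded again.
            previousCode = code;
        }

        if (result.Length == 0) return string.Empty; // input contained no letters
        return result.ToString().PadRight(4, '0');
    }
}
```

With this sketch, SoundexSketch.Encode("Robert") and SoundexSketch.Encode("Rupert") both return R163, which is why a Soundex search can match names that are spelled differently but sound alike.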

How to: Performing Soundex searches

Search where a single property sounds like a single search term

var result = data.Search(x => x.Property1).Soundex("test");

Search where any of multiple properties sounds like a single search term

var result = data.Search(x => x.Property1, x => x.Property2).Soundex("test");

Search where a single property sounds like any one of multiple search terms

var result = data.Search(x => x.Property1).Soundex("test", "another");

Search where any of multiple properties sounds like any of multiple search terms

var result = data.Search(x => x.Property1, x => x.Property2).Soundex("test", "another");

How to: Combining Soundex searches

Soundex searches are chained in the same way as any other search, and they can be combined with other search types, although doing so may not always be appropriate.

Search where Property1 sounds like one term AND Property2 sounds like another term

var result = data.Search(x => x.Property1).Soundex("test")
                 .Search(x => x.Property2).Soundex("another");

How to: Converting words to Soundex

As part of this update I created an extension method on string that converts a word to its Soundex code. This extension method is public and can be used outside of the Search() functionality.

Producing the Soundex code for a word is simple. First, make sure you are using the Soundex namespace:

using NinjaNye.SearchExtensions.Soundex;

Once you have this you can use the ToSoundex() extension method:

string word = "test";
string soundex = word.ToSoundex();

Converting multiple words to Soundex codes

string sentence = "the quick brown fox";
string[] words = sentence.Split(' ');
var codes = words.Select(x => x.ToSoundex());

Performance

A lot of the examples I saw whilst researching the subject performed the same task but not always in the most performant way. Because of this I was keen to build something that would scale. Below are the tests I ran during development.

Test environment

All of these test results are from my development machine with the following specification:

Intel Core i5-3317U CPU @ 1.70GHz

10 GB RAM

Windows 8.1 64-bit operating system

All of the tests below were performed against 1 million randomly generated words, each between 2 and 10 characters long.
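For anyone wanting to run a comparable test, the setup described above could be approximated with something like the following. This is a rough sketch and not the harness used to produce the figures in this post: the BenchmarkSketch class, its method names, the lowercase a-z alphabet and the fixed seed are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Builds `count` random lowercase words, each between minLength and
    // maxLength characters long (both bounds inclusive).
    public static List<string> GenerateWords(int count, int minLength, int maxLength, int seed)
    {
        var random = new Random(seed);
        var words = new List<string>(count);

        for (int i = 0; i < count; i++)
        {
            // Random.Next's upper bound is exclusive, hence the + 1.
            int length = random.Next(minLength, maxLength + 1);
            var letters = new char[length];

            for (int j = 0; j < length; j++)
            {
                letters[j] = (char)('a' + random.Next(26));
            }

            words.Add(new string(letters));
        }

        return words;
    }

    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

With the package referenced, the word list would come from BenchmarkSketch.GenerateWords(1000000, 2, 10, 42), and each of the queries below could be wrapped in a lambda and passed to BenchmarkSketch.Time. A single run like this includes JIT warm-up, so timings will vary between runs and machines.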

Converting words using ToSoundex()

var result = words.Select(x => x.ToSoundex()).ToList();

Time taken: 0.6919661 seconds

Querying words that match 'test'

var result = words.Search(x => x).Soundex("test").ToList();

Time taken: 0.6385429 seconds (618 results)

Querying words that match two words

var result = words.Search(x => x).Soundex("test", "bacon").ToList();

Time taken: 0.4372583 seconds (1285 results)

Querying words that match ten words

var result = words.Search(x => x).Soundex("historians", "often", "articulate", "great", "battles", "elegantly", "without", "pause", "for", "thought").ToList();

Time taken: 0.5831033 seconds (7093 results)

For a more in-depth write-up of the performance testing I have done, please see my latest post on the subject.

Feature requests

I've had a great time developing this feature, and I'm always open to new ideas. If you have an idea for a feature that you believe would be a good addition to SearchExtensions, please get in touch.

Equally, if you are currently using SearchExtensions and can see areas that could be enhanced or improved, I'd love to hear from you.