In this installment of the analyzer blog series, we will look at how a synonym analyzer works. Synonym search is a commonly used and relevant tool in search analytics. Including synonym matches in the results makes the search more accurate and reduces search time. With this in mind, Elasticsearch provides the synonym token filter. It can be configured in a number of ways so that when a word is indexed, it is also mapped to its synonyms. The synonyms can either be defined by the user or loaded from an external synonym database.

Synonym Token Filter

Elasticsearch, including hosted Elasticsearch, can be made to consider synonyms while searching. That is, if we search for a word such as small, it can also return the documents containing the word tiny. For this purpose we can use the synonym token filter. Let us take this case and make tiny a synonym for small. The code below configures a synonym analyzer named synonymAnalyzer:

curl -X PUT "http://localhost:9200/analyzers-blog-04-01" -d '{ "analysis": { "filter": { "synonym": { "type": "synonym", "synonyms": [ "small, tiny" ] } }, "analyzer": { "synonymAnalyzer": { "type": "custom", "tokenizer": "standard", "filter": [ "lowercase", "synonym" ] } } } }'

To test this analyzer, we can use the analyze API as follows:

curl -XPOST 'localhost:9200/analyzers-blog-04-01/_analyze?analyzer=synonymAnalyzer&pretty' -d 'a small rabbit'

Here in the result, we can see the generated tokens as:

a -> position 1
small -> position 2
tiny -> position 2
rabbit -> position 3

You can see that the words small and tiny, which were defined as synonyms, are marked with the same position number. In general, if any of the words in a synonym list occurs, this filter replaces it with all the words in the corresponding synonym list, each at the same position.
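The positional behaviour described here can be pictured with a minimal Python sketch. It is not how Elasticsearch or Lucene implement the filter. The function name expand_synonyms, the whitespace split standing in for the standard tokenizer, and the 1-based positions (chosen to match the listing above) are all assumptions made for illustration.

```python
def expand_synonyms(text, synonym_groups):
    """Return (token, position) pairs; synonyms share the position of the original token."""
    # Map each word to its whole synonym group, e.g. "small" -> ["small", "tiny"].
    lookup = {}
    for group in synonym_groups:
        for word in group:
            lookup[word] = group

    tokens = []
    # Lowercase, then split on whitespace (a rough stand-in for tokenizer + lowercase filter).
    for position, word in enumerate(text.lower().split(), start=1):
        # Emit the original word first, then its synonyms, all at the same position.
        synonyms = [w for w in lookup.get(word, []) if w != word]
        for variant in [word] + synonyms:
            tokens.append((variant, position))
    return tokens


for token, position in expand_synonyms("a small rabbit", [["small", "tiny"]]):
    print(f"{token} -> position {position}")
```

Because both variants occupy the same position, a query for either word can match a document that contains only the other one.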

Wordnet Synonym Filter

The above approach works well, but what if we need to map the synonyms of every word in our database into the inverted index? We cannot realistically prepare synonym lists by hand for all the words in our database.

Elasticsearch supports the use of the Wordnet lexical database (more about Wordnet here 1, 2). Here we use the Wordnet Prolog database, which provides synonym lists for the English language. Now let us see how to set up the Wordnet synonym filter:

Download the Wordnet Prolog database from here. After extracting the file, go to the extracted folder and copy the file wn_s.pl. In your Elasticsearch config folder, create another folder named analysis. Paste the wn_s.pl file in the analysis folder we just created.
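If you prefer to script the copy step, something like the following Python sketch could do it. The helper name install_wordnet_file is hypothetical, and both directory arguments are placeholders that depend on where you extracted the archive and where your Elasticsearch config folder lives.

```python
import shutil
from pathlib import Path


def install_wordnet_file(extracted_dir, es_config_dir):
    """Copy wn_s.pl from the extracted Wordnet folder into <config>/analysis."""
    source = Path(extracted_dir) / "wn_s.pl"
    analysis_dir = Path(es_config_dir) / "analysis"
    # Create the analysis folder if it does not exist yet.
    analysis_dir.mkdir(parents=True, exist_ok=True)
    target = analysis_dir / "wn_s.pl"
    shutil.copy(source, target)
    return target


# Example call (both paths are placeholders, adjust them to your setup):
# install_wordnet_file("/path/to/extracted/prolog", "/path/to/elasticsearch/config")
```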

Now let us create an index which makes use of the synonym database we just added, like below:

curl -XPUT 'localhost:9200/analyzers-blog-04-02' -d '{ "analysis": { "filter": { "synonym": { "type": "synonym", "format": "wordnet", "synonyms_path": "analysis/wn_s.pl" } }, "analyzer": { "wordnet-synonym-analyzer": { "tokenizer": "lowercase", "filter": [ "synonym" ] } } } }'

Here you can see that the synonym filter has two extra settings: format, which is set to wordnet, and synonyms_path, which points to the file where the synonyms are stored, in this case wn_s.pl. Now let us run the same example that we used for the plain synonym filter:

curl -XPOST 'localhost:9200/analyzers-blog-04-02/_analyze?analyzer=wordnet-synonym-analyzer&pretty' -d 'a small rabbit'

This yields the token list in the response, which is given in the table below:

The response shows the three words we gave as input, each indexed together with its synonyms taken from the Wordnet database.
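To get a feel for where these synonyms come from, here is a small Python sketch that groups words from wn_s.pl-style lines by synset. It assumes each line has the form s(synset_id,w_num,'word',ss_type,sense_number,tag_count). as described in the Wordnet Prolog documentation. The sample lines and their ids are invented for illustration and are not real Wordnet entries, and the function name group_synonyms is hypothetical. Elasticsearch does its own parsing; this only illustrates the idea that words sharing a synset id are treated as synonyms.

```python
import re
from collections import defaultdict

# One fact per line: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
LINE_PATTERN = re.compile(r"^s\((\d+),(\d+),'(.*)',([a-z]),(\d+),(\d+)\)\.$")


def group_synonyms(lines):
    """Group words by synset id; words in the same synset are synonyms."""
    synsets = defaultdict(list)
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like an s(...) fact
        synset_id, word = match.group(1), match.group(3)
        # Prolog escapes a single quote inside a word by doubling it.
        synsets[synset_id].append(word.replace("''", "'").lower())
    return dict(synsets)


# Invented sample lines (not real Wordnet data).
sample = [
    "s(900000001,1,'small',a,1,0).",
    "s(900000001,2,'tiny',a,1,0).",
    "s(900000002,1,'rabbit',n,1,0).",
]
print(group_synonyms(sample))
```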

Conclusion

In this installment of the analyzer blog series, we learned how a synonym analyzer works, focusing on the synonym token filter and its Wordnet variant. Questions or comments? Drop us a line in the messages below.