Free-text querying lets people use any words they know. However, it is sometimes difficult for a program to map these words to the standard values stored in a database. For example, the color ‘blue’ has more than 400 shades, and people use them in queries every day. The solution to this problem is to create a dictionary of synonyms and use the Synonym Token Filter. In this article we show you how to do this in a few steps.

Description of the method

While stemming broadens the scope of a search by reducing inflected words to their root form, synonyms broaden the scope by relating concepts and ideas. As noted above, one of the many problems we can run into is a search query whose words do not exactly match the indexed text. For example, a document may have the name field set to “blue jacket” while the search query asks for “cyan jacket”. The synonym token filter makes it easy to handle synonyms during the analysis process. Synonyms are configured either inline in the filter definition or in a configuration file.
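The idea can be illustrated with a small toy model. This is only a sketch of the concept and not how Elasticsearch implements the filter; the synonym group and the helper names `expand` and `matches` are assumptions made up for this example.

```python
# Toy model of synonym matching; NOT Elasticsearch's implementation.
# The synonym group below is a hypothetical example.
SYNONYM_GROUPS = [{"blue", "cyan", "cobalt"}]


def expand(token):
    """Return the token together with every synonym from its group."""
    for group in SYNONYM_GROUPS:
        if token in group:
            return set(group)
    return {token}


def matches(document_name, query):
    """True when every query word, or one of its synonyms, occurs in the document."""
    doc_tokens = set(document_name.lower().split())
    return all(expand(word) & doc_tokens for word in query.lower().split())


print(matches("blue jacket", "cyan jacket"))  # True
print(matches("blue jacket", "red jacket"))   # False
```

Without the synonym group, “cyan jacket” would not match “blue jacket”, because the two names share only the word “jacket”.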

Description of the process

1) First, we have to define a token filter of type synonym and list the synonyms in one of the supported formats. The first way is to set the synonyms inline in the filter:

"filter": {
  "synonyms_filt": {
    "tokenizer": "keyword",
    "type": "synonym",
    "synonyms": [
      "blue, Blue, blue-gold, cobalt, dark blue/black/charcoal, duke blue, jade blue, scottsdale blue, blue/white/khaki, pacific blue",
      "Gold,diamond/gold,blue-gold,white with gold,gold,golden beige,gold-blue"
    ]
  }
}

The second way is to point the filter at a synonym file with synonyms_path:

"filter": {
  "synonyms_filt": {
    "tokenizer": "keyword",
    "type": "synonym",
    "synonyms_path": "analysis/synonym.txt"
  }
}

HINT: The file analysis/synonym.txt must be present on every node of the cluster. The file format should be as follows:

# Blank lines and lines starting with pound are comments.

# Explicit mappings match any token sequence on the LHS of "=>"
# and replace with all alternatives on the RHS. These types of mappings
# ignore the expand parameter in the schema.
# Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

# Equivalent synonyms may be separated with commas and give
# no explicit mapping. In this case the mapping behavior will
# be taken from the expand parameter in the schema. This allows
# the same synonym file to be used in different synonym handling strategies.
# Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos

# If expand==true, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod

# Multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
# is equivalent to
foo => foo bar, baz

The filter definition with synonyms_path shown above configures a synonym filter that reads its rules from analysis/synonym.txt (relative to the config location).
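To make the rules of the file format concrete, here is a minimal sketch of a parser for it. It is an illustration written for this article, not the parser that Elasticsearch or Lucene uses; the function name `parse_synonyms` and the dictionary layout are assumptions. It covers comments, explicit "=>" mappings, comma-separated equivalent synonyms with the expand parameter, and merging of repeated entries.

```python
def parse_synonyms(lines, expand=True):
    """Parse synonym rules into {source phrase: [replacement, ...]} (illustrative only)."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if "=>" in line:
            # Explicit mapping: every phrase on the LHS maps to all phrases on the RHS.
            lhs, rhs = line.split("=>", 1)
            sources = [t.strip() for t in lhs.split(",") if t.strip()]
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
        else:
            # Equivalent synonyms: behavior depends on the expand parameter.
            terms = [t.strip() for t in line.split(",") if t.strip()]
            sources = terms
            targets = terms if expand else terms[:1]
        for source in sources:
            merged = mapping.setdefault(source, [])
            for target in targets:
                if target not in merged:
                    merged.append(target)  # repeated entries for a source are merged
    return mapping


rules = [
    "# comment",
    "i-pod, i pod => ipod",
    "universe, cosmos",
    "foo => foo bar",
    "foo => baz",
]
expanded = parse_synonyms(rules, expand=True)
print(expanded["i pod"])   # ['ipod']
print(expanded["cosmos"])  # ['universe', 'cosmos']
print(expanded["foo"])     # ['foo bar', 'baz']
print(parse_synonyms(rules, expand=False)["cosmos"])  # ['universe']
```

With expand=False every term in an equivalent group collapses to the first term of that group, which is why “cosmos” maps only to “universe” in the last line.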

2) Next, we define a custom analyzer that uses this filter:

"analyzer": {
  "filter_synonyms": {
    "filter": [ "synonyms_filt" ],
    "tokenizer": "keyword"
  }
}

You can see that we added two synonym groups, one for blue and one for gold. When a user searches for “cobalt dress”, Elasticsearch also returns documents that match the synonyms of “cobalt”: “blue, Blue, blue-gold, dark blue/black/charcoal, duke blue, jade blue, scottsdale blue, blue/white/khaki, pacific blue”. To see the effect, you can run a query without synonyms first and then run it again with our analyzer "filter_synonyms".
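The "filter" and "analyzer" fragments both belong under settings.analysis in the index settings. The sketch below assembles them into one request body and prints it as JSON. The synonym lists are shortened for readability, and sending the body with a PUT request to index_name4 when creating the index is an assumption about how you would apply it.

```python
import json

# Synonym groups, shortened from the lists used in the article.
synonyms = [
    "blue, cobalt, duke blue, jade blue, scottsdale blue, pacific blue",
    "gold, golden beige, white with gold",
]

index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Same filter definition as in step 1 of the article.
                "synonyms_filt": {
                    "tokenizer": "keyword",
                    "type": "synonym",
                    "synonyms": synonyms,
                }
            },
            "analyzer": {
                # Custom analyzer that applies the synonym filter.
                "filter_synonyms": {
                    "tokenizer": "keyword",
                    "filter": ["synonyms_filt"],
                }
            },
        }
    }
}

# Body for creating the index (assumed here: PUT /index_name4).
print(json.dumps(index_settings, indent=2))
```

Building the body as a dictionary and serializing it avoids quoting mistakes that are easy to make when the JSON is typed by hand.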

3) Let’s test our custom synonym filter:

curl 'https://localhost/index_name4/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "blue dress",
          "fields": ["name", "description", "brand", "color"],
          "analyzer": "filter_synonyms"
        }
      }
    }
  }
}'

The first query returns:

"hits" : { "total" : 36165, "max_score" : 1.1498904 }

The second query searches for a specific shade:

curl 'https://localhost/index_name4/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "scottsdale blue dress",
          "fields": ["name", "description", "brand", "color"],
          "analyzer": "filter_synonyms"
        }
      }
    }
  }
}'

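The second query returns the same total number of hits as the first, which suggests that “scottsdale blue” is handled as a synonym of “blue”: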

"hits" : { "total" : 36165, "max_score" : 0.62592113 }

Additional information

Using the same synonym token filter at both index time and search time is redundant. If we replace gold at index time with the two terms gold-blue and Gold, then at search time we need to search for only one of those terms. Alternatively, if we don’t use synonyms at index time, we need to convert a query for gold into a query for Gold or gold-blue at search time.
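The trade-off can be checked with a small simulation. This is a toy model under stated assumptions and not Elasticsearch code: the synonym group, the whitespace tokenizer, and the helper names `analyze` and `hit` are made up for the illustration, and a hit is counted when the document and the query share at least one term.

```python
# Hypothetical synonym group used only for this illustration.
GROUP = {"gold", "gold-blue"}


def analyze(text, use_synonyms):
    """Split on whitespace; optionally expand members of GROUP to the whole group."""
    tokens = set()
    for token in text.lower().split():
        if use_synonyms and token in GROUP:
            tokens |= GROUP
        else:
            tokens.add(token)
    return tokens


def hit(doc, query, index_synonyms, search_synonyms):
    """True when the analyzed document and the analyzed query share a term."""
    return bool(analyze(doc, index_synonyms) & analyze(query, search_synonyms))


doc, query = "gold-blue dress", "gold"
for index_syn, search_syn in [(False, False), (True, False), (False, True), (True, True)]:
    print(index_syn, search_syn, hit(doc, query, index_syn, search_syn))
# False False False
# True False True
# False True True
# True True True
```

Expanding on either side is enough to find the document, and expanding on both sides gives the same result as expanding on one.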