I am using the pattern_capture token filter to preserve acronyms:

PUT test_index/_settings
{
  "index.analysis.filter": {
    "acronym_en_EN": {
      "type": "pattern_capture",
      "patterns": [
        "(?:[a-zA-Z]\\.)+",
        "((?:[a-zA-Z]\\.)+[a-zA-Z])",
        "((?:[a-zA-Z]\\.)+[s]$)",
        "((?:[a-zA-Z]\\.)+[s][\\.]$)"
      ],
      "preserve_original": true
    }
  }
}

But I noticed that acronyms ending in s or s. are stemmed, because a stemmer filter is also attached to the analyzer. The regular expressions in the filter above that are meant to handle the trailing s are not working either.
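
One way to check the patterns in isolation, before the stemmer runs, is a small Python sketch like the one below. It is only an approximation: Python's re module stands in for the Java regex engine that Elasticsearch uses, the helper name captured_groups is hypothetical, and the assumption that the filter emits only the text of explicit capture groups is mine, not something confirmed above.

```python
import re

# Patterns copied from the acronym_en_EN filter, with the JSON escaping removed.
PATTERNS = [
    r"(?:[a-zA-Z]\.)+",
    r"((?:[a-zA-Z]\.)+[a-zA-Z])",
    r"((?:[a-zA-Z]\.)+[s]$)",
    r"((?:[a-zA-Z]\.)+[s][\.]$)",
]


def captured_groups(token):
    """Return, per pattern, the non-empty capture-group texts found in token.

    Hypothetical helper: it assumes only explicit capture groups are emitted.
    """
    results = {}
    for pattern in PATTERNS:
        groups = []
        for match in re.finditer(pattern, token):
            # match.groups() holds explicit capture groups only, so a pattern
            # made solely of a non-capturing group contributes nothing.
            groups.extend(g for g in match.groups() if g)
        results[pattern] = groups
    return results


if __name__ == "__main__":
    # Sample tokens, with and without the trailing dot.
    for token in ["u.s.a", "u.s", "u.s.", "s.w.a.t"]:
        print(token)
        for pattern, groups in captured_groups(token).items():
            print("   ", pattern, "->", groups)
```

If the patterns capture the expected text here, the regular expressions themselves are probably not what drops the trailing s, and the later stemmer filter would be the next thing to look at.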

I test the output with this request:

GET test_index/_analyze?tokenizer=standard&filters=lowercase,acronym_en_EN,apostrophe,porter_stemmer_en_EN&text=u.s.a. u.s. s.w.a.t u.t.

This gives me:

{
  "tokens": [
    { "token": "u.s.a", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 1 },
    { "token": "u.", "start_offset": 7, "end_offset": 10, "type": "<ALPHANUM>", "position": 2 },
    { "token": "u.", "start_offset": 7, "end_offset": 10, "type": "<ALPHANUM>", "position": 2 },
    { "token": "s.w.a.t", "start_offset": 12, "end_offset": 19, "type": "<ALPHANUM>", "position": 3 },
    { "token": "u.t", "start_offset": 20, "end_offset": 23, "type": "<ALPHANUM>", "position": 4 }
  ]
}