A UX problem with search is the lack of quick feedback: a user does not know what results she is going to see until she hits Return. But what if we had something like this:

Tokenization while searching

The UI interactions. (Caret position and all those ‘fun’ things)

Maintain a big FSM representing tokenization metadata.

Let's talk about the state machine first. Most current libraries assume that the token information is synchronously available. The problem with that is you cannot hold the full state machine for a search's tokens on the client; depending on the limits of your system, it could be too big. So we need an asynchronous way of fetching the tokenization state from the server, and that fetching needs to be synchronized and optimized. For example, this might be an interaction timeline:

Client-Server request/response
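The hazard in that timeline can be reproduced in plain JavaScript. This is a simulation only; fakeTokenize is a hypothetical stand-in for the real tokenization endpoint, with timers playing the part of network latency:

```javascript
// Simulate the race shown in the timeline: the response to an older keystroke
// can arrive after the response to a newer one and overwrite fresher state.
function fakeTokenize(query, delayMs) {
  // Stand-in for a real call to the tokenization endpoint.
  return new Promise((resolve) =>
    setTimeout(() => resolve(`tokens(${query})`), delayMs)
  );
}

async function demoRace() {
  let displayed;
  // Keystroke 1: "fo", slow network, response takes 100 ms.
  const first = fakeTokenize('fo', 100).then((t) => { displayed = t; });
  // Keystroke 2: "foo", fast response, arrives in 10 ms.
  const second = fakeTokenize('foo', 10).then((t) => { displayed = t; });
  await Promise.all([first, second]);
  return displayed; // stale: reflects "fo" even though the input now reads "foo"
}
```

Without some coordination, the UI ends up showing tokens for text the user has already moved past.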

Reactive Streams to the rescue! RxJS, which grew out of Microsoft's Reactive Extensions, is a wonderful library, and the scenario above can be handled easily with its switchMap operator. Here is an excerpt from the tokenizer.js source:

switchMap to cancel previous request.

Here we create an Observable from the input stream of change events (which differs in IE), feed it to switchMap, which invokes the async callback, and then subscribe to it to re-tokenize the display. Essentially, switchMap handles the cancellation of any previous request for us, so we are always waiting on the response for the latest state of the tokenized input.
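To show what switchMap buys us, here is a plain-JavaScript emulation of its latest-wins behaviour; makeLatestOnly, tokenizeAsync, and onTokens are hypothetical names, not the actual tokenizer.js API:

```javascript
// Emulate switchMap's cancellation: only the response for the most recent
// input is allowed to update the display; superseded responses are dropped.
function makeLatestOnly(tokenizeAsync, onTokens) {
  let latestId = 0;
  return function onChange(text) {
    const id = ++latestId; // stamp this request
    tokenizeAsync(text).then((tokens) => {
      if (id === latestId) onTokens(tokens); // ignore stale responses
    });
  };
}
```

With RxJS itself this whole helper collapses to roughly `fromEvent(inputEl, 'input').pipe(map(e => e.target.value), switchMap(tokenizeAsync)).subscribe(onTokens)`, and switchMap additionally unsubscribes from the superseded inner Observable rather than merely ignoring its result.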

Now, let us come to the other part of our puzzle: how do we handle the UI interactions a user could perform, maintain the caret position amid incoming server responses about tokenization state, and deal with browser quirks? While developing tokenizer.js I stumbled upon rangy, a great library for handling cross-browser range/caret quirks, although I had to fine-tune things to get the best performance. Here is another excerpt, showing how one could manage different browsers using ES6 modules, again from the tokenizer.js source:
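A minimal sketch of that load-time selection pattern might look like the following; all names here are hypothetical, not the actual tokenizer.js exports:

```javascript
// Sketch of load-time browser selection (hypothetical names, not the real source).
// In an ES6 module these bindings would be `export`ed; they are plain consts here
// so the sketch also runs outside a module context.
const isIE = typeof navigator !== 'undefined' && /Trident|MSIE/.test(navigator.userAgent);

// Decided once, when the module loads; consumers never branch on browser type again.
const changeEvent = isIE ? 'propertychange' : 'input'; // older IE lacks a reliable 'input' event

const getSelectionOffsets = isIE
  ? function ieImpl() {
      // IE path: document.selection / TextRange (the quirks a library like rangy wraps).
      return null; // placeholder for the IE-specific code
    }
  : function standardImpl() {
      // Standards path: the Selection/Range API.
      const sel = window.getSelection();
      if (!sel.rangeCount) return null;
      const range = sel.getRangeAt(0);
      return { start: range.startOffset, end: range.endOffset };
    };
```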

Here the browser-specific utility methods/properties (where required) are exported at load time, which avoids checking the browser type on every interaction. There were other micro-optimizations needed for a smooth search experience, including but not limited to:

Keypresses in between tokens (Backspace/Delete in particular).

Enter/Esc/Tab behaviour.

Handling Space at token endings.

Removal of tokens using the (x) button.

Copy/Paste of text from other sources.

Handling IME languages like Japanese/Chinese.
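To make the IME case concrete, one common approach is to suppress re-tokenization while a composition is in progress and tokenize only the committed text. This is a hedged sketch of that idea; attachImeGuard and retokenize are hypothetical names, not the tokenizer.js API:

```javascript
// Defer re-tokenization while an IME composition is active, so partially
// composed Japanese/Chinese text is not split into tokens mid-keystroke.
function attachImeGuard(el, retokenize) {
  let composing = false;
  el.addEventListener('compositionstart', () => { composing = true; });
  el.addEventListener('compositionend', (e) => {
    composing = false;
    retokenize(e.target.value); // tokenize only once the text is committed
  });
  el.addEventListener('input', (e) => {
    if (!composing) retokenize(e.target.value); // normal typing path
  });
}
```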

These were just some of the learnings we had while developing tokenizer.js for use in the search bar at ThoughtSpot. Please feel free to check out the full source code on GitHub, and you can use it in your projects right away via npm.