Domain

Specificity of the domain is the key part of the approach. Machine learning algorithms are far from being ready to answer general questions. Even though the algorithms exist, they are far from our reach. For simplicity, we have to narrow the domain. Usually when we are in a need for introducing a search form, we operate within some area, i.e. finding a car at ebay.com we specify the type of the car, year of production, manufacturer, model etc..

For the sake of this article, let's take the domain of film screenings, i.e. tuple of: movie, theater and the date and time. We may also assume we know the geographical location of the theater and genre of the movie. We are going to query the film screening by title (or just genre), theater name (or just location) and date and time that the screening is going to happen - that is the data we need to collect from the user. Sample expressions we'd like to handle:

the martian in san francisco tomorrow - query all theaters in San Francisco that will play The Martian tomorrow,

the revenant in amc next week - query AMC theater for all shows of The Revenant that will occur next week,

cinemark next wednesday - query Cinemark theaters for all the shows that will occur next Wednesday,

drama in san francisco on 11 June - query all theaters in San Francisco for a drama that will occur on 11th of June.

All expressions are transformed from natural language to a query of the form: