In this tutorial, you will learn how to:

An old and small benchmark for this task is the ATIS (Airline Travel Information System) dataset collected by DARPA. Here is an example sentence (or utterance) annotated with the Inside-Outside-Beginning (IOB) representation.

| Input (words) | show | flights | from | Boston | to | New | York | today |
|---|---|---|---|---|---|---|---|---|
| Output (labels) | O | O | O | B-dept | O | B-arr | I-arr | B-date |
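To make the IOB scheme concrete, here is a minimal sketch that groups IOB-tagged words back into slot phrases. The function name `iob_to_spans` is hypothetical, not part of any library. It assumes a lenient reading in which an `I-X` tag that does not continue an open `X` slot simply starts a new one.

```python
def iob_to_spans(words, labels):
    """Group IOB-tagged words into (slot, phrase) pairs.

    B-X starts a new slot X, I-X continues the current slot X,
    and O marks a word outside any slot.
    """
    spans = []
    current_slot, current_words = None, []
    for word, label in zip(words, labels):
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_slot
        )
        if starts_new:
            # Close any open slot, then open a new one for this word.
            if current_slot is not None:
                spans.append((current_slot, " ".join(current_words)))
            current_slot, current_words = label[2:], [word]
        elif label.startswith("I-"):
            # Same slot type as the open span: extend it.
            current_words.append(word)
        else:
            # "O": close any open slot.
            if current_slot is not None:
                spans.append((current_slot, " ".join(current_words)))
            current_slot, current_words = None, []
    if current_slot is not None:
        spans.append((current_slot, " ".join(current_words)))
    return spans


words = "show flights from Boston to New York today".split()
labels = "O O O B-dept O B-arr I-arr B-date".split()
print(iob_to_spans(words, labels))
# [('dept', 'Boston'), ('arr', 'New York'), ('date', 'today')]
```

The `I-arr` tag on "York" is what ties "New York" into a single arrival slot. Without the B/I distinction, two adjacent slots of the same type could not be told apart.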

The official ATIS split contains 4,978 training and 893 test sentences, for a total of 56,590 and 9,198 words, respectively (the average sentence length is 15). There are 128 classes (distinct slot labels), including the O (NULL) label.

Following researchers at Microsoft Research, we handle unseen words by replacing every word that occurs only once in the training set with an <UNK> token, and we use that same token to represent test-set words that never appear in training. Following Ronan Collobert and colleagues, we replace each digit with the string DIGIT, so that, for example, 1984 becomes DIGITDIGITDIGITDIGIT.
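The two preprocessing rules can be sketched as follows. This is an illustrative sketch, not the tutorial's own code: the helper names `digitize`, `build_vocab`, and `normalize` are hypothetical, and applying the digit replacement *before* counting word frequencies is an assumption (the text does not specify the order).

```python
from collections import Counter

UNK = "<UNK>"


def digitize(word):
    """Replace every digit character with 'DIGIT' (1984 -> DIGITDIGITDIGITDIGIT)."""
    return "".join("DIGIT" if ch.isdigit() else ch for ch in word)


def build_vocab(train_sentences):
    """Keep words seen more than once in training; singletons will map to <UNK>."""
    # Assumption: digits are replaced before counting, so all years share one form.
    counts = Counter(digitize(w) for sent in train_sentences for w in sent)
    vocab = {w for w, c in counts.items() if c > 1}
    vocab.add(UNK)
    return vocab


def normalize(sentence, vocab):
    """Digitize each word, then map anything outside the vocabulary to <UNK>."""
    out = []
    for w in sentence:
        w = digitize(w)
        out.append(w if w in vocab else UNK)
    return out


train = [
    ["flights", "to", "boston", "in", "1984"],
    ["flights", "to", "denver", "in", "2001"],
]
vocab = build_vocab(train)
print(normalize(["flights", "to", "austin", "in", "1999"], vocab))
# ['flights', 'to', '<UNK>', 'in', 'DIGITDIGITDIGITDIGIT']
```

Because "boston" and "denver" each occur once in this toy training set, they are excluded from the vocabulary, so the model learns an <UNK> embedding from real training data and can reuse it for the unseen test word "austin".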

We split the official training set into a training set and a validation set containing 80% and 20% of the official training sentences, respectively. Because the dataset is small, a difference in F1 must exceed 0.6% to be statistically significant at the 95% confidence level. For evaluation purposes, experiments must report the following metrics:
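One way to produce the 80/20 split is a seeded shuffle followed by a cut. This is a sketch under assumptions: the text does not say whether the split is random or how it is seeded, and the function name `train_valid_split` and the seed value are hypothetical.

```python
import random


def train_valid_split(sentences, valid_fraction=0.2, seed=1234):
    """Shuffle a copy of the official training sentences and hold out valid_fraction."""
    shuffled = list(sentences)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_valid = int(round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# With the 4,978 official training sentences this yields 3,982 / 996.
train, valid = train_valid_split(list(range(4978)))
print(len(train), len(valid))  # 3982 996
```

Fixing the seed matters here: with a 0.6% significance threshold, comparisons between models are only meaningful if they are tuned on the same validation sentences.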

We will use the conlleval Perl script to measure the performance of our models.
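For intuition about what a chunk-based score measures, here is a small Python approximation of conlleval-style scoring: a predicted slot counts as correct only if its type and both boundaries match a gold slot exactly. This is a hedged sketch, not a replacement for the script; the helper names are hypothetical, and reported results should still come from conlleval itself.

```python
def chunks(labels):
    """Return the set of (slot, start, end) chunks in an IOB sequence (end exclusive)."""
    result = set()
    slot, start = None, None
    for i, label in enumerate(list(labels) + ["O"]):  # sentinel "O" closes a trailing chunk
        begins = label.startswith("B-") or (label.startswith("I-") and label[2:] != slot)
        if slot is not None and (label == "O" or begins):
            result.add((slot, start, i))
            slot = None
        if begins:
            slot, start = label[2:], i
    return result


def precision_recall_f1(gold_sequences, pred_sequences):
    """Micro-averaged chunk-level precision, recall and F1, in percent."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        g, p = chunks(gold), chunks(pred)
        correct += len(g & p)  # exact match on type, start and end
        n_gold += len(g)
        n_pred += len(p)
    precision = 100.0 * correct / n_pred if n_pred else 0.0
    recall = 100.0 * correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["O", "O", "O", "B-dept", "O", "B-arr", "I-arr", "B-date"]]
pred = [["O", "O", "O", "B-dept", "O", "B-arr", "O", "B-date"]]
print(precision_recall_f1(gold, pred))  # roughly (66.67, 66.67, 66.67)
```

In this example the prediction tags only "New" as the arrival city, so the `arr` chunk is counted as wrong even though one of its two words is labeled correctly; that strictness is why slot-filling F1 is usually lower than per-token accuracy.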