NSLinguisticTagger is a veritable Swiss Army Knife of linguistic functionality, with the ability to tokenize natural language strings into words, determine their part-of-speech & stem, extract names of people, places, & organizations, and tell you the languages & respective writing systems used in the string.

For most of us, this is far more power than we know what to do with. But perhaps this is just for lack of sufficient opportunity to try. After all, almost every application deals with natural language in one way or another. Perhaps NSLinguisticTagger could add a new level of polish, or enable brand new features entirely.

Introduced with iOS 5, NSLinguisticTagger is a contemporary of Siri, raising speculation that it was a byproduct of the personal assistant’s development.

Consider a typical question we might ask Siri:

What is the weather in San Francisco?

Computers are a long way off from “understanding” this question literally, but with a few simple tricks, we can do a reasonable job of understanding the intention of the question:

Swift:

    let question = "What is the weather in San Francisco?"

    // Skip whitespace and punctuation, and treat multi-word names as one token
    let options: NSLinguisticTaggerOptions = [.OmitWhitespace, .OmitPunctuation, .JoinNames]
    let schemes = NSLinguisticTagger.availableTagSchemesForLanguage("en")
    let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
    tagger.string = question

    // Print each token alongside its name type or lexical class
    tagger.enumerateTagsInRange(NSMakeRange(0, (question as NSString).length), scheme: NSLinguisticTagSchemeNameTypeOrLexicalClass, options: options) { (tag, tokenRange, _, _) in
        let token = (question as NSString).substringWithRange(tokenRange)
        print("\(token): \(tag)")
    }

Objective-C:

    NSString *question = @"What is the weather in San Francisco?";

    // Skip whitespace and punctuation, and treat multi-word names as one token
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames;
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"] options:options];
    tagger.string = question;

    // Log each token alongside its name type or lexical class
    [tagger enumerateTagsInRange:NSMakeRange(0, [question length]) scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass options:options usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        NSString *token = [question substringWithRange:tokenRange];
        NSLog(@"%@: %@", token, tag);
    }];

This code would print the following:

    What: Pronoun
    is: Verb
    the: Determiner
    weather: Noun
    in: Preposition
    San Francisco: PlaceName

If we filter on nouns, verbs, and place names, we get [is, weather, San Francisco].
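That filtering step can be sketched in a few lines of plain Swift. The token/tag pairs here are copied from the printed output above rather than produced by a live tagger, and the names taggedTokens, interestingTags, and keywords are illustrative only, not part of any API.

```swift
// Token/tag pairs copied from the printed output above (not computed here).
let taggedTokens: [(token: String, tag: String)] = [
    (token: "What", tag: "Pronoun"),
    (token: "is", tag: "Verb"),
    (token: "the", tag: "Determiner"),
    (token: "weather", tag: "Noun"),
    (token: "in", tag: "Preposition"),
    (token: "San Francisco", tag: "PlaceName"),
]

// The tags we care about: nouns, verbs, and place names.
let interestingTags: Set<String> = ["Noun", "Verb", "PlaceName"]

// Keep only the tokens whose tag is in the set, preserving their order.
let keywords = taggedTokens
    .filter { interestingTags.contains($0.tag) }
    .map { $0.token }

print(keywords) // prints ["is", "weather", "San Francisco"]
```

In a real app the pairs would presumably be collected inside the tagger's enumeration block instead of being hard-coded.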

Just based on this alone, or perhaps in conjunction with something like the Latent Semantic Mapping framework, we can conclude that a reasonable course of action would be to make an API request to determine the current weather conditions in San Francisco.

Tagging Schemes

NSLinguisticTagger can be configured to tag different kinds of information by specifying any of the following tagging schemes:

NSLinguisticTagSchemeTokenType: Classifies tokens according to their broad type: word, punctuation, whitespace, etc.

NSLinguisticTagSchemeLexicalClass: Classifies tokens according to class: part of speech for words, type of punctuation or whitespace, etc.

NSLinguisticTagSchemeNameType: Classifies tokens as to whether they are part of named entities of various types or not.

NSLinguisticTagSchemeNameTypeOrLexicalClass: Follows NSLinguisticTagSchemeNameType for names, and NSLinguisticTagSchemeLexicalClass for all other tokens.

Here’s a list of the various token types associated with each scheme (NSLinguisticTagSchemeNameTypeOrLexicalClass, as the name implies, is the union of NSLinguisticTagSchemeNameType & NSLinguisticTagSchemeLexicalClass):

NSLinguisticTagSchemeTokenType:

    NSLinguisticTagWord
    NSLinguisticTagPunctuation
    NSLinguisticTagWhitespace
    NSLinguisticTagOther

NSLinguisticTagSchemeLexicalClass:

    NSLinguisticTagNoun
    NSLinguisticTagVerb
    NSLinguisticTagAdjective
    NSLinguisticTagAdverb
    NSLinguisticTagPronoun
    NSLinguisticTagDeterminer
    NSLinguisticTagParticle
    NSLinguisticTagPreposition
    NSLinguisticTagNumber
    NSLinguisticTagConjunction
    NSLinguisticTagInterjection
    NSLinguisticTagClassifier
    NSLinguisticTagIdiom
    NSLinguisticTagOtherWord
    NSLinguisticTagSentenceTerminator
    NSLinguisticTagOpenQuote
    NSLinguisticTagCloseQuote
    NSLinguisticTagOpenParenthesis
    NSLinguisticTagCloseParenthesis
    NSLinguisticTagWordJoiner
    NSLinguisticTagDash
    NSLinguisticTagOtherPunctuation
    NSLinguisticTagParagraphBreak
    NSLinguisticTagOtherWhitespace

NSLinguisticTagSchemeNameType:

    NSLinguisticTagPersonalName
    NSLinguisticTagPlaceName
    NSLinguisticTagOrganizationName

So for basic tokenization, use NSLinguisticTagSchemeTokenType, which will allow you to distinguish between words and whitespace or punctuation. For part-of-speech information, or to differentiate between parts of speech, NSLinguisticTagSchemeLexicalClass is your new bicycle.
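As a rough illustration of basic tokenization, here is a minimal sketch using the token type scheme. It assumes an Apple platform where Foundation provides NSLinguisticTagger, and it is written with the newer Swift spellings of the API (for example enumerateTags(in:scheme:options:using:)), which differ from the older names used in the listings in this article. The helper words(in:) is a hypothetical name introduced only for this example.

```swift
import Foundation

// Hypothetical helper: returns the word tokens in `text`,
// using the token type scheme to tell words from everything else.
func words(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text

    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]

    var tokens: [String] = []
    tagger.enumerateTags(in: fullRange, scheme: .tokenType, options: options) { tag, tokenRange, _, _ in
        // Whitespace and punctuation are already omitted by the options;
        // this check also skips anything tagged as "other".
        if tag == NSLinguisticTag.word {
            tokens.append(nsText.substring(with: tokenRange))
        }
    }
    return tokens
}

print(words(in: "What is the weather in San Francisco?"))
```

Because no name-joining option is passed here, a name such as “San Francisco” would be expected to come back as two separate word tokens.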

Continuing with the tagging schemes:

NSLinguisticTagSchemeLemma: This tag scheme supplies a stem form of each word, if known.

NSLinguisticTagSchemeLanguage: Tags tokens according to their most likely language. The tag values will be standard language abbreviations such as "en", "fr", "de", etc., as used with the NSOrthography class. Note that the tagger generally attempts to determine the language of text at the level of an entire sentence or paragraph, rather than word by word.

NSLinguisticTagSchemeScript: Tags tokens according to their script. The tag values will be standard script abbreviations such as "Latn", "Cyrl", "Jpan", "Hans", "Hant", etc.

As demonstrated in the example above, first you initialize an NSLinguisticTagger with an array of all of the different schemes that you wish to use, and then assign or enumerate each of the tags after specifying the tagger’s input string.
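To make the "initialize with several schemes, then query" pattern concrete, here is a small sketch that asks for a single tag at a character index instead of enumerating. It assumes an Apple platform with Foundation and uses the newer Swift names of the API (tag(at:scheme:tokenRange:sentenceRange:)); the constant names text, language, and script are illustrative.

```swift
import Foundation

let text = "What is the weather in San Francisco?"

// Configure the tagger once with every scheme we intend to query.
let schemes: [NSLinguisticTagScheme] = [.language, .script]
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: 0)
tagger.string = text

// Ask for one tag per scheme at character index 0.
// Both calls return an optional tag, so fall back to "unknown" when printing.
let language = tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)
let script = tagger.tag(at: 0, scheme: .script, tokenRange: nil, sentenceRange: nil)

print("language:", language?.rawValue ?? "unknown")
print("script:", script?.rawValue ?? "unknown")
```

Only schemes passed to the initializer should be queried afterwards, which is why both .language and .script appear in the schemes array.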

Tagging Options

In addition to the available tagging schemes, there are several options you can pass to NSLinguisticTagger (combined with bitwise OR |) to slightly change its behavior:

NSLinguisticTaggerOmitWords

NSLinguisticTaggerOmitPunctuation

NSLinguisticTaggerOmitWhitespace

NSLinguisticTaggerOmitOther

Each of these options omits the broad category of tags described. For example, NSLinguisticTagSchemeLexicalClass distinguishes between many different kinds of punctuation, and all of those would be omitted with NSLinguisticTaggerOmitPunctuation. This is preferable to manually filtering these tag types in enumeration blocks or with predicates.

The last option is specific to NSLinguisticTagSchemeNameType:

NSLinguisticTaggerJoinNames

By default, each token in a name is treated as a separate instance. In many circumstances, it makes sense to treat names like “San Francisco” as a single token, rather than two separate tokens. Passing this option makes this so.

Finally, NSString provides convenience methods that handle the setup and configuration of NSLinguisticTagger on your behalf. For one-off tokenizing, you can save a lot of boilerplate:

    let question = "Where in the world is Carmen San Diego?"
    let options: NSLinguisticTaggerOptions = [.OmitWhitespace, .OmitPunctuation, .JoinNames]

    // Receives the range of each tagged token, by reference
    var tokenRanges: NSArray?
    let tags = (question as NSString).linguisticTagsInRange(NSMakeRange(0, (question as NSString).length), scheme: NSLinguisticTagSchemeNameTypeOrLexicalClass, options: options, orthography: nil, tokenRanges: &tokenRanges)
    // tags: ["Pronoun", "Preposition", "Determiner", "Noun", "Verb", "PersonalName"]

Natural language is woefully under-utilized in user interface design on mobile devices. When implemented effectively, a single utterance from the user can achieve the equivalent of a handful of touch interactions, in a fraction of the time.