At Big Datext, we are working on a daily basis on NLP and Data Science projects, always trying to use state-of-art tools and models. We are thrilled to announce our first open source project. A lightweight python wrapper around SyntaxNet.

Why Open Source?

As many others, we are using community work everyday, from open source project to StackOverflow answers.

This being said, we are a company and needed to make a choice. Should we hold our entire stack private, considering it as business value or can we drop a bit of our intellectual property for the better good?

As you might have figured out, we decided to take a shot. Among all the possible reasons[1], the most important is the urge to share. Share in a way so that everyone can make the most of our work.

We hope that our work will give a little push to the community, helping it to advance in our domain as the community helped us to advance in our business.

SyntaxNet Parser

Last year, Google’s research team released SyntaxNet[2], a state-of-art parser along with a English pre-trained model.

A parser is a key component to build systems able to understand natural language. It allows analysing the grammatical structure of sentences, a task that human can do easily but where a lot of ambiguity remain for a computer.

The SyntaxNet Parser provides the following outputs for each word:

The Part of Speech tag: the word’s function in the sentence such as the gender, the grammatical category, the person …

The syntactic relationship between words, the way words are linked to each other in a sentence.

Parsing of a simple sentence, source[3]

Shortly after SyntaxNet release on github[3], Google made available many pre-trained models on different languages[4]. Making things handy for us, working mainly in French.

Today, SyntaxNet is a part of our internal stack, improving our process accuracy.

Getting Started

Our wrapper intends to be as lightweight as possible, without transformation of SyntaxNet’s output.

Simple example of how to use the wrapper

As shown through the example, using it is quite simple. The wrapper handles all pre-trained models on different languages and can give only the morphological or pos-tagging parts.