With new offerings from Google, Microsoft and others, there is now a range of excellent cloud APIs for syntactic dependencies. A key part of these services is the interactive demo, where you enter a sentence and see the resulting annotation. We’re pleased to announce the release of displaCy.js, a modern and service-independent visualisation library. We hope this makes it easy to compare different services, and explore your own in-house models.

Update (February 2018)
As of spaCy v2.0, the displaCy visualizer is integrated into the core library. It supports serving the visualizations in the browser, generating the raw markup or outputting the results in a Jupyter notebook. For more details, see the visualizers documentation.

The History of displaCy

We launched displaCy as a visualiser for our NLP library spaCy in 2015 and open-sourced the code in August 2016. The first version relied on an old CSS hack. The new version uses SVG to produce flexible and easily exportable output.

Here’s an example of a sentence rendered by the new SVG-based displaCy:

A note on compatibility
displaCy is written in ECMAScript 6. For full cross-browser compatibility, make sure to use a compiler like Babel. For more info, see this compatibility table.

Simply include displacy.js and initialize a new instance, specifying the API and settings. The parse(text, model, settings) method renders a parse generated by spaCy as an SVG in the container. By default, it expects spaCy’s services, which you can download and run for free. If you’re using Google’s NLP API instead, set format to 'google'.

```javascript
const api = 'http://localhost:8000'

const displacy = new displaCy(api, {
    container: '#displacy',
    format: 'spacy',
    distance: 300,
    offsetX: 100,
})

displacy.parse('This is a sentence.', 'en', {
    collapsePunct: false,
    collapsePhrase: false,
    color: '#ffffff',
    bg: '#000000',
})
```

For a full list of available settings, see the Readme. Alternatively, you can also use render(parse, settings) to manually render a JSON-formatted set of arcs and words. displaCy logs the JSON representation of each parse to the console for quick copy-pasting:

A dependency visualisation consists of three main components:

- words and their corresponding part-of-speech tags, displayed horizontally in order
- arcs of different lengths connecting two words, with corresponding labels showing their relation type
- an arrow head at the start or end of each arc, indicating its direction

About SVG
The Scalable Vector Graphics format has been around since the early 2000s. Unlike other image formats, SVG uses XML markup that’s easy to manipulate using CSS or JavaScript. SVG even offers powerful color filters and dynamic cropping and, with improving browser support, has replaced icon fonts on many sites.

All three components can be implemented using the SVG elements <text> and <path>, with <tspan> to separate spans of text and <textPath> to wrap the arc label along the rounded arc path. Let’s take a look at the first word, “Robots”, and the arrow connecting it to “are”. This is a simplified example of the markup displaCy generates:

Example SVG markup (excerpt)

```xml
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
    <text y="440" text-anchor="middle">
        <tspan x="150" fill="currentColor">Robots</tspan>
        <tspan x="150" dy="2em" fill="currentColor">NNS</tspan>
    </text>

    <path id="arrow-0" d="M150,400 C150,0 950,0 950,400" stroke-width="2px" stroke="currentColor" fill="none"></path>

    <text dy="1em">
        <textPath xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#arrow-0" startOffset="50%" text-anchor="middle" fill="currentColor">nsubj</textPath>
    </text>

    <path d="M150,402 L144,392 156,392" fill="currentColor"></path>
</svg>
```

The above markup was generated from JSON data that looked like this:

Example JSON from API (excerpt)

```json
{
    "arcs": [
        { "dir": "left", "end": 4, "label": "nsubj", "start": 0 }
    ],
    "words": [
        { "tag": "NNS", "text": "Robots" }
    ]
}
```
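Since render() takes JSON like this as-is, it can help to sanity-check a hand-written parse before passing it in. The following is a minimal sketch that assumes only the fields shown in the excerpt: start and end are word indices with start < end (the length is always end - start), and dir is "left" or "right". `checkParse` is a hypothetical helper, not part of displaCy.

```javascript
// Hypothetical helper: returns a list of problems found in a parse object
// shaped like { arcs: [{ start, end, label, dir }], words: [{ text, tag }] }
function checkParse({ arcs, words }) {
  const problems = [];
  arcs.forEach((arc, i) => {
    // start is always the left-hand word index, so it must come before end
    if (!(Number.isInteger(arc.start) && Number.isInteger(arc.end) && arc.start < arc.end)) {
      problems.push(`arc ${i}: expected integer start < end`);
    }
    // both endpoints must point at an existing word
    if (arc.start < 0 || arc.end >= words.length) {
      problems.push(`arc ${i}: index out of range for ${words.length} words`);
    }
    // dir decides which end of the arc gets the arrow head
    if (arc.dir !== 'left' && arc.dir !== 'right') {
      problems.push(`arc ${i}: dir must be "left" or "right"`);
    }
  });
  return problems;
}

const parse = {
  arcs: [{ dir: 'left', end: 1, label: 'nsubj', start: 0 }],
  words: [{ tag: 'NNS', text: 'Robots' }, { tag: 'VBP', text: 'are' }],
};
console.log(checkParse(parse)); // []
```

Note that the excerpt above would be flagged, since it only lists the first word while its arc ends at index 4.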

To translate the JSON-format into SVG markup, we need two main functions: one to draw the words, and one to draw the arcs.
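At the top level, the two functions only need to be mapped over the parse and joined inside the <svg> root from the markup excerpt. A minimal sketch, where `renderParse`, `drawWord` and `drawArc` are hypothetical names rather than displacy.js’s actual internals:

```javascript
// Hypothetical top-level renderer: delegates to one function for words and one
// for arcs, then wraps everything in an <svg> root with the xlink namespace
function renderParse({ arcs, words }, drawWord, drawArc) {
  const wordMarkup = words.map((word, i) => drawWord(word, i)).join('');
  // The index lets drawArc give each path a unique id such as "arrow-0"
  const arcMarkup = arcs.map((arc, i) => drawArc(arc, i)).join('');
  return [
    '<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">',
    wordMarkup,
    arcMarkup,
    '</svg>',
  ].join('');
}

// Usage with placeholder drawing functions
const svg = renderParse(
  {
    arcs: [{ dir: 'left', end: 1, label: 'nsubj', start: 0 }],
    words: [{ tag: 'NNS', text: 'Robots' }, { tag: 'VBP', text: 'are' }],
  },
  (word, i) => `<text data-i="${i}">${word.text}</text>`,
  (arc, i) => `<path id="arrow-${i}"></path>`,
);
```

Keeping the two functions separate means each can be tested on its own, which is what the following sections do.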

Rendering the words is pretty straightforward because they are independent of the overall sentence. Each word needs two coordinates: x, the distance from the left, and y, the distance from the top. Starting at a fixed offset from the left, the first word will be placed at offsetX, the second word at offsetX + distance, the third word at offsetX + 2 * distance and so on. This can be broken down into a simple formula, offsetX + i * distance. Let’s not focus too much on the y coordinates for now, as they’re pretty much identical for all components; for the words, I’m merely adding a little spacing so they’re not too close to the arrows.

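```javascript
// Assumed example values: offsetY is the baseline the arcs end on, and
// wordSpacing adds a little room between the arrows and the words
const offsetX = 150;
const distance = 300;
const offsetY = 400;
const wordSpacing = 40;

// `words` is the "words" array from the JSON, e.g. [{ text: 'Robots', tag: 'NNS' }]
const markup = words.map(({ text, tag }, i) => `
    <text y="${offsetY + wordSpacing}" text-anchor="middle">
        <tspan x="${offsetX + i * distance}">${text}</tspan>
        <tspan x="${offsetX + i * distance}" dy="2em">${tag}</tspan>
    </text>
`.trim()).join('');
```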
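Pulled out into a function, the layout step is easy to check on its own. This is a sketch under assumptions: `layoutWords` is a hypothetical name, and the total width (offsetX of padding on both sides of the sentence) is an illustrative choice, not necessarily what displacy.js computes.

```javascript
// Hypothetical helper: x position of every word, plus an assumed total width
// that mirrors offsetX on the right-hand side
function layoutWords(words, { offsetX = 150, distance = 300 } = {}) {
  const positions = words.map(({ text, tag }, i) => ({ text, tag, x: offsetX + i * distance }));
  const width = 2 * offsetX + Math.max(words.length - 1, 0) * distance;
  return { positions, width };
}

const { positions, width } = layoutWords([
  { text: 'This', tag: 'DT' },
  { text: 'is', tag: 'VBZ' },
  { text: 'sentence', tag: 'NN' },
]);
console.log(positions.map((p) => p.x), width); // [ 150, 450, 750 ] 900
```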

Each arc comes with the index of its start and end point, making it trivial to calculate its length: end - start. Both points are measured from the left edge, so an arc starts at offsetX + start * distance and ends at offsetX + end * distance. Now if we add these numbers to our path definition, we get a nice straight line connecting both points:

```xml
<path stroke="currentColor" d="M150,400 950,400" />
```
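Because SVG path coordinates written with an uppercase command are absolute, both ends of an arc are positions counted from the left edge. Using offsetX + (end - start) * distance for the end point only works when start is 0. A minimal sketch with hypothetical helper names; with a distance of 200 it reproduces the straight line above:

```javascript
// Hypothetical helper: absolute x coordinates of an arc's two ends
function arcEndpoints({ start, end }, offsetX, distance) {
  return {
    length: end - start,
    startX: offsetX + start * distance,
    endX: offsetX + end * distance,
  };
}

// A straight line between the two words, at a given baseline y
function straightPath(arc, offsetX, distance, y) {
  const { startX, endX } = arcEndpoints(arc, offsetX, distance);
  return `M${startX},${y} ${endX},${y}`;
}

console.log(straightPath({ start: 0, end: 4 }, 150, 200, 400)); // M150,400 950,400
```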

The curves are a little trickier. For each curve, we need to add four additional values to the path definition: the x and y coordinates of the left and right cubic Bézier control points. To show how this looks, I’ve forked this great demo from SitePoint. You can move the control points around and see how it affects the markup:

The curve’s height needs to adapt to the arc’s length. An arrow spanning over three words needs to be higher than an arrow spanning over two – otherwise they’ll overlap.
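One detail worth knowing when picking the control points: a cubic Bézier curve never reaches the height of its control points. When both control points share the same y value (curve), as in the arcs here, the midpoint of the arc sits at 0.25 * startY + 0.75 * curve, three quarters of the way up. The sketch below evaluates the standard cubic Bézier formula for the arc from the SVG excerpt; `cubicBezierPoint` is a hypothetical helper name.

```javascript
// Point on a cubic Bézier curve at parameter t (0..1), using the standard
// Bernstein form B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3
function cubicBezierPoint(p0, p1, p2, p3, t) {
  const mt = 1 - t;
  const a = mt * mt * mt;
  const b = 3 * mt * mt * t;
  const c = 3 * mt * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// The arc from the SVG excerpt: M150,400 C150,0 950,0 950,400
const midpoint = cubicBezierPoint(
  { x: 150, y: 400 }, { x: 150, y: 0 }, { x: 950, y: 0 }, { x: 950, y: 400 }, 0.5,
);
console.log(midpoint); // { x: 550, y: 100 }
```

So even though the control points sit at y = 0, the top of this arc is drawn at y = 100, which leaves a little headroom for the label.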

Depending on the grammatical structure of the sentence, we usually end up with a lot of arcs with lengths of 1 and 2, connecting words to their next and second-next neighbours, then one or two arrows with lengths of 3 or 4, and maybe a large one with a length of 10. Long dependencies like that are especially common with relative clauses, questions and punctuation, and in languages like German, where verbs and verb prefixes are often placed at the end of the clause.

Bugfix note
This issue was also the cause of one of the main bugs in the old displaCy. I used a hack to decrease the overall arc height by a certain percentage if the sentence had particularly long arcs. However, this would sometimes cause the smallest arcs to become invisible.

If we use only the length of an arc to calculate its curve, we quickly run into a problem when visualising complex sentences: the largest arrows become huge and produce too much whitespace, rendering the visualisation pretty much unusable.

The longest arc is huge compared to the others and produces too much whitespace.

The largest arrow’s height here is relative to its length of 21, even though a height relative to a length of 8 would have sufficed to make it higher than the second-largest one. We can solve this by generating a sorted list of all occurring lengths. When rendering an arc, we can then use the index of its length in that list (+ 1, to start with level 1) as its level.

```javascript
// Sorted list of every distinct arc length in the parse (`arcs` from the JSON)
const levels = [...new Set(arcs.map(({ end, start }) => end - start).sort((a, b) => a - b))];

const arc = { dir: 'right', end: 28, label: 'punct', start: 7 };
// The arc's level is the 1-based position of its length in that list
const level = levels.indexOf(arc.end - arc.start) + 1;
```
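A runnable version of the same idea, with a made-up set of arcs (the lengths are illustrative and not taken from the sentence in the figures), shows how much this compresses the longest arc. The step of 150 assumes half of a 300px distance per unit of height; `computeLevels` and `levelOf` are hypothetical names.

```javascript
// Hypothetical helper: sorted list of distinct arc lengths, so the n-th
// smallest length gets level n (1-based)
function computeLevels(arcs) {
  return [...new Set(arcs.map(({ start, end }) => end - start))].sort((a, b) => a - b);
}

const arcs = [
  { start: 0, end: 1 }, { start: 1, end: 3 }, { start: 3, end: 4 },
  { start: 4, end: 7 }, { start: 7, end: 28 },
];
const levels = computeLevels(arcs); // lengths 1, 2, 1, 3, 21 give [1, 2, 3, 21]
const levelOf = (arc) => levels.indexOf(arc.end - arc.start) + 1;

// Height of the longest arc by raw length vs. by level, assuming 150px per step
const step = 150;
const longest = arcs[4];
console.log((longest.end - longest.start) * step, levelOf(longest) * step); // 3150 600
```

Since sorting happens on the deduplicated lengths, every distinct length maps to exactly one level, and the longest arc is still guaranteed to be the tallest.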

The same sentence with a much better result – the largest arc is still the biggest.

We can now generate arrows and their curves relative to the overall levels:

```javascript
// Continues the previous snippet: `levels`, `arc` and `level` are already defined
const { start, end } = arc;
const highestLevel = levels.length; // the largest length is the top level

const offsetX = 150;
const distance = 300;

const startX = offsetX + start * distance;
const startY = (distance / 2) * highestLevel;  // leaves room for the highest arc
const endpoint = offsetX + end * distance;     // absolute x of the end word
const curve = startY - (level * distance) / 2; // control points rise with the level

const d = `M${startX},${startY} C${startX},${curve} ${endpoint},${curve} ${endpoint},${startY}`;
const path = `<path d="${d}" stroke-width="2px" fill="none" stroke="currentColor"></path>`;
```
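Putting the pieces together, here is a self-contained sketch of a `drawArc` function (a hypothetical name). It uses the arc’s level, not its raw length, for the control points, and absolute coordinates for both ends. With a distance of 200 and four levels, it reproduces the arc from the SVG excerpt earlier in this post.

```javascript
// Hypothetical, self-contained arc drawing step. `levels` is the sorted list
// of distinct arc lengths, so each level corresponds to one length.
function drawArc({ start, end }, levels, { offsetX = 150, distance = 300 } = {}) {
  const level = levels.indexOf(end - start) + 1;
  const highestLevel = levels.length;

  const startX = offsetX + start * distance;
  const endX = offsetX + end * distance;
  // Baseline low enough that the highest level's control points reach y = 0
  const startY = (distance / 2) * highestLevel;
  // Each level raises the control points by another half distance
  const curve = startY - (level * distance) / 2;

  return `M${startX},${startY} C${startX},${curve} ${endX},${curve} ${endX},${startY}`;
}

console.log(drawArc({ start: 0, end: 4 }, [1, 2, 3, 4], { offsetX: 150, distance: 200 }));
// M150,400 C150,0 950,0 950,400
```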

The arrow head is simply a path forming a triangle that is placed either at the start or the end of the arc. To wrap the label along the middle of the arc path, we can take advantage of the <textPath> element and link it to the id of the arc:

```xml
<path id="arrow-0" d="..."></path>
<text>
    <textPath xlink:href="#arrow-0" startOffset="50%" text-anchor="middle">Label</textPath>
</text>
```
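Both pieces are plain string templates. In this sketch, the triangle’s size (6px to either side, 10px tall, tip 2px below the baseline) is copied from the SVG excerpt and is an assumption, not a documented displaCy setting; `arrowHead` and `arcLabel` are hypothetical names. A "left" arc points at its start word and a "right" arc at its end word, matching the nsubj arrow in the excerpt.

```javascript
// Hypothetical arrow head: a filled triangle at one end of the arc
function arrowHead(dir, startX, endX, y, halfWidth = 6, height = 10) {
  // "left" arcs point at their start word, "right" arcs at their end word
  const x = dir === 'left' ? startX : endX;
  const tipY = y + 2;
  const baseY = tipY - height;
  return `<path d="M${x},${tipY} L${x - halfWidth},${baseY} ${x + halfWidth},${baseY}" fill="currentColor"></path>`;
}

// Hypothetical label: <textPath> must sit inside <text> and refers to the arc by id
function arcLabel(id, label) {
  return `<text dy="1em"><textPath xlink:href="#${id}" startOffset="50%" text-anchor="middle" fill="currentColor">${label}</textPath></text>`;
}

console.log(arrowHead('left', 150, 950, 400));
// <path d="M150,402 L144,392 156,392" fill="currentColor"></path>
```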

To allow custom styling, all elements contained in the SVG come with class names and data attributes. By default, the element’s currentColor is used for colouring, meaning you only need to change the color property in CSS.

For example, arrows have the class .displacy-arrow as well as a data-label and data-dir attribute. Using a combination of those selectors and some basic CSS logic, you can create pretty powerful templates to style the elements based on their role and function in the parse.

```css
.displacy-tag[data-tag^='NN'] {
    color: green;
}

.displacy-tag[data-tag^='VB']:not([data-tag='VB']) {
    display: none;
}
```
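For selectors like these to match, the renderer only has to write the tag and the arc’s label and direction into attributes. This sketch assumes the class and attribute names mentioned above (.displacy-tag, data-tag, .displacy-arrow, data-label, data-dir); the <g> wrapper around each arrow and the helper names are illustrative choices, not displaCy’s documented markup.

```javascript
// Hypothetical word markup: the tag's <tspan> carries the class and data-tag
function wordMarkup({ text, tag }, x, y) {
  return `<text y="${y}" text-anchor="middle">` +
    `<tspan x="${x}">${text}</tspan>` +
    `<tspan class="displacy-tag" x="${x}" dy="2em" data-tag="${tag}">${tag}</tspan>` +
    `</text>`;
}

// Hypothetical arrow markup: one group per arrow, so a single selector can
// style its path, head and label together
function arrowMarkup({ label, dir }, inner) {
  return `<g class="displacy-arrow" data-label="${label}" data-dir="${dir}">${inner}</g>`;
}

const word = wordMarkup({ text: 'Robots', tag: 'NNS' }, 150, 440);
const arrow = arrowMarkup({ label: 'nsubj', dir: 'left' }, '<path d="M150,400 950,400"></path>');
```

With the first CSS rule above, the NNS tag in `word` would be coloured green. Values are inserted without escaping here; real input containing quotes or angle brackets would need to be escaped first.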

For more CSS examples, see the Readme.

Since SVG graphics consist of basic XML, we can use a templating engine like Jade (Pug) to dynamically generate the markup. For this blog, I wrote a simple mixin that generates a static inline SVG for any given JSON representation of a parse. It’s even more compact than displacy.js (fewer than 50 lines!) and is available here. It works with Jade-based static site generators like Harp, or Node applications using Express, which natively supports Jade templates.

To use the mixin, include it at the top of your file and call +displacy() with the full parse object as its argument:

```jade
include _displacy
+displacy({ arcs: [...], words: [...] })
```

To add custom class names to individual arcs, you can add a style: "classname" property to the respective arc object. We’ve used this feature in this post to illustrate a correct dependency vs. an incorrect dependency in one graphic.
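The mixin itself is Jade, but the idea carries over to any renderer: append the arc’s optional style value to its class attribute. A minimal JavaScript sketch, where `arcClassName` and the class name "incorrect" are hypothetical:

```javascript
// Hypothetical: the class attribute for an arc, with the optional `style`
// value from the arc object appended as an extra class name
function arcClassName(arc, base = 'displacy-arrow') {
  return arc.style ? `${base} ${arc.style}` : base;
}

const correct = { dir: 'left', end: 1, label: 'nsubj', start: 0 };
const incorrect = { dir: 'left', end: 1, label: 'dobj', start: 0, style: 'incorrect' };

console.log(arcClassName(correct));   // displacy-arrow
console.log(arcClassName(incorrect)); // displacy-arrow incorrect
```

A CSS rule such as .incorrect could then recolour just that arc.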

We’re planning support for more annotation formats like CoreNLP. In the meantime, you can add your own custom converter. We’ve also launched a modern and lightweight Named Entity Visualiser; stay tuned for another in-depth blog post!