This is a guest post by Wah Loon Keng, the author of spacy-nlp , a client that exposes spaCy’s NLP text parsing to Node.js (and other languages) via Socket.IO.

Natural Language Processing and other AI technologies promise to let us build applications that offer smarter, more context-aware user experiences. However, an application that's almost smart is often very, very dumb. In this tutorial, I'll show you how to set up a better brain for your applications: a Contextual Graph Knowledge Base.

Applications feel particularly stupid when they make mistakes that a human never would, but which a human can sort of understand. These mistakes reveal how crude the system’s actual logic is, and the illusion that you’re “talking” to something “intelligent” shatters.

To avoid these mistakes, we’d like our application to have a way to remember what the user has told it. We need to store these memories in a structured way — we want information we can act on, not just text we can search. In this post, I’ll show you how to start wiring up a solution to this problem, using free open-source technologies. Here’s a sneak preview of what we’re building:

Example of Contextual Graph Knowledge Base

I call this memory storage mechanism a Contextual Graph Knowledge Base (CGKB). The graph is contextual, so a query can automatically resolve to grounded knowledge as a path in the graph.

For example, the query "call John" invokes the function "call" with the context "John". "Call" knows that a phone number is needed, and "John" has one (otherwise the bot can ask and remember the answer). So the subgraph (John)->(phone number) is selected and passed to "call" to execute the function.
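To make this concrete, here is a minimal, self-contained sketch of the resolution step in plain Python. The dictionary graph, the node names, and the phone number are all hypothetical stand-ins; the real CGKB stores this graph in Neo4j.

```python
# A toy CGKB: each node maps to the facts (edges) attached to it.
# In the real system this lives in Neo4j; a dict stands in for it here.
graph = {
    "John": {"phone number": "+1-555-0123"},
}

def resolve(context, required):
    """Select the subgraph (context)->(required) and return the grounded value."""
    facts = graph.get(context, {})
    if required not in facts:
        # In the real bot, we would ask the user and remember the answer.
        raise KeyError(f"Missing fact: ({context})->({required})")
    return facts[required]

def call(person):
    """The 'call' function declares that it needs a phone number."""
    number = resolve(person, "phone number")
    return f"Dialing {person} at {number}"

# The query "call John" resolves via the path (John)->(phone number).
print(call("John"))
```

The key design point is that "call" only declares what it needs; the graph supplies the grounding, so the same function works for any contact the brain has learned about.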

Note All code and examples presented in this tutorial are early implementations and still a work in progress.

Before we can resolve such queries, we first have to build the CGKB. We want the knowledge base to be populated automatically, not hard-coded by humans: the brain should learn by itself. So let's start setting it up.

Before you start, make sure you have the latest versions of Python and Node installed. We then install spaCy and Socket.IO using the Python package manager pip. We also have to download spaCy’s statistical models (about 1GB of data).

Dependencies

- System: Python, Node, Neo4j
- Node modules: spacy-nlp, cgkb
- pip modules: socketIO-client, spacy
- User interface: AIVA

1. Install spaCy and Socket.IO

```bash
pip install -U socketIO-client
pip install -U spacy
python -m spacy.en.download
```

Next, we need to install Neo4j, the graph database for our brain, which comes with a built-in visualizer that runs in the browser:

2. Install Neo4j for Mac or Linux

```bash
if which neo4j > /dev/null; then
  echo "Neo4j is already installed"
else
  if [ $(uname) == "Darwin" ]; then
    brew install neo4j
  else
    wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
    echo 'deb http://debian.neo4j.org/repo stable/' | sudo tee /etc/apt/sources.list.d/neo4j.list
    sudo apt-get update
    sudo apt-get -y install neo4j
  fi
fi
```

For the bot interface, install AIVA, my open-source framework for cross-platform bot development. Fork the repo and clone your fork locally:

3. Install AIVA interface

```bash
git clone https://github.com/YOURUSERNAME/aiva.git && cd aiva
git checkout cgkb
npm run setup
```

Start Neo4j and log in for the first time at http://localhost:7474 with the default username and password, neo4j and neo4j. It will ask you to change the password; use 0000 for this demo.

4. Start Neo4j

```bash
# Mac
neo4j start
# Linux
service neo4j start
```

The next step will depend on which platform you want to run your bot on. You can use AIVA on Slack, Telegram or Facebook. I prefer using Slack, as it’s generally easier. All you have to do is sign into your Slack account, create a bot user, get the Slack token and update config/default.json in your AIVA installation.

5. Configure your bot

```json
{
  "BOTNAME": "NAME OF YOUR BOT",
  "PORTS": {
    "NEO4J": 7476,
    "SOCKETIO": 6466,
    "SLACK": 8345,
    "TELEGRAM": 8443,
    "FB": 8545
  },
  "NGROK_AUTH": null,
  "ADMINS": ["your_chat_account@email.com"],
  "ACTIVATE_IO_CLIENTS": {
    "ruby": false,
    "python3": true
  },
  "ADAPTERS": {
    "SLACK": {
      "ACTIVATE": true,
      "HUBOT_SLACK_TOKEN": "THE TOKEN YOU JUST GOT"
    },
    "TELEGRAM": {
      "ACTIVATE": false,
      "TELEGRAM_TOKEN": "get from bot father https://core.telegram.org/bots#3-how-do-i-create-a-bot",
      "BOTNAME": "your bot name from bot father",
      "WEBHOOK_KEY": "TELEGRAM_WEBHOOK"
    },
    "FB": {
      "ACTIVATE": false,
      "FB_PAGE_ID": "see aiva doc on adapters",
      "FB_APP_ID": "see aiva doc on adapters",
      "FB_APP_SECRET": "see aiva doc on adapters",
      "FB_PAGE_TOKEN": "see aiva doc on adapters",
      "FB_AUTOHEAR": true,
      "WEBHOOK_KEY": "FB_WEBHOOK_BASE",
      "FB_WEBHOOK_BASE": "optional: set a persistent webhook url if you have one on ngrok, since FB takes 10 mins to update it",
      "FB_ROUTE_URL": "/fb"
    }
  },
  "TEST": {
    "HUBOT_SHELL_USER_ID": "ID0000001",
    "HUBOT_SHELL_USER_NAME": "alice"
  }
}
```

Finally, we can start the bot, and wait for it to be ready.

6. Start the bot

```bash
npm start --debug
```

The stdout log should look something like this:

```
[Sat Oct 22 2016 17:36:12 GMT+0000 (UTC)] INFO Authenticated database successfully
[Sat Oct 22 2016 17:36:14 GMT+0000 (UTC)] DEBUG Sequelize [Node: 6.7.0, CLI: 2.4.0, ORM: 3.24.3]
...
[Sat Oct 22 2016 17:36:14 GMT+0000 (UTC)] INFO Starting poly-socketio server on port: 6466, expecting 4 IO clients
...
[Sat Oct 22 2016 17:36:18 GMT+0000 (UTC)] INFO Logged in as aiva-dev of Global Hackers
[Sat Oct 22 2016 17:36:18 GMT+0000 (UTC)] INFO Slack client now connected
[Sat Oct 22 2016 17:36:18 GMT+0000 (UTC)] DEBUG Started global js socketIO client for SLACK at 6466
[Sat Oct 22 2016 17:36:19 GMT+0000 (UTC)] DEBUG global-client-js HAMbJ5QstqugAsABAAAC joined, 1 remains
[Sat Oct 22 2016 17:36:26 GMT+0000 (UTC)] DEBUG cgkb-py N4tr885fIOxzEuGTAAAD joined, 0 remains
[Sat Oct 22 2016 17:36:26 GMT+0000 (UTC)] INFO All 4 IO clients have joined
```

You're done! Now, go to Slack and talk to your bot. It should parse your input into its brain. To see the brain, open the Neo4j interface at http://localhost:7474 and run the query `MATCH (u) RETURN u`.

The demo essentially shows the syntactic dependency parse tree of your latest input. The NLP backend is powered by the node module spacy-nlp that connects to spaCy. It draws inspiration from displaCy, spaCy’s interactive dependency visualizer.

If you click on a graph node, you will see the parsed NLP information from spaCy.

In the terminal during debug mode, you can also see that information in JSON as returned from spaCy to the bot, which then inserts it into the brain.

spaCy parse as JSON (Excerpt)

```json
{
  "word": "Book",
  "lemma": "book",
  "NE": "",
  "POS_fine": "VB",
  "POS_coarse": "VERB",
  "arc": "ROOT",
  "modifiers": [
    {
      "word": "me",
      "lemma": "me",
      "NE": "",
      "POS_fine": "PRP",
      "POS_coarse": "PRON",
      "arc": "dative",
      "modifiers": []
    },
    {
      "word": "flight",
      "lemma": "flight",
      "NE": "",
      "POS_fine": "NN",
      "POS_coarse": "NOUN",
      "arc": "dobj",
      "modifiers": [
        {
          "word": "a",
          "lemma": "a",
          "NE": "",
          "POS_fine": "DT",
          "POS_coarse": "DET",
          "arc": "det",
          "modifiers": []
        },
        {
          "word": "from",
          "lemma": "from",
          "NE": "",
          "POS_fine": "IN",
          "POS_coarse": "ADP",
          "arc": "prep",
          "modifiers": [
            {
              "word": "New York",
              "lemma": "New York",
              "NE": "GPE",
              "POS_fine": "NNP",
              "POS_coarse": "PROPN",
              "arc": "pobj",
              "modifiers": []
            }
          ]
        }
      ]
    }
  ]
}
```
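This nested structure can be produced by a recursive walk over the dependency tree. Here is a minimal sketch, using a tiny stand-in class instead of spaCy's real Token objects (the attribute names mirror spaCy's `text`, `lemma_`, `ent_type_`, `tag_`, `pos_`, `dep_`, and `children`; the `Tok` class itself is just for illustration):

```python
class Tok:
    """Stand-in for a spaCy Token with only the fields we need."""
    def __init__(self, word, lemma, ne, tag, pos, dep, children=()):
        self.text, self.lemma_, self.ent_type_ = word, lemma, ne
        self.tag_, self.pos_, self.dep_ = tag, pos, dep
        self.children = list(children)

def to_dict(tok):
    """Recursively convert a token and its syntactic children to the JSON shape."""
    return {
        "word": tok.text,
        "lemma": tok.lemma_,
        "NE": tok.ent_type_,
        "POS_fine": tok.tag_,
        "POS_coarse": tok.pos_,
        "arc": tok.dep_,
        "modifiers": [to_dict(child) for child in tok.children],
    }

# "Book me ..." with a single modifier, mirroring the start of the excerpt.
me = Tok("me", "me", "", "PRP", "PRON", "dative")
book = Tok("Book", "book", "", "VB", "VERB", "ROOT", [me])
print(to_dict(book))
```

With real spaCy, you would call `to_dict` on the sentence root and let the recursion follow `token.children` down the tree.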

How can this information be used in an application? Let's say we're writing a flight-booking app. We can see that "Book" is a verb, i.e. an action to execute. "New York" is a named entity of type "GPE", i.e. a location; it also modifies "from", so we know it is the origin. Likewise, "London" is the destination. Finally, we know the flight is for "Sunday", which is tagged as a "DATE".
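As a sketch of how a flight-booking app might consume the parse, here is a recursive search over the nested dict for the GPE attached to a given preposition. The trimmed parse and the helper name `find_gpe` are illustrative, not part of the actual AIVA code:

```python
# Trimmed version of the parse excerpt, with a hypothetical "to London" branch added.
parse = {
    "word": "flight", "NE": "", "arc": "dobj", "modifiers": [
        {"word": "from", "NE": "", "arc": "prep", "modifiers": [
            {"word": "New York", "NE": "GPE", "arc": "pobj", "modifiers": []},
        ]},
        {"word": "to", "NE": "", "arc": "prep", "modifiers": [
            {"word": "London", "NE": "GPE", "arc": "pobj", "modifiers": []},
        ]},
    ],
}

def find_gpe(node, prep):
    """Return the GPE location governed by the given preposition, searching recursively."""
    if node["arc"] == "prep" and node["word"] == prep:
        for mod in node["modifiers"]:
            if mod["NE"] == "GPE":
                return mod["word"]
    for mod in node["modifiers"]:
        found = find_gpe(mod, prep)
        if found:
            return found
    return None

origin = find_gpe(parse, "from")      # "New York"
destination = find_gpe(parse, "to")   # "London"
print(origin, destination)
```

The same pattern (match an arc, then inspect the modifiers underneath it) extends to dates, times, and any other slots the app needs to fill.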

Resources

- spacy-nlp: Expose spaCy to Node.js via Socket.IO.
- CGKB: The contextual graph brain with database and visualizer.
- Slack bots: Slack API documentation.
- spaCy: spaCy documentation.

If you have questions, or wish to collaborate (especially on the graph brain CGKB), reach out to me on Twitter: @kengzwl.