Update: Facebook has disabled this application, with the following message:

“Your app is replicating core Facebook functionality.”

Facebook Graph Search has given the Graph Database community a simpler way to explain what it is we do and why it matters. I wanted to drive the point home by building a proof of concept of how you could do this with Neo4j. However, I don’t have six months or much experience with NLP (natural language processing). What I do have is Cypher. Cypher is Neo4j’s graph language and it makes it easy to express what we are looking for in the graph. I needed a way to take “natural language” and create Cypher from it. This was going to be a problem.



Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

It’s an old programmer joke, but that is what came to mind: some kind of fuzzy regular expressions. In the iPhone world, we usually hear people say “There’s an app for that.” In the Ruby world, we go with “There’s a gem for that”… so I asked Google for some help and came upon Semr.

Semr is the gateway drug framework to supporting natural language processing in your application. Its goal is to follow the 80/20 rule, where 80% of what you want to express in a DSL is possible in a way that is familiar to how developers normally solve problems. (Note: There are other, more flexible solutions, but they come with a higher learning curve, e.g. Treetop.)

Awesome, a ray of light to solve my problem… but the gem is four years old, and I could not get it to install. Bummer… Wait, what was that about Treetop?

Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge parsing expression grammars, it helps you analyze syntax with revolutionary ease.

Score! I had no idea how to write a proper language grammar, but that has never stopped anyone before. Someone with more than a couple of hours of Treetop experience is going to laugh at this, but here is part of what I did:

rule friends
  "friends" <Friends>
end

rule likes
  "who like" <Likes>
end

rule likeand
  likes space thing space "and" space thing <LikeAnd>
end

rule thing
  [a-zA-Z0-9]+ <Thing>
end

I am creating rules for things, for the likes relationship, and for the idea of “likes this and that”.
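For readers who want to see what these rules accept without installing the treetop gem, here is a rough hand-written stand-in using Ruby's standard StringScanner. It is not Treetop and not the grammar used in this project: the top-level order (“friends”, then “who like”, then one thing or “thing and thing”) is an assumption, since only some of the rules are shown, and parse_query is a hypothetical name.

```ruby
require "strscan"

# Hypothetical stand-in for the Treetop rules: matches the pieces in sequence,
# the way a PEG sequence rule would. The overall ordering is an assumption.
def parse_query(text)
  s = StringScanner.new(text.strip)
  nodes = []

  # rule friends: the literal "friends"
  return nil unless s.scan(/friends/)
  nodes << [:friends]

  # space, then rule likes: the literal "who like"
  return nil unless s.scan(/ +/) && s.scan(/who like/)
  nodes << [:likes]

  # space, then rule thing: [a-zA-Z0-9]+
  return nil unless s.scan(/ +/) && (thing = s.scan(/[a-zA-Z0-9]+/))
  nodes << [:thing, thing]

  # Optional likeand tail: space "and" space thing
  if s.scan(/ +and +/)
    second = s.scan(/[a-zA-Z0-9]+/) or return nil
    nodes << [:thing, second]
  end

  # Like a full PEG parse, reject input with anything left over.
  s.eos? ? nodes : nil
end

p parse_query("friends who like Neo4j")
# prints: [[:friends], [:likes], [:thing, "Neo4j"]]
p parse_query("friends who like Neo4j and Ruby")
# prints: [[:friends], [:likes], [:thing, "Neo4j"], [:thing, "Ruby"]]
p parse_query("friends who hate Neo4j")
# prints: nil
```

A real Treetop grammar hands back syntax nodes (Friends, Likes, Thing) with their own to_cypher methods rather than plain arrays; the arrays here only show how the input is split up.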

The “natural language” input is run through these rules, and a syntax tree is generated from the rules that match. The nodes of that tree are then turned into hashes representing pieces of Cypher. Looking at the code above and below, you can see how “friends who like Neo4j” gets parsed into Friends, Likes, and Thing.

class Friends < Treetop::Runtime::SyntaxNode
  def to_cypher
    return {:start => "me = node({me})",
            :match => "me -[:friends]-> people",
            :return => "people",
            :params => {"me" => nil }}
  end
end

class Likes < Treetop::Runtime::SyntaxNode
  def to_cypher
    return {:match => "people -[:likes]-> thing"}
  end
end

class Thing < Treetop::Runtime::SyntaxNode
  def to_cypher
    return {:start => "thing = node:things({thing})",
            :params => {"thing" => "name: " + self.text_value } }
  end
end
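The step that gathers these per-node hashes together is not shown here. Since Expression#to_cypher calls .uniq on the :start, :match, :return and :params values, it evidently expects each key to hold an array. A minimal sketch of that gathering step in plain Ruby, assuming a flat list of fragment hashes in parse order (combine_fragments is a hypothetical name, not part of Treetop or the original code):

```ruby
# Hypothetical helper: fold the fragment hashes returned by each node's
# to_cypher into one hash of arrays, the shape Expression#to_cypher expects.
def combine_fragments(fragments)
  combined = { :start => [], :match => [], :return => [], :params => [] }
  fragments.each do |fragment|
    fragment.each do |key, value|
      # Append rather than overwrite, so several nodes can each add a START, etc.
      (combined[key] ||= []) << value
    end
  end
  combined
end

# The fragments Friends, Likes and Thing would return for
# "friends who like Neo4j" (Thing's text_value being "Neo4j").
fragments = [
  { :start => "me = node({me})", :match => "me -[:friends]-> people",
    :return => "people", :params => { "me" => nil } },
  { :match => "people -[:likes]-> thing" },
  { :start => "thing = node:things({thing})",
    :params => { "thing" => "name: Neo4j" } }
]

combined = combine_fragments(fragments)
p combined[:start]
# prints: ["me = node({me})", "thing = node:things({thing})"]
p combined[:match]
# prints: ["me -[:friends]-> people", "people -[:likes]-> thing"]
```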

Then these hashes are combined and turned into a proper Cypher string:

class Expression < Treetop::Runtime::SyntaxNode
  def to_cypher
    cypher_hash = self.elements[0].to_cypher
    cypher_string = ""
    cypher_string << "START " + cypher_hash[:start].uniq.join(", ")
    cypher_string << " MATCH " + cypher_hash[:match].uniq.join(", ") unless cypher_hash[:match].empty?
    cypher_string << " RETURN DISTINCT " + cypher_hash[:return].uniq.join(", ")
    params = cypher_hash[:params].empty? ? {} : cypher_hash[:params].uniq.inject {|a,h| a.merge(h)}
    return [cypher_string, params].compact
  end
end
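To see what Expression#to_cypher produces without installing Treetop, the same string-building logic can be pulled out into a plain method. build_cypher is a hypothetical name, and the input is a hand-written combined hash for “friends who like Neo4j”, assuming the fragments from Friends, Likes and Thing above. It uses inject({}) so the empty-params case needs no separate check.

```ruby
# Same logic as Expression#to_cypher, minus the Treetop node: takes a hash
# whose :start, :match, :return and :params values are arrays.
def build_cypher(cypher_hash)
  cypher = "START " + cypher_hash[:start].uniq.join(", ")
  # MATCH is optional; a query can be built from START and RETURN alone.
  cypher << " MATCH " + cypher_hash[:match].uniq.join(", ") unless cypher_hash[:match].empty?
  cypher << " RETURN DISTINCT " + cypher_hash[:return].uniq.join(", ")
  # Merge the individual parameter hashes into one; {} when there are none.
  params = cypher_hash[:params].uniq.inject({}) { |acc, h| acc.merge(h) }
  [cypher, params]
end

combined = {
  :start  => ["me = node({me})", "thing = node:things({thing})"],
  :match  => ["me -[:friends]-> people", "people -[:likes]-> thing"],
  :return => ["people"],
  :params => [{ "me" => nil }, { "thing" => "name: Neo4j" }]
}

cypher, params = build_cypher(combined)
puts cypher
# prints: START me = node({me}), thing = node:things({thing}) MATCH me -[:friends]-> people, people -[:likes]-> thing RETURN DISTINCT people
p params.keys
# prints: ["me", "thing"]
```

The {me} and {thing} placeholders are Cypher parameters, filled in from the params hash when the query is executed.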

Finally, I built a Sinatra web application that imports your data from Facebook and provides a search page, so you can try this out for yourself. As always, the code is available on GitHub and hosted on Heroku.

While reproducing a “kinda” Facebook Graph Search is interesting, it would be more interesting to see other people apply this idea to their own data. If you would like to know more about this proof of concept, contact me or come to the Neo4j Meetups in Virginia (Feb 26th), Boston (Feb 28th), Chicago (TBD), or somewhere near you.