



Welcome to the second installment of the Graph Kit for Ruby series. In the first post I described the plan for the project — to showcase the ease of use and business value of graph databases in the context of a Ruby project. Today we’re digging into the implementation of a Neo4j-backed product recommendation engine. This recommendation engine will sit atop an online store built with the Rails-based e-commerce project Spree.

Brief note on prerequisites and recommended experience level

This post assumes that the reader has a basic understanding of the Ruby ecosystem. You’ll need a newer version of Ruby (2.0+ preferred) and the `bundler` utility. You might also need to install some packages on your system, such as the PostgreSQL and Neo4j database engines. I used Ubuntu to prototype this, but you should be able to get by fine with any recent version of Linux or OS X. The finished project is published on Graph Story’s GitHub account as the Graph Kit for Ruby, so feel free to dig in for specific implementation details. If at any point you get stuck, feel free to reach out to the Graph Story team for help. Here we go!

Starting a new Spree Project

You’ll want to follow the recommendations in the official Spree Getting Started Guide to get your workspace set up. Note that you’ll start with the usual `rails new` installer before getting into Spree-specific setup. Here are the basic shell commands to get your Rails project started with Spree loaded into its Gemfile:

```shell
# get some gems
gem install rails -v 4.1.6
gem install bundler
gem install spree_cmd

# make a new rails project
rails _4.1.6_ new graph-kit-ruby
cd graph-kit-ruby

# install spree into the gemfile and run its generators
spree install --auto-accept
```

Additional Gems

Look through the completed project’s Gemfile on GitHub and you’ll see several other important gems:

- neo4j: This Object-Graph Mapper (OGM) gem aims to provide full-featured Neo4j access with an ActiveRecord feel. Rails devs with minimal Neo4j experience should appreciate the familiarity.
- pg: You’ll need an RDBMS for Spree, and I prefer PostgreSQL.
- dotenv: I’m going to deploy this to Engine Yard, and dotenv came in very handy. More on that in post 3.
- jazz_hands: This one provides Pry and several other convenience tools for working at the Ruby console. I find Pry in particular to be very helpful when learning a new library such as neo4j.

Building a Recommendation Engine with Neo4j

My idea here was to model Users, Products, and Purchases in a graph so that we could easily identify purchasing patterns to use in our recommendations. Spree already has a Spree::User and a Spree::Product model. Purchases are modeled in the RDBMS as Spree::LineItem rows that associate a product with a specific order. I’ll create a Recs module inside my project and give it User and Product models that are linked together by purchase histories. The Product model is where most of the graph action happens, so let’s focus there.

Designing a “product” node type using the neo4j OGM

Spree gives us a products table in Postgres out of the box, so we want to set up a matching Product node in Neo4j for each product row in Postgres. With a bit of work we can ensure that each new Postgres product automatically gets a matching graphed product node. I created a :graphed method on the Spree::Product model that finds or creates the matching Product in Neo4j on demand by calling the self.from_spree_product(spree_product) method in the screenshot above.
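To make the find-or-create idea concrete without needing Neo4j or Spree installed, here is a minimal plain-Ruby sketch. `SpreeProduct`, `GraphProduct`, and `GraphStore` are hypothetical stand-ins, and here `from_spree_product` is an instance method on an in-memory store rather than the real class method on `Recs::Product`. What it shows is the role of the slug as the join key: asking twice for the same Spree product returns the same node instead of creating a duplicate.

```ruby
# Hypothetical stand-in for a row in Spree's products table (only the fields used here).
SpreeProduct = Struct.new(:slug, :name)

# Hypothetical stand-in for the companion node in the graph.
GraphProduct = Struct.new(:slug, :name)

# Minimal in-memory "graph" keyed by slug; a sketch, not the neo4j gem's API.
class GraphStore
  def initialize
    @products = {}
  end

  # Find the node whose slug matches the Spree product, creating it on first use.
  def from_spree_product(spree_product)
    @products[spree_product.slug] ||= GraphProduct.new(spree_product.slug, spree_product.name)
  end

  def product_count
    @products.size
  end
end

store = GraphStore.new
shirt = SpreeProduct.new('red-shirt', 'Red Shirt')

node_a = store.from_spree_product(shirt)
node_b = store.from_spree_product(shirt)
p node_a.equal?(node_b) # => true: the existing node is reused
p store.product_count   # => 1
```

Keying on the slug rather than a database id means the lookup works in both directions, from a Spree row to its node and from a node back to its Spree row.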

Important things to notice in the Recs::Product model

- Each node has a slug property: this is the unique identifier Spree uses in product URLs and in its relational database to distinguish one product from another. Adding that to the companion node ensures I can link products from one database to the other and back.
- The model class gets most of its functionality by including Neo4j::ActiveNode. This gives us ActiveRecord-like semantics for finding, creating, updating, and removing nodes.
- It has_many purchases as connections to User nodes. This is where the edges in our graph come from, and it’s how we’ll make meaningful queries against our graph.
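Conceptually, the purchases association is just a set of directed User-to-Product edges. The sketch below is a hypothetical plain-Ruby `PurchaseGraph` class, not the neo4j gem; it stores each purchase as one edge and shows that the same edge list answers both “what did this user buy?” and “who bought this product?”. The Cypher query in the next section depends on that two-way traversal.

```ruby
# A tiny directed graph in which each edge means "user purchased product".
# Hypothetical illustration of the purchases relationship, not the neo4j gem API.
class PurchaseGraph
  def initialize
    @edges = [] # [user_name, product_slug] pairs; a repeat purchase adds another pair
  end

  # Record one purchase edge pointing from a user to a product.
  def add_purchase(user, product_slug)
    @edges << [user, product_slug]
  end

  # Follow edges outward from a user node.
  def products_bought_by(user)
    @edges.select { |u, _p| u == user }.map { |_u, p| p }.uniq.sort
  end

  # Follow the same edges inward to a product node.
  def buyers_of(product_slug)
    @edges.select { |_u, p| p == product_slug }.map { |u, _p| u }.uniq.sort
  end
end

graph = PurchaseGraph.new
graph.add_purchase('mr-green', 'green-shirt')
graph.add_purchase('mr-green', 'green-pillow')
graph.add_purchase('ms-pillow', 'green-pillow')
graph.add_purchase('ms-pillow', 'green-pillow') # a repeat purchase

p graph.products_bought_by('mr-green') # => ["green-pillow", "green-shirt"]
p graph.buyers_of('green-pillow')      # => ["mr-green", "ms-pillow"]
```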

Breaking down a Cypher query generated with the neo4j gem

The most interesting thing about this product model is of course the Cypher query (Cypher is Neo4j’s declarative query language) that allows us to surface related purchase data. Let’s break it down line by line:

This query executes in the context of an already selected Product node. We’ll refer to this starting point in the query itself as :product.

```ruby
query_as(:product).
```

Identify all products which have been bought by users who have also bought the :product this query is built on.

```ruby
match("product<--(user:`Recs::User`)-->(other_product:`Recs::Product`)").
```

Discard products in the result set that are equivalent to the initial :product.

```ruby
where("product <> other_product").
```

Limit our results to a few products so that Neo4j and Spree can spit out results faster.

```ruby
limit(limit).
```

Return an array of unique products that match the other_product node in our match statement. This means any products bought by people who bought :product should be a valid result.

```ruby
pluck('DISTINCT other_product')
```

If you squint at that query you can see a sort of (Product)<--(User)-->(Other Product) relationship in the match statement. “Queries that look like whiteboarded graphs” seems to be a design goal of Cypher. As a new user, I can say it is pretty easy to get started with.
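To see what the query computes, here is the same two-hop walk written in plain Ruby over a hash of purchase histories. The `PURCHASES` data and the `users_also_bought` helper are hypothetical; in the real project Neo4j does this work from the Cypher above. Each step is labeled with the part of the query it corresponds to.

```ruby
# Hypothetical purchase histories: user => slugs of the products they bought.
PURCHASES = {
  'mr-green'  => %w[green-shirt green-pillow green-mug],
  'ms-pillow' => %w[green-pillow red-pillow blue-pillow],
  'mr-red'    => %w[red-shirt red-pillow]
}.freeze

# Plain-Ruby walk over the same pattern the Cypher query matches:
#   product <-- user --> other_product
def users_also_bought(slug, limit = 3)
  # Step 1: users with an edge into the starting product.
  buyers = PURCHASES.select { |_user, products| products.include?(slug) }
  # Step 2: every product those users have an edge out to.
  others = buyers.values.flatten
  # Step 3: discard the starting product itself.
  others = others.reject { |other| other == slug }
  # Step 4: keep distinct products, then apply the limit.
  others.uniq.first(limit)
end

p users_also_bought('green-pillow') # => ["green-shirt", "green-mug", "red-pillow"]
p users_also_bought('red-shirt')    # => ["red-pillow"]
```

One difference to keep in mind: without an ORDER BY clause, Cypher makes no promise about which rows survive a LIMIT, whereas this sketch simply keeps insertion order.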

Automatically logging new purchase events as connections in our graph

Once we implement our Recs::User and Recs::Product models with their ‘purchase’ connection type, all we need to do is automate the logging of purchase events from Spree and PostgreSQL over to our Neo4j database. Here’s how to do that:

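```ruby
def log_to_graph
  # Skip purchases with no user that has a graph node (such as guest checkouts)
  return unless user.try(:graphed)
  # Add a purchases edge from the user's node to the product's node
  user.graphed.purchases << product.graphed
end
```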
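The guard clause is what keeps anonymous purchases out of the graph. Here is a plain-Ruby sketch of that behavior using hypothetical `GraphUser`, `GraphProduct`, `SpreeUser`, `SpreeProduct`, and `LineItem` structs in place of the real Spree and neo4j classes. Because plain Ruby has no ActiveSupport `try`, the sketch uses an explicit nil check instead. It covers only the method body; how the method is triggered from Spree is not modeled here.

```ruby
# Hypothetical graph nodes: a user node holds its purchase edges in an array.
GraphUser    = Struct.new(:email, :purchases)
GraphProduct = Struct.new(:slug)

# Hypothetical Spree-side records that know their graph counterpart.
SpreeUser    = Struct.new(:graphed)
SpreeProduct = Struct.new(:graphed)

# A purchase event; user is nil when there is no signed-in customer.
LineItem = Struct.new(:user, :product) do
  def log_to_graph
    # Plain-Ruby version of `return unless user.try(:graphed)`.
    return unless user && user.graphed
    user.graphed.purchases << product.graphed
  end
end

shirt_node = GraphProduct.new('red-shirt')
shirt      = SpreeProduct.new(shirt_node)
alice_node = GraphUser.new('alice@example.com', [])
alice      = SpreeUser.new(alice_node)

LineItem.new(alice, shirt).log_to_graph # links alice's node to the shirt's node
LineItem.new(nil, shirt).log_to_graph   # no user: returns nil, nothing is linked

p alice_node.purchases.map(&:slug) # => ["red-shirt"]
```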

Let’s make some fake yet interesting purchase history data

In order to demonstrate Neo4j’s ability to easily unearth interesting connections in our data set, I decided to create some pretend customers with very consistent purchasing habits. For instance, a Mr. Green might only buy green products, while a Ms. Pillow might buy any pillow in the store regardless of its color. You can see the methodology used to generate this sample data in the graph-kit-ruby repository on GitHub.
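The generator in the repository is the real reference. As a rough idea of the approach, here is a hypothetical plain-Ruby sketch in which each pretend customer is a rule that selects products by color or by type. The nine-item catalog and the `fake_purchases` helper are made up for illustration.

```ruby
# Hypothetical catalog: every combination of color and product type.
CATALOG = %w[red green blue].product(%w[shirt pillow mug]).map do |color, type|
  { slug: "#{color}-#{type}", color: color, type: type }
end

# Each pretend customer buys only what matches their one consistent habit.
CUSTOMERS = {
  'Mr. Green'  => ->(item) { item[:color] == 'green' },
  'Ms. Pillow' => ->(item) { item[:type] == 'pillow' }
}.freeze

# Build a purchase history: customer name => slugs of everything they "bought".
def fake_purchases(catalog, customers)
  customers.each_with_object({}) do |(name, habit), history|
    history[name] = catalog.select(&habit).map { |item| item[:slug] }
  end
end

history = fake_purchases(CATALOG, CUSTOMERS)
p history['Mr. Green']  # => ["green-shirt", "green-pillow", "green-mug"]
p history['Ms. Pillow'] # => ["red-pillow", "green-pillow", "blue-pillow"]
```

Because every habit is a strict rule, any recommendation the graph produces from this data is easy to check by eye: green items should lead to other green items, pillows to other pillows.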

Inspecting the data in the Neo4j browser with the Cypher query language

One of my a-ha moments when getting into Neo4j was discovering the built-in web server and its visualization tools. After you’ve stuffed some nodes and relationships into your dev database, you can visualize them by poking around at http://localhost:7474. You can click on the “purchases” relationship to see a visualization of the entire product purchase history graph. I wanted to dig in a bit deeper, though, so I got my Recs::Product model to give me some Cypher help. You can learn more about Cypher on Neo4j’s site.

Using only the neo4j gem’s built-in methods and the Cypher syntax we covered above, I’ve isolated a single Product node and gotten a good lead on how to look it up by hand using my own Cypher query. Note the :to_cypher method in the screenshot above, which generates a working query from your Ruby code just like :to_sql in ActiveRecord. Unfortunately for me, pasting that directly into the Neo4j browser didn’t quite work, but it got me close enough. I tweaked the WHERE clause to look for product.slug = 'red-shirt' rather than the parameterized product_id condition :to_cypher gave me, and then I added RETURN product, user, other_product to the end. Once I’d fixed up the Cypher, I was able to get a neat visualization of the red shirt, the one user in my test data who’d bought it, and all the other things that user purchased. Shirts, all shirts!
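For reference, here is a reconstruction of that hand-edited query based on the description above, wrapped in a hypothetical Ruby helper, `browser_query`, so it can be generated for any slug. Adding the `Recs::Product` label to `product` is my own assumption. Since the slug is interpolated into the text rather than passed as a parameter, treat this as a convenience for pasting into the browser, not something to run against user input.

```ruby
# Hypothetical helper: build a Cypher query for pasting into the Neo4j browser.
# It mirrors the tweaks described above: filter on slug instead of a
# parameterized id, and RETURN every node in the pattern so the browser draws it.
def browser_query(slug)
  # Escape backslashes and single quotes so the slug stays a valid string literal.
  literal = slug.gsub(/\\/) { '\\\\' }.gsub("'") { "\\'" }
  <<-CYPHER.gsub(/^ +/, '')
    MATCH (product:`Recs::Product`)<--(user:`Recs::User`)-->(other_product:`Recs::Product`)
    WHERE product.slug = '#{literal}'
    RETURN product, user, other_product
  CYPHER
end

puts browser_query('red-shirt')
```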

Integrating our product recommendations with the Spree storefront

Now that we’ve generated our sample data and figured out how to query Neo4j for simple product recommendations, let’s add them to our storefront and call it a day. I wired up the product recommendations directly into the Spree::Product model as :users_also_bought. That delegates to the :users_also_bought method from the relevant Recs::Product node and returns the first three results. Armed with that easy lookup, I dropped a new section into the product detail view with a _users_also_bought.html.erb partial template:

```erb
<% if (products = product.users_also_bought).any? %>
  <div class='users-also-bought'>
    <h3>Users also bought:</h3>
    <%= render partial: 'spree/shared/products', locals: { products: products } %>
  </div>
<% end %>
```

My favorite thing about this partial is that it leverages a built-in Spree products partial: all I have to do is pass it a local variable named products, to which I’ve assigned the results of product.users_also_bought. There’s really nothing going on here other than looking up the data and passing it along to the built-in.
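The :users_also_bought delegation described above can be sketched in plain Ruby as follows. `StoreProduct` and `GraphNode` are hypothetical stand-ins for `Spree::Product` and `Recs::Product`. The only behavior taken from the text is that the storefront product asks its graph node for recommendations and keeps the first three.

```ruby
# Hypothetical graph-side node that already knows its recommendations.
GraphNode = Struct.new(:slug, :recommendations) do
  def users_also_bought
    recommendations
  end
end

# Hypothetical storefront product that delegates to its graph node.
class StoreProduct
  attr_reader :graphed

  def initialize(graphed)
    @graphed = graphed
  end

  # Ask the graph for recommendations and keep only the first three.
  def users_also_bought
    graphed.users_also_bought.first(3)
  end
end

node = GraphNode.new('green-pillow', %w[green-shirt green-mug red-pillow blue-pillow])
p StoreProduct.new(node).users_also_bought # => ["green-shirt", "green-mug", "red-pillow"]
```

Keeping the delegation on the model means the view never touches Neo4j directly. Because Spree’s shared products partial renders Spree products, the slug on each node is what would let you map graph results back to Spree records; that lookup is left out of this sketch.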

Final post: Deploying our Ruby graph kit to Engine Yard and Graph Story

For the third and final post in this series we’ll cover the sysadmin work required to deploy your working graph-enhanced Spree site to production. We’ve chosen to deploy to Engine Yard Cloud, so most of the post will focus on configuration specific to their environments. You’ll also see how to switch from a local Neo4j server in development to a production-ready Graph Story server by layering in the appropriate connection strings.