Everybody loves a good certification. Doubly so when it’s free, and quadruply so if it’s in a cool new technology like Neo4j. In case you’re unfamiliar with Neo4j, it’s a graph database – a novel database concept that belongs to the NoSQL class of databases, i.e. it does not follow a relational model. Rather, it allows for the storage of, and computation on, graphs.

From a purely mathematical perspective, a graph $G = (V, E)$ is formally defined as an ordered pair consisting of a set of vertices $V$ (called nodes in Neo4j) and a set of edges $E$ (known as relationships in Neo4j). In other words, the first-class citizens of a graph are ‘things’ and ‘connections between things’. No doubt you can already think of a lot of problems that can be conceptualised as graph problems. Indeed, graph databases can be put to use for a surprising number of things that don’t sound very graph-y at all. Not that you always should (no single technology is a panacea, and I would look very suspiciously at someone who implemented time series as a graph database), but in most cases it is at least possible.

Which leads me to the appeal of Neo4j. In general, you had two approaches to graph operations until graph databases entered the scene. One was to write your own graph object model and keep it in memory. That’s not bad, but a database it sure ain’t. The alternative was to decompose the graph into a table of vertices and their properties and another table of connections between vertices (an edge list), and then store both in a regular RDBMS or, somewhat more efficiently, in a NoSQL key-value store. That’s a little better, but it still requires considerable reinvention of the wheel.
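To make the second approach concrete, here is a minimal sketch in plain Python of a graph decomposed into a vertex table and a connection table. The data, the `ACTED_IN` relationship type and the helper names `neighbours` and `co_actors` are all made up for illustration; the point is only how much traversal logic you end up writing by hand.

```python
# Hypothetical toy data: a vertex table keyed by id, plus a connection
# table stored as plain (source, type, target) rows.
vertices = {
    1: {"label": "Person", "name": "Kevin Bacon"},
    2: {"label": "Person", "name": "Tom Hanks"},
    3: {"label": "Movie", "title": "Apollo 13"},
}
edges = [
    (1, "ACTED_IN", 3),
    (2, "ACTED_IN", 3),
]


def neighbours(vertex_id):
    """Return the ids of all vertices one edge away, ignoring direction."""
    found = set()
    for source, _rel_type, target in edges:
        if source == vertex_id:
            found.add(target)
        elif target == vertex_id:
            found.add(source)
    return found


def co_actors(person_id):
    """Two hops by hand: person -> movies -> everyone else in those movies."""
    result = set()
    for movie_id in neighbours(person_id):
        result |= neighbours(movie_id)
    result.discard(person_id)
    return result


print(co_actors(1))
```

Every new kind of question (three hops, a filter on a property, a shortest path) means another hand-written function like `co_actors`, which is the wheel reinvention in question.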

The strength of graph databases is that they facilitate more complex operations, way beyond storage and retrieval of graphs, such as searching for patterns, properties and paths. One done-to-death example would be the famous problem known as Six Degrees of Kevin Bacon, a pop culture version of Erdős numbers: for an actor $A$ and a Kevin Bacon $K$ within a graph $G_{Actors}$ with $A, K \in G_{Actors}$, what is the shortest path (and is it below six jumps?) to get from $A$ to $K$? Graph databases turn this into a simple query.

Neo4j is one of the first industrial grade graph DBs, with an enterprise grade product that you can safely deploy in a production system without worrying too much about it. Written in Java, it’s stable, fast and has enough API wrappers to have some left over for the presents next Christmas. Alongside the more traditional APIs, it’s got a very friendly and very visual web-based interface that immediately plots your query results, and a somewhat weird but ultimately not very counter-intuitive query language known as Cypher. As such, if graph problems are the kind of problem you deal with on a regular basis, taking Neo4j for a spin might be a very good idea.
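For a sense of what a graph database takes off your hands, the Kevin Bacon question boils down to a breadth-first search. The sketch below is plain Python and not how Neo4j implements it; the co-starring data and the names `co_starred`, `shortest_path` and `bacon_number` are invented for illustration, and the graph is simplified to actors linked directly to actors rather than via movies.

```python
from collections import deque

# Hypothetical co-starring graph: each actor maps to the set of actors
# they have appeared with. All names except Kevin Bacon are placeholders.
co_starred = {
    "Kevin Bacon": {"Bob"},
    "Bob": {"Kevin Bacon", "Carol"},
    "Carol": {"Bob"},
    "Dave": set(),
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of actors on a shortest
    path from start to goal, or None if the two are not connected."""
    if start == goal:
        return [start]
    previous = {start: None}  # doubles as the set of visited actors
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = current
            if neighbour == goal:
                # Walk the predecessor chain back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def bacon_number(graph, actor):
    """Number of jumps to Kevin Bacon, or None if unreachable."""
    path = shortest_path(graph, actor, "Kevin Bacon")
    return None if path is None else len(path) - 1


print(shortest_path(co_starred, "Carol", "Kevin Bacon"))
print(bacon_number(co_starred, "Dave"))
```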

Which in turn leads me to the Neo4j certification. For the unbeatable price of $0.00, you can now earn the esteemed title of Neo4j Certified Professional – that is, if you pass the 80-question, 60-minute time-capped test with a score of 80% or above. Now, don’t let the fact that it’s offered for free fool you – the test is pretty ferocious. It takes a fairly in-depth knowledge of Neo4j to pass: I’ve been around Neo4j for about as long as it has been around, and although I had never attempted the exam before and passed at the first try, it was surprisingly hard even for me. The time cap also means that even if you do decide to refer to your notes (I am not sure whether that counts as cheating – I personally did not, as it would have been far too time-intensive), you won’t be able to pass from notes alone. Worse, there are no practice exams, and preparation material is scarce outside the (rather pricey!) trainings.

As such, I’ve written up the ten things I wish I had known before embarking upon the exam. I would definitely have prepared for it differently, had I known what it would be like! Fortunately, you can attempt it as often as you like at no cost, so it’s by no means an impossible task, but you’re in for a ride if you wish to pass with a good score. Fasten your seat belt, flip up the tray table and put your seat in a fully upright position – it’s time to get Neo4j’d!

1. This is not a user test… it’s a user and DBA test.

I haven’t heard of a single Neo4j shop that had a dedicated Neo4j DBA to support graph operations. Which is ok – compared to the relatively arcane art of (enterprise) RDBMS DBAs, Neo4j is a breeze to configure. At the same time, the model seems to expect users to know what they’re doing themselves and be confident with some close-to-the-metal database tweaking. Good.

The downside is that about a quarter or so of the questions have to do with the configuration of Neo4j, and they do get into the nitty-gritty. You’re expected, for instance, to know fairly detailed minutiae of Enterprise edition High Availability server settings.

2. Pay attention to Cypher queries. The devil’s in the details.

If you’ve done as many multiple choice tests as I have, you know you’ve learned one thing for sure: all of them follow the same pattern. Two answers are complete bunk and anyone who’s done their reading can spot that. The remaining two are deceptively similar, however, and both sound ‘correct enough’. In the Neo4j test, this is mainly in the realm of the Cypher queries. A number of questions involve a ‘problem’ being described and four possible Cypher queries. The candidate must then spot which of these, or which several of these, answer the problem description. Often the correct answer may be distinguished from the incorrect one by as little as a correctly placed colon or a bracket closed in the right order. When in doubt, have a very sharp look at the Cypher syntax.

Oh, incidentally? The test makes relatively liberal use of the ‘both directions match’ (a)-[:RELATION]-(b) query pattern. This catches (a)-[:RELATION]->(b) as well as (b)-[:RELATION]->(a). The lack of the little arrow is easy to overlook and can lead you down the wrong path…
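If the difference between the two patterns feels abstract, a toy model may help. The sketch below is plain Python, not Cypher; the relationship data and the function names `match_directed` and `match_undirected` are made up to mimic the semantics described above.

```python
# Hypothetical stored relationships of a single type, as (start, end) pairs.
# Relationships are stored with a direction.
relationships = {("alice", "bob"), ("carol", "alice")}


def match_directed(a, b):
    """Analogue of (a)-[:RELATION]->(b): only the stored direction counts."""
    return (a, b) in relationships


def match_undirected(a, b):
    """Analogue of (a)-[:RELATION]-(b): either direction counts."""
    return (a, b) in relationships or (b, a) in relationships


print(match_directed("bob", "alice"))    # stored the other way round
print(match_undirected("bob", "alice"))  # direction is ignored
```

An answer option that drops the arrow therefore matches more than the directed version would, which is exactly the kind of difference the questions hinge on.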

3. Develop query equivalence to second nature.

Python was built so that there would be one, and exactly one, right way to do everything. Sort of. Cypher is the opposite – there are dozens of ways to express certain relations, largely owing to the equivalence of relationships. As such, be aware of two equivalences. One is the equivalence of inline property maps and WHERE predicates:

MATCH (a:Person {name: "John Smith"})-[:REL]->(b) RETURN a;

MATCH (a:Person)-[:REL]->(b) WHERE a.name = "John Smith" RETURN a;
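The two queries above differ only in where the filter sits. As a rough analogy in plain Python (the node data and the helper names `match_inline` and `match_then_where` are invented for illustration, and the relationship part of the pattern is left out to keep things short):

```python
# Hypothetical nodes, each with a label and some properties.
nodes = [
    {"label": "Person", "name": "John Smith"},
    {"label": "Person", "name": "Jane Doe"},
    {"label": "Movie", "name": "John Smith"},
]


def match_inline(candidates, label, properties):
    """Analogue of (a:Label {key: value}): the filter is part of the pattern."""
    return [
        node for node in candidates
        if node["label"] == label
        and all(node.get(key) == value for key, value in properties.items())
    ]


def match_then_where(candidates, label, predicate):
    """Analogue of MATCH (a:Label) WHERE ...: match first, then filter."""
    matched = [node for node in candidates if node["label"] == label]
    return [node for node in matched if predicate(node)]


inline = match_inline(nodes, "Person", {"name": "John Smith"})
where = match_then_where(nodes, "Person", lambda n: n["name"] == "John Smith")
print(inline == where)
```

The inline form can only express equality on properties, whereas a WHERE predicate can be any boolean expression, so the equivalence only runs in both directions for equality checks.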

Also, the following partial patterns are usually, but not always, equivalent:

(a)-[:FIRST_REL]->(b)<-[:SECOND_REL]-(c)

(a)-[:FIRST_REL]->(b), (c)-[:SECOND_REL]->(b)

When you see a Cypher statement, you should be able to see all of its forms. Recap question: when are the statements in the second pair NOT equivalent?

4. The test is designed on the basis of the Enterprise edition.

Neo4j comes in two ‘flavours’ – Community and Enterprise. The latter has a lot of cool features, such as an error-resilient, distributed ‘High Availability’ mode. The certification’s premise is that you are familiar – and familiar to a fairly high degree, actually! – with many of the Enterprise-only features of Neo4j. As such, unless you’re fortunate enough to be an enterprise user, it may well be worth downloading the 30-day evaluation version of Neo4j Enterprise.

5. The test is generally well-written.

In other words, most things are fairly clear. By fairly clear, I mean that there is little ambiguity and it uses the same language as the reference (although, comparing test questions to phrases that had stuck in my head and that I checked after the test, just enough words are changed to deter would-be cheaters from Ctrl+F-ing through the manual!). There are no trick questions – so try to understand the questions in their most ‘mundane’, ‘trivial’ way. Yes, sometimes it is that simple!

6. TRUNCATE BRAINSPACE sql_clauses;

A lot of traditional SQL clauses (yes, TRUNCATE is one example – so is JOIN and its multifarious siblings, which describe a concept that simply does not exist in Neo4j) come up as red herrings in Cypher application questions. Try to force your brain to make a switch from SQL to Cypher – and don’t fall for the trap of instinctively thinking of the clauses in the SQL solution! Forget SQL. And most of all, forget its logic of selection – MATCHing is something rather different from SELECTing in SQL.

7. Have a 30,000ft overview of the subject

In particular, have an overview of what your options are to get particular things done. How can you access Neo4j? You might have spent 99% of your time on the web interface and/or interacting using the SDK, but there is actually a shell. How can you back up Neo4j, and what does a backup actually do? What are your options to monitor Neo4j? Once again, most users are likely to think of one solution, perhaps two, when there are several more. The difficult thing about this test is that it requires you to be exhaustive – both in breadth and in depth.

8. Algorithms, statistics and aggregation

As far as I’m aware, everyone gets slightly different questions, but my test did not include anything about the graph algorithms inherent in Neo4j (good news for philistines, er, people who want to get stuff done). It did, however, include quite a bit of detail about aggregation functions. Make of that what you will.

9. Practice on Northwind but know the Movie DB like the back of your hand.

Out of the box, if you install Neo4j Community on your computer, you have two sample databases that the Browser offers to load into your instance – Movie and Northwind. The latter should be highly familiar to you if you have a past in relational databases. Meanwhile, the former is a Neo4j favourite, not least for the Kevin Bacon angle. If you did the self-paced Getting Started training (as you should have!), you’ll have used the Movie DB enough to get a good grip of it. Most of the questions on the test pertain or relate in some way to that graph, so a degree of familiarity can help you spot errors faster. At the same time, Northwind is both a better and bigger database, more fun to use and allows for more complex queries. Northwind should therefore be your educational tool, but you should know Movie rather well for that little plus of familiar feeling that can make the difference between passing and failing.

Oh, by the way – while Getting Started is a great course, you will not stand a snowball’s chance in hell without the Production course. This is so even if you’ve done your fill of deployments and integrations – quite simply put, the breadth of the test is statistically very likely to be beyond your own experiences, even if you’ve done e.g. High Availability deployments yourself. In the real world, we specialise – for the test, however, you must be a generalist.

10. Refcards are your friends.

Start with the one for Cypher. Then build your own for High Availability. Laminate them and carry them around, if need be – or take the few functions or clauses that are your weak spots, put them on post-its and plaster them on your wall. Whatever helps – unless you’re writing Cypher code 24/7 (in which case, what are you doing here?), which I doubt happens a lot, there’s quite simply no substitute for seeing correct code and getting a feeling for good versus bad code. The test is incredibly fast paced – 80 questions over 60 minutes gives you 45 seconds per question, start to finish. At least 15-20 seconds of that goes on reading the question, if not more (it definitely was more for me – as noted, most questions repay a thorough reading!). Realistically, if you want to make that and have time to think about the more complex questions, you’ve got to be able to bang out the simple Cypher questions (I’d say there were about 8-10 of them altogether, worth an average number of points, though I didn’t count them, which I do regret now).
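If you want to play with the time budget yourself, the arithmetic is trivial. A back-of-the-envelope sketch in Python, using only the figures quoted above (the function name `thinking_time` is made up):

```python
QUESTIONS = 80
MINUTES = 60

# Average time available per question, in seconds.
seconds_per_question = MINUTES * 60 / QUESTIONS


def thinking_time(reading_seconds):
    """Seconds left to actually answer once the question has been read."""
    return seconds_per_question - reading_seconds


print(seconds_per_question)  # 45.0
print(thinking_time(15))     # 30.0
print(thinking_time(20))     # 25.0
```

Every second saved on a simple Cypher question is a second you can spend on one of the harder ones.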

While the Neo4j certification exam is far from easy, it is doable (hey, if I can do it, so can you!). Graph databases are becoming increasingly important, both because they can accelerate certain calculations on graph data and because a lot of natural processes are in reality closer to relationship-driven interactions than to the static picture that traditional RDBMS logic seeks to convey. Knowing Neo4j is therefore a definite asset for you and your team. Regardless of whether you intend to get certified, and regardless of your view on certifications in general (mine, too, is on the less complimentary side), what you learn can be an indispensable asset in research and operations as well. Of course, I’m happy to answer any questions about Neo4j and the certification exam, insofar as my subjective views can make a valid contribution to the matter.

Update 15.02.2016: Neo4j community caretaker Michael Hunger has been so kind as to leave a comment on this article, pointing out that the scant feedback is intentional – it prevents re-takers from simply banging in the correct answers from the feedback e-mail. That makes perfect sense – and is not something I thought of. Thanks, Michael. He is also encouraging recent test takers to propose questions for the test – to me, it’s an unprecedented amazingness for a certificate provider to actually ask the community what they believe to be the cornerstones and benchmarks of knowledge in a particular field. So do take him up on that offer – his e-mail is in his comment below.