MongoDB is the primary database at VersionEye. VersionEye currently crawls more than 600K open source projects daily. Some of the crawlers are implemented in Java, others in Ruby. You can follow a library on VersionEye, and as soon as its next version comes out, you get an email notification. Today I got this email from VersionEye.

As you can see, the version information is missing for the Java libraries. The email template hasn't been touched in the last couple of days. The crawlers for Maven repositories are, of course, implemented in Java 🙂, and they get updated more frequently. So the error must be somewhere in the Java crawlers.

The version object is an embedded document in the product object. Every time a crawler finds a new version, it adds a new version object to the corresponding product object. The code for that looks like this:

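// match the product by language and product key
BasicDBObject productMatch = new BasicDBObject();
productMatch.put(Product.LANGUAGE, language);
productMatch.put(Product.PROD_KEY, prodKey);
// build the embedded version document
BasicDBObject versionObj = version.getDBObject();
versionObj.put(Version.VERSION, version.getVersion());
// append it to the product's versions array with $push (writes always go to the primary)
BasicDBObject versionsUpdate = new BasicDBObject();
versionsUpdate.put("$push", new BasicDBObject(Version.VERSIONS, versionObj));
getCollection().update(productMatch, versionsUpdate);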
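To make the data model concrete, here is a toy sketch in plain Java that models a product document as a map and mimics what the $push update does to its embedded versions array. It is not the MongoDB driver: the applyPush and version helpers and the field names language, prod_key and version are hypothetical stand-ins for the Product and Version constants used above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (plain maps, NOT the MongoDB driver) of {$push: {versions: versionObj}}.
public class PushDemo {
    // $push appends the value to the array field, creating the array if it is absent.
    @SuppressWarnings("unchecked")
    static void applyPush(Map<String, Object> stored, String field, Object value) {
        List<Object> array = (List<Object>) stored.computeIfAbsent(field, k -> new ArrayList<Object>());
        array.add(value);
    }

    // Hypothetical embedded version document with a single "version" field.
    static Map<String, Object> version(String v) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("version", v);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> product = new HashMap<>();
        product.put("language", "Java");       // hypothetical field names and values
        product.put("prod_key", "junit/junit");
        applyPush(product, "versions", version("4.11"));
        applyPush(product, "versions", version("4.12"));
        System.out.println(product.get("versions")); // [{version=4.11}, {version=4.12}]
    }
}
```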

So far, so good. In the next lines, the product object's updated_at field is set to the current time.

DBObject verUpdate = getDBObjectByKey(language, prodKey);
verUpdate.put(Product.UPDATED_AT, new Date());
getCollection().update(productMatch, verUpdate);

And of course there is a unit test for this code, and it is always green. In production, however, the new version sometimes just disappears. Not always! Just sometimes. At first I thought I had found a bug in MongoDB, but this only happened with the Java crawlers, never with the Ruby crawlers. So the root of all evil had to be in the implementation. It took me a whole day to figure it out!

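In production, MongoDB runs as a Replica Set on multiple hosts, and 2 days ago I changed the read preference of the MongoDB driver to “secondary”. That means read operations are no longer sent to the primary but are distributed across the secondary nodes of the Replica Set. Secondaries replicate the primary's writes asynchronously, so for a short time they can lag behind it. And this is what happened.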

The first code snippet always goes through and adds the new version to the product on the primary. But then the second snippet reloads the product object from the database and executes an update.

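// with read preference "secondary", this read may be served by a lagging secondary
DBObject verUpdate = getDBObjectByKey(language, prodKey);
verUpdate.put(Product.UPDATED_AT, new Date());
// passing a plain document to update() replaces the whole stored document
getCollection().update(productMatch, verUpdate);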

If the change has not yet been replicated to the whole Replica Set and the read goes to a node that doesn't have the new version yet, a product object without the new version is loaded. The “updated_at” field is set on this stale object, and the object is stored back to the database. But when the “update” method of the Java driver is called with a plain document, one that contains no update operators such as $set, MongoDB does not update only the changed field: it replaces the whole stored document with the one passed in. That is how the product ends up stored without the new version.
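The sequence can be reproduced without a database. The following toy simulation is a sketch, not VersionEye's code and not the MongoDB driver: a hypothetical Product class stands in for the product document, replicate() stands in for the secondary catching up, and replaceOnPrimary() mimics calling update with a plain document. The version strings are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation with plain Java objects (NOT the MongoDB driver) of the lost update.
public class LostUpdateDemo {
    // Hypothetical stand-in for the product document: embedded versions plus a timestamp.
    static final class Product {
        List<String> versions = new ArrayList<>();
        long updatedAt;

        Product copy() {
            Product p = new Product();
            p.versions = new ArrayList<>(versions);
            p.updatedAt = updatedAt;
            return p;
        }
    }

    static Product primary = new Product();
    static Product secondary = new Product();

    // Asynchronous replication catching up: the secondary receives a snapshot.
    static void replicate() {
        secondary = primary.copy();
    }

    // Mimics update(match, plainDocument): the stored document is replaced wholesale.
    static void replaceOnPrimary(Product doc) {
        primary = doc.copy();
    }

    static List<String> runScenario() {
        primary = new Product();
        primary.versions.add("4.11");
        replicate();                                  // both nodes know 4.11

        primary.versions.add("4.12");                 // 1. $push is applied on the primary
        Product stale = secondary.copy();             // 2. reload is served by the lagging secondary
        stale.updatedAt = System.currentTimeMillis(); //    updated_at is set on the stale copy
        replaceOnPrimary(stale);                      // 3. full replacement overwrites versions
        return primary.versions;
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // prints [4.11]: version 4.12 is lost
    }
}
```

The unit test stays green because a single-node test setup has no replication lag: the reload always sees the version that was just pushed.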

There are different solutions to this. The first is to switch the read preference back to “primary”. But there is a better one: MongoDB can update individual fields of a document without replacing the whole document. It works like this:

DBObject newValues = new BasicDBObject(); // only the fields to change, no read needed
newValues.put(Product.UPDATED_AT, new Date());
BasicDBObject set = new BasicDBObject("$set", newValues);
getCollection().update(productMatch, set);

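The big difference is the “$set” wrapper. It tells MongoDB to write only the fields contained in the $set document instead of replacing the whole document, so the $set document should contain just the fields you actually want to change. One day of headache for a one-liner! I hope this blog post saves somebody else a day of headache.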
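The property that matters is that $set only writes the fields contained in its document. The toy model below uses plain HashMaps instead of the driver, with a hypothetical applySet helper that covers only top-level fields. It shows both sides: setting just updated_at keeps the version that was pushed on the primary, while wrapping a whole document read from a lagging secondary in $set would still write its stale versions array back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model of $set semantics on plain maps (NOT the MongoDB driver).
public class SetSemanticsDemo {
    // $set writes only the keys present in setDoc; every other stored field is kept.
    // (Simplification: top-level fields only, no dotted paths.)
    static void applySet(Map<String, Object> stored, Map<String, Object> setDoc) {
        stored.putAll(setDoc);
    }

    // Hypothetical product document: embedded versions plus an updated_at timestamp.
    static Map<String, Object> product(String... versions) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("versions", new ArrayList<>(Arrays.asList(versions)));
        doc.put("updated_at", 0L);
        return doc;
    }

    public static void main(String[] args) {
        // The primary already holds 4.12; a read from the secondary would not.
        Map<String, Object> onPrimary = product("4.11", "4.12");

        // $set containing only updated_at: no read needed, versions survive.
        Map<String, Object> onlyTimestamp = new HashMap<>();
        onlyTimestamp.put("updated_at", System.currentTimeMillis());
        applySet(onPrimary, onlyTimestamp);
        System.out.println(onPrimary.get("versions")); // [4.11, 4.12]

        // Pitfall: a whole stale document wrapped in $set still overwrites versions.
        Map<String, Object> onPrimary2 = product("4.11", "4.12");
        Map<String, Object> staleRead = product("4.11");
        staleRead.put("updated_at", System.currentTimeMillis());
        applySet(onPrimary2, staleRead);
        System.out.println(onPrimary2.get("versions")); // [4.11]
    }
}
```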