JSON is a versatile data format. It is human readable, has very few rules, and adds little syntactic overhead. JSON is the dominant data interchange format for web services, is excellent for logging and debugging, and has even found its way into many config file formats. Since we already have this great textual representation of our data, it is not a far leap to assume you can easily use JSON to compute an MD5 checksum of an object, or to diff two objects of the same type. Making this assumption, at least in Java, may take you down quite the rabbit hole. Expanding on our practical Jackson ObjectMapper configuration, we will explore how to make it deterministic-ish.

The Problem

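JSON serialization is not required to be deterministic: the JSON specification defines an object as an unordered collection of name/value pairs, so two serializers (or two runs of the same one) may legally emit the same data in different orders. If you throw together a few quick unit tests you might not run into any issues. However, once you start reading from and writing to a data store, or using collections with non-deterministic iteration order such as HashMap or HashSet, you will begin to have a bad day.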
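To make the failure mode concrete, here is a minimal sketch that uses only the JDK, with `Map.toString()` standing in for a JSON serializer. The class and helper names (`OrderSensitiveChecksum`, `md5Hex`, `mapOf`) are made up for illustration. Two maps that are equal according to `equals()` produce different text, and therefore different MD5 checksums, simply because their entries iterate in a different order.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

final class OrderSensitiveChecksum {

    // Hex-encoded MD5 of the UTF-8 bytes of the given text.
    static String md5Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, hash));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Builds a map whose iteration order is the order of the arguments.
    static Map<String, Integer> mapOf(String k1, int v1, String k2, int v2) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put(k1, v1);
        map.put(k2, v2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = mapOf("a", 1, "b", 2);
        Map<String, Integer> second = mapOf("b", 2, "a", 1);

        // Same entries, so the maps are equal.
        System.out.println(first.equals(second)); // true
        // Different iteration order, so the text differs.
        System.out.println(first.toString().equals(second.toString())); // false
        // Different text, so the checksums differ as well.
        System.out.println(md5Hex(first.toString()).equals(md5Hex(second.toString()))); // false
    }
}
```

A LinkedHashMap is used here only so that the ordering difference is reproducible; with a HashMap the order depends on hash codes and table size, which is what makes the real bug so intermittent.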

Sort the keys!

Your unit tests are passing and everything seems to be working on the surface, so your checksum / diffing code is deployed to prod. Next thing you know, bugs are coming in: equivalent objects no longer match based on the checksum / diff. After some debugging you notice that the fields of the same object type are sometimes serialized in different orders. Google to the rescue! In about 30 seconds you find the Jackson feature ObjectMapper.configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, true). Problem solved! A few days later the same bug is reported again. You know it was fixed, so it can't possibly be the code. After some head scratching and more Google searches you discover that the previous feature only applies to POJO properties and not to Map keys. Enter ObjectMapper.configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true).
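The idea behind sorting keys can be sketched without Jackson at all. The example below is not how Jackson implements either feature; it is a JDK-only illustration (the `SortedKeysSketch` class and `sortKeys` method are hypothetical names) that copies a map into a TreeMap at every nesting level, so that equal maps always print the same way. It assumes String keys and no null keys, since TreeMap rejects null keys under natural ordering.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class SortedKeysSketch {

    // Returns a copy whose keys iterate in natural String order.
    // Nested maps are copied the same way; other values are kept as-is.
    static Map<String, Object> sortKeys(Map<String, ?> source) {
        Map<String, Object> sorted = new TreeMap<>();
        for (Map.Entry<String, ?> entry : source.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Assumption: nested maps also use String keys.
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) value;
                value = sortKeys(nested);
            }
            sorted.put(entry.getKey(), value);
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("y", 2);
        inner.put("x", 1);

        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put("b", inner);
        outer.put("a", 0);

        System.out.println(outer);           // {b={y=2, x=1}, a=0}
        System.out.println(sortKeys(outer)); // {a=0, b={x=1, y=2}}
    }
}
```

Note that the sort has to reach nested maps too; sorting only the top level would leave the inner object just as order-dependent as before.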

Sort the values!

Several days of smooth sailing go by when suddenly your inbox alerts "JIRA BUG - Things randomly not working - REOPENED". Great. All the unit tests are still passing and you have verified that the recent fixes are still there. After finally reproducing the issue you have two Sublime Text tabs open with fairly large JSON objects in them. At first glance they look exactly the same, but the code says they are not. You flip back and forth between the tabs at various locations in the files (or, if you are smart, you just run a quick command line diff). THERE IT IS! Something is different! Of the 20 MB JSON payload, the difference comes down to "items": [1, 2, 3] versus "items": [1, 3, 2]. The POJO you serialized uses a Set<Integer>, which doesn't always serialize in the same order. Since you already sorted the keys, you decide to just sort the values of all collections! You find a way to hack Jackson into sorting all collections, which of course was not as easy as it sounded (more on that in the sample code; it involves infinite recursion).
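Stripped of the Jackson plumbing, the fix for this step amounts to copying the collection into a sorted list before it is written out. A minimal JDK-only sketch, assuming every element is non-null and mutually Comparable (the `SortedValuesSketch` class and `toSortedList` method are illustrative names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SortedValuesSketch {

    // Copies the collection into a list sorted by natural ordering.
    // Throws NullPointerException if the collection contains null.
    static <T extends Comparable<? super T>> List<T> toSortedList(Collection<T> values) {
        List<T> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // LinkedHashSet keeps insertion order, which makes the mismatch reproducible.
        Set<Integer> first = new LinkedHashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> second = new LinkedHashSet<>(Arrays.asList(1, 3, 2));

        System.out.println(first.equals(second));    // true
        System.out.println(first + " vs " + second); // [1, 2, 3] vs [1, 3, 2]
        System.out.println(toSortedList(first).equals(toSortedList(second))); // true
    }
}
```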

Sort the unsortable values!

It's now Friday at 5pm and you are enjoying your first beer of the evening. Ding! "JIRA BUG - Things randomly not working - REOPENED AGAIN". It's not mission critical and sounds like a job for Monday, so you finish the beer and run out of the office before anyone notices. It turns out the feature was somewhat useful and was being incorporated into other sections of the application. Up until now the JSON blobs were fairly straightforward: all collections contained only primitive values. You are now looking at a stack trace stating that you are trying to sort an object that is not an instance of Comparable. Set<MyPojo> is now part of the object in question. You decide it's better to hack the deterministic object mapper than to force the POJO to be Comparable, in case it needs a different Comparable implementation in the future. Now every non-Comparable POJO that is run through the deterministic ObjectMapper needs a custom Comparator. Is it ideal? No. But it works.
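The "custom comparator per POJO" idea boils down to a lookup table from class to Comparator that is consulted when sorting. Here is a small JDK-only sketch of that idea; `ComparatorRegistrySketch`, `register`, and the `MyPojo` shape with a single `name` field are assumptions made for the example, not the code from this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ComparatorRegistrySketch {

    // Hypothetical POJO that deliberately does not implement Comparable.
    static final class MyPojo {
        final String name;

        MyPojo(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return "MyPojo(" + name + ")";
        }
    }

    private final Map<Class<?>, Comparator<Object>> comparators = new LinkedHashMap<>();

    // Registers an ordering for a type; the cast is safe because compare()
    // only uses the entry when both arguments are instances of that type.
    <T> void register(Class<T> type, Comparator<? super T> comparator) {
        comparators.put(type, (a, b) -> comparator.compare(type.cast(a), type.cast(b)));
    }

    // Uses the first registered comparator that matches both objects.
    int compare(Object first, Object second) {
        for (Map.Entry<Class<?>, Comparator<Object>> entry : comparators.entrySet()) {
            if (entry.getKey().isInstance(first) && entry.getKey().isInstance(second)) {
                return entry.getValue().compare(first, second);
            }
        }
        throw new UnsupportedOperationException(
                "No comparator registered for " + first.getClass().getName());
    }

    // Returns a sorted copy; fails hard if an element type has no comparator.
    List<Object> sorted(Collection<?> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(this::compare);
        return copy;
    }

    public static void main(String[] args) {
        ComparatorRegistrySketch registry = new ComparatorRegistrySketch();
        registry.register(MyPojo.class, Comparator.comparing((MyPojo p) -> p.name));

        System.out.println(registry.sorted(Arrays.asList(new MyPojo("b"), new MyPojo("a"))));
        // [MyPojo(a), MyPojo(b)]
    }
}
```

Keeping the ordering outside the POJO means the class itself stays free to implement Comparable differently later, which is the trade-off described above.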

Sort the values better!

Ding! Do you need to guess which bug was just reopened again? Upon even further investigation you find another exception: "String cannot be cast to Integer". What? You start digging through the sorting implementation you hacked together and notice that it takes the first element from the collection and, if that element is Comparable, sorts the whole collection by natural order. If it is not Comparable, it uses the passed-in custom Comparator implementations. Everything else is still working and all unit tests are still passing. You find the suspect code, Set<Object> troll, which has a JSON value of "troll": [1, 2, "three"]. Immediate facepalm. Since there was actually a use case for this, you need a workaround. Lightbulb! You don't actually care about the sort order, just that the order is deterministic. You decide to sort all collections first by class name and then by the element's Comparator, so that elements of different types are never compared with each other directly. Brilliant!
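A compact way to see why the class-name trick works: once elements are grouped by class name, the second comparison only ever runs between two objects of the same class, so the ClassCastException can no longer occur. The sketch below uses only the JDK and assumes every element is non-null and Comparable with its own type; `ClassThenValueSketch` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class ClassThenValueSketch {

    // Orders by fully qualified class name first. The natural-order comparison
    // only runs when the class names are equal, i.e. for same-typed elements.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static final Comparator<Object> BY_CLASS_THEN_VALUE =
            Comparator.<Object, String>comparing(o -> o.getClass().getName())
                    .thenComparing((a, b) -> ((Comparable) a).compareTo(b));

    // Returns a sorted copy of a possibly heterogeneous collection.
    static List<Object> sorted(Collection<?> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(BY_CLASS_THEN_VALUE);
        return copy;
    }

    public static void main(String[] args) {
        // java.lang.Integer sorts before java.lang.String.
        System.out.println(sorted(Arrays.asList("three", 2, 1))); // [1, 2, three]
    }
}
```

The resulting order is not meaningful to a human (integers come first only because of how the class names compare), but it is the same on every run, which is all a checksum or diff needs.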

The sample monstrosity of a semi-deterministic ObjectMapper

There may be, and hopefully is, a better way to do this, but for now this is what we have.

import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map.Entry;
import java.util.Objects;

import com.fasterxml.jackson.databind.BeanProperty;
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.fasterxml.jackson.databind.ser.std.StdDelegatingSerializer;
import com.fasterxml.jackson.databind.util.Converter;
import com.fasterxml.jackson.databind.util.StdConverter;

// Assumption: Seq is jOOL's org.jooq.lambda.Seq; the original listing does not show its import.
import org.jooq.lambda.Seq;

public class DeterministicObjectMapper {

    private DeterministicObjectMapper() { }

    public static ObjectMapper create(ObjectMapper original, CustomComparators customComparators) {
        ObjectMapper mapper = original.copy()
            .configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true)
            .configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, true);

        /*
         * Get the original instance of the SerializerProvider before we add our custom module,
         * so that our Collection delegating code does not end up calling itself.
         */
        SerializerProvider serializers = mapper.getSerializerProviderInstance();

        // This module is responsible for replacing non-deterministic objects
        // with deterministic ones. Example: convert a Set to a sorted List.
        SimpleModule module = new SimpleModule();
        module.addSerializer(Collection.class,
            new CustomDelegatingSerializerProvider(serializers,
                new CollectionToSortedListConverter(customComparators)));
        mapper.registerModule(module);
        return mapper;
    }

    /*
     * We need this class to delegate to the original SerializerProvider
     * from before we added our module to it. If we have a Collection -> Collection converter,
     * it delegates to itself and loops until the stack overflows.
     */
    private static class CustomDelegatingSerializerProvider extends StdDelegatingSerializer {

        private final SerializerProvider serializerProvider;

        private CustomDelegatingSerializerProvider(SerializerProvider serializerProvider,
                                                   Converter<?, ?> converter) {
            super(converter);
            this.serializerProvider = serializerProvider;
        }

        @Override
        protected StdDelegatingSerializer withDelegate(Converter<Object, ?> converter,
                                                       JavaType delegateType,
                                                       JsonSerializer<?> delegateSerializer) {
            return new StdDelegatingSerializer(converter, delegateType, delegateSerializer);
        }

        /*
         * If we do not override this method to delegate to the original
         * serializerProvider, we get a stack overflow because it recursively
         * calls itself. Basically we are hijacking the Collection serializer to first
         * sort the list and then delegate it back to the original serializer.
         */
        @Override
        public JsonSerializer<?> createContextual(SerializerProvider provider, BeanProperty property)
                throws JsonMappingException {
            return super.createContextual(serializerProvider, property);
        }
    }

    private static class CollectionToSortedListConverter
            extends StdConverter<Collection<?>, Collection<?>> {

        private final CustomComparators customComparators;

        public CollectionToSortedListConverter(CustomComparators customComparators) {
            this.customComparators = customComparators;
        }

        @Override
        public Collection<? extends Object> convert(Collection<?> value) {
            if (value == null || value.isEmpty()) {
                return Collections.emptyList();
            }

            /*
             * Sort all elements by class first, then by our custom comparator.
             * If the collection is heterogeneous or has anonymous classes, it is useful
             * to first sort by the class name and then by the comparator. We don't care
             * about the actual sort order, just that it is deterministic.
             */
            Comparator<Object> comparator = Comparator.comparing(x -> x.getClass().getName())
                .thenComparing(customComparators::compare);

            // Null elements are dropped before sorting.
            Collection<? extends Object> filtered = Seq.seq(value)
                .filter(Objects::nonNull)
                .sorted(comparator)
                .toList();

            if (filtered.isEmpty()) {
                return Collections.emptyList();
            }
            return filtered;
        }
    }

    public static class CustomComparators {

        private final LinkedHashMap<Class<?>, Comparator<? extends Object>> customComparators;

        public CustomComparators() {
            customComparators = new LinkedHashMap<>();
        }

        public <T> void addConverter(Class<T> clazz, Comparator<? super T> comparator) {
            customComparators.put(clazz, comparator);
        }

        @SuppressWarnings({ "unchecked", "rawtypes" })
        public int compare(Object first, Object second) {
            // If the object is Comparable, use its natural ordering.
            if (first instanceof Comparable) {
                return ((Comparable) first).compareTo(second);
            }
            // If the object is not Comparable, try a custom supplied comparator.
            for (Entry<Class<?>, Comparator<?>> entry : customComparators.entrySet()) {
                Class<?> clazz = entry.getKey();
                // Fixed: the original check was first.getClass().isAssignableFrom(clazz),
                // which does not match subclasses of the registered type.
                if (clazz.isAssignableFrom(first.getClass())) {
                    Comparator<Object> comparator = (Comparator<Object>) entry.getValue();
                    return comparator.compare(first, second);
                }
            }
            // We have no way to order the collection, so fail hard.
            String message = String.format(
                "Cannot compare object of type %s without a custom comparator",
                first.getClass().getName());
            throw new UnsupportedOperationException(message);
        }
    }
}

Out of Scope

This only covers basic collections, and it comes with several caveats. If you use arrays, Iterators, or anything else, you may need even further customization.