Using CQRS/ES for Content Management Systems (CMS)

The CQRS/ES pattern pair has been discussed widely in mainstream PHP development circles since mid-2015. There are a number of libraries, and the pattern is being applied at scale in production environments. So the concepts here are certainly mature and ready. Rightfully, many warnings are issued against using CQRS/ES together for everything. But let's now consider how relevant it could be for the bulk of web development: the work done with Content Management Systems (CMS).

As with many things in software development, these patterns are hardly new. Designs simply get discovered again. The Command Query Responsibility Segregation (CQRS) pattern aims to provide separate models for reading and writing your data. You might have already done exactly this, either intentionally or unaware of the hype.

One key goal of adopting CQRS is to make writing very fast and straightforward. You could simply collect all the updates as simple events, from which you generate separate, potentially radically different, aggregated models for read operations.
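To make the split concrete, here is a minimal sketch in Python. The event names, the `WriteModel` class and the two projection functions are all made up for illustration and do not come from any particular CQRS library. The write side only appends events, and two quite different read models are derived from the same log.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    kind: str          # hypothetical names, e.g. "article_created", "title_changed"
    content_id: str
    payload: dict


@dataclass
class WriteModel:
    """Write side: commands only append events, nothing is read or aggregated."""
    log: list = field(default_factory=list)

    def handle(self, kind: str, content_id: str, **payload) -> None:
        self.log.append(Event(kind, content_id, payload))


def project_titles(log):
    """Read model 1: the current title of each content item."""
    titles = {}
    for event in log:
        if event.kind in ("article_created", "title_changed"):
            titles[event.content_id] = event.payload["title"]
    return titles


def project_edit_counts(log):
    """Read model 2: a very different aggregate built from the same events."""
    counts = defaultdict(int)
    for event in log:
        counts[event.content_id] += 1
    return dict(counts)


writes = WriteModel()
writes.handle("article_created", "a1", title="Hello")
writes.handle("title_changed", "a1", title="Hello, world")
writes.handle("article_created", "a2", title="Second post")

titles = project_titles(writes.log)
edit_counts = project_edit_counts(writes.log)
```

Neither read model needs to know about the other, and the write path never has to update them in place.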

Event Sourcing (ES) is another pattern that is often mentioned together with CQRS:

Storing all the changes (events) to the system, rather than just its current state.

Essentially this means collecting all the events (create, update, delete...) in a data store in their raw form. Events cannot be removed; they are undone with a new event.

This procedure is similar to accounting, where ledger entries should be immutable (written with a pen) rather than mutable (written with a pencil). If you made a mistake, you should create two new events - first to revert the mistaken entry and then to add the correct one. Each event contains only the individual changes made.
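The ledger analogy can be sketched in a few lines of Python. The `LedgerEntry` type and the `record` and `correct` helpers are hypothetical names chosen for this example. The point is that a correction adds two entries and never touches the original.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    account: str
    amount: int        # in cents; a negative amount reverses an earlier entry
    note: str


ledger = []  # append-only: nothing is ever edited or deleted


def record(account, amount, note):
    ledger.append(LedgerEntry(account, amount, note))


def correct(mistake, right_amount):
    """Fix a mistake with two new entries instead of rewriting history."""
    record(mistake.account, -mistake.amount, f"revert: {mistake.note}")
    record(mistake.account, right_amount, f"corrected: {mistake.note}")


record("office", 1500, "printer paper")   # typo: should have been 150
correct(ledger[0], 150)

balance = sum(entry.amount for entry in ledger if entry.account == "office")
```

The balance ends up correct, and the ledger still shows that a mistake was made and when it was fixed.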

Armed with a complete log, you can reconstruct the state of your data at any single point in time. The end result is similar to the time travelling offered by the Redux state management library in JavaScript. Not only is it neat, but potentially very useful.
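As a rough illustration of reconstructing past state, the sketch below replays events up to a chosen moment. The event shape (timestamp, content id, changed fields, with `None` meaning a delete) is an assumption made for this example, and the log is assumed to be ordered by time.

```python
from datetime import datetime

# (timestamp, content_id, changed fields); None stands for a delete event.
events = [
    (datetime(2016, 10, 1, 9, 0), "page-1", {"title": "Draft", "body": "First text"}),
    (datetime(2016, 10, 2, 12, 0), "page-1", {"title": "Launch day"}),
    (datetime(2016, 10, 3, 8, 0), "page-1", None),
]


def state_at(events, moment):
    """Replay every event up to and including `moment` (log assumed ordered)."""
    state = {}
    for timestamp, content_id, changes in events:
        if timestamp > moment:
            break
        if changes is None:
            state.pop(content_id, None)
        else:
            # Each event carries only the changed fields, so merge them in.
            state.setdefault(content_id, {}).update(changes)
    return state


first_day = state_at(events, datetime(2016, 10, 1, 12, 0))
second_day = state_at(events, datetime(2016, 10, 2, 18, 0))
after_delete = state_at(events, datetime(2016, 10, 4, 0, 0))
```

The same log answers all three questions; only the cut-off moment changes.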

A complex content model is not a CRUD

Content Management Systems (CMS) are tools that allow management of content entities. Most CMSes store data in some sort of database, whether it's an RDBMS like MySQL storing XML or a document store like MongoDB with content in Markdown.

This crop of tools largely treats the data as CRUD storage that gets updated in place. For more advanced usage they layer various techniques on top to provide version management, multilingual capabilities, audit trails and countless other things.

For most uses of a WCMS (Web CMS) these techniques have worked great for two decades. So it's a proven model, but it still might make sense to consider CQRS/ES for parts of your storage engine. For some use cases the benefits of separate read and write models may outweigh the added complexity.

Let's consider some CMS features that could benefit from applying CQRS/ES:

Version Management: Instead of maintaining a large pool of redundant entries in your database for a number of different states, you could generate projections of the versions upon request or possibly asynchronously in advance.

Language Versioning: As with version management, a CRUD database ends up containing redundant data. Instead you could again create a projection for each specific language version that is easy to read and offers high performance.

Workspaces and Content Staging: Workspaces could be used to maintain a close-to-identical version of a site that is having a new section added to it. With CQRS/ES the preview would be a separate projection and would merge to production.

Content Variations: Delivering personalized content is already widespread. Content picked by automatic recommendation engines, or by manual selection of a content variation (for a specific geographic region, for example), could be presented with a projection.

Audit Trail: Many content management systems use an audit trail log to track who did what and when, in order to keep track of past states. Technically a complete audit trail could be used as an event store to play back events from the history.
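To give a feel for the first two items, here is a small sketch of language and version projections built from one shared event list. The event fields (`content_id`, `lang`, `fields`) and the rule that "version N is the state after N events" are assumptions made for this example, not how any specific CMS defines versions.

```python
events = [
    {"content_id": "about", "lang": "en", "fields": {"title": "About us"}},
    {"content_id": "about", "lang": "fi", "fields": {"title": "Meistä"}},
    {"content_id": "about", "lang": "en",
     "fields": {"title": "About the team", "intro": "Hi!"}},
]


def project_language(events, lang):
    """Flat, easy-to-read view of the latest content in one language."""
    view = {}
    for event in events:
        if event["lang"] == lang:
            view.setdefault(event["content_id"], {}).update(event["fields"])
    return view


def project_version(events, content_id, lang, version):
    """Assumed rule: version N is the state after the first N matching events."""
    matching = [
        event for event in events
        if event["content_id"] == content_id and event["lang"] == lang
    ]
    state = {}
    for event in matching[:version]:
        state.update(event["fields"])
    return state


english = project_language(events, "en")
finnish = project_language(events, "fi")
first_english_version = project_version(events, "about", "en", 1)
```

No version or translation is stored as a full copy; each view is computed from the events when it is needed, or ahead of time if reads must be fast.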

Any single one of these is doable with traditional CMS storage methods. It gets much harder once you want to start combining the projections to answer, let's say, "give me all content that was shown yesterday at 12:00 PM to visitors coming from France to our beta site". With the collection of raw event data, it is feasible to create such a view into the past.
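A query like that could be sketched roughly as follows. The `Event` shape, with a workspace, an optional region and a publish/unpublish action, is invented for this example; a real content model would be far richer.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Event(NamedTuple):
    at: datetime
    workspace: str           # e.g. "production" or "beta"
    region: Optional[str]    # None means "shown to every region"
    content_id: str
    action: str              # "publish" or "unpublish"


events = [
    Event(datetime(2016, 10, 1, 8, 0), "beta", None, "home", "publish"),
    Event(datetime(2016, 10, 2, 9, 0), "beta", "FR", "promo-fr", "publish"),
    Event(datetime(2016, 10, 2, 10, 0), "beta", "DE", "promo-de", "publish"),
    Event(datetime(2016, 10, 2, 11, 0), "production", None, "news", "publish"),
    Event(datetime(2016, 10, 3, 11, 0), "beta", "FR", "promo-fr", "unpublish"),
]


def shown(events, moment, workspace, region):
    """Content visible at `moment` in one workspace for visitors from one region."""
    visible = set()
    for event in sorted(events, key=lambda e: e.at):
        if event.at > moment:
            break
        if event.workspace != workspace:
            continue
        if event.region not in (None, region):
            continue
        if event.action == "publish":
            visible.add(event.content_id)
        elif event.action == "unpublish":
            visible.discard(event.content_id)
    return visible


yesterday_noon = shown(events, datetime(2016, 10, 2, 12, 0), "beta", "FR")
today_noon = shown(events, datetime(2016, 10, 3, 12, 0), "beta", "FR")
```

Time, workspace and region are just filters over the same replay, which is why combining them is cheap to express once the raw events exist.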

From the above examples you can guess that your data storage will actually be polluted with a large amount of projected, duplicated data. This is OK. Many tools like eZ Platform already build and use a separate query data model in a search index. With a proper application architecture in place this is reliable and extensible.

It's also worth noting that you're not tied to a single type of data store. You can scale and choose technologies as they fit best. If your events are stored in a robust RDBMS, most will want to keep their aggregates/projections there too for simplicity. But there is nothing stopping you from populating your read model in a graph database like Neo4j asynchronously via a message queue.
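The asynchronous flow can be sketched with nothing but the Python standard library. Everything here is a stand-in: a list plays the relational event table, `queue.Queue` plays the message queue, and a plain dictionary of sets plays the graph read model. No real Neo4j or message broker API is used or implied.

```python
import queue
import threading

event_store = []        # stand-in for the relational event table
bus = queue.Queue()     # stand-in for a message queue
graph = {}              # stand-in for a graph read model: node -> linked nodes
_STOP = object()        # sentinel telling the projector to shut down


def append_event(event):
    event_store.append(event)   # write path: store the event first...
    bus.put(event)              # ...then notify the projector


def graph_projector():
    """Consumes events in the background and updates the read model."""
    while True:
        event = bus.get()
        if event is _STOP:
            break
        if event["type"] == "link_added":
            graph.setdefault(event["source"], set()).add(event["target"])
        elif event["type"] == "link_removed":
            graph.get(event["source"], set()).discard(event["target"])


worker = threading.Thread(target=graph_projector)
worker.start()

append_event({"type": "link_added", "source": "home", "target": "about"})
append_event({"type": "link_added", "source": "home", "target": "blog"})
append_event({"type": "link_removed", "source": "home", "target": "about"})

bus.put(_STOP)
worker.join()   # only after this is the read model guaranteed to be caught up
```

The write path returns as soon as the event is stored and queued, so the read model lags behind slightly. That eventual consistency is the trade-off you accept for the decoupling.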

Not so fast... does it reaaallly make sense?

As in content management in general, the devil is in the details. The above only scratches the surface and gives ideas on how CQRS/ES could be applied to the content repository of a CMS. There are a lot of underlying complexities in play here and one should not adopt any concepts lightly, let alone something quite radically different from the traditional model.

If you are creating a content management system, your database schema (or lack thereof) has the potential to follow you for a long time. If you're lucky and your CMS takes off to become the next WordPress (in terms of popularity), you can be living with this schema for the next fifteen years or so. During this time conversational interfaces and other unforeseen technologies will be tapping into your content repository.

CMSes are used for everything under the sun, so building the perfect storage layer for them is virtually impossible. If you're planning on using CQRS/ES for your database schema, I would prototype as much as possible with data and complex queries from actual projects, ideally with millions of content objects and rich semantics, to make sure your schema has wings.

Speaking of schemas, the database dumps could contain only the events, and the projections could be generated on demand or pre-generated upon publishing / editing, similar to the way search engines such as Solr are integrated into many systems. Many have also created similar ad-hoc search indexes, but they're often limited in capability, lacking permissions checking and so forth.
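Here is a minimal sketch of both options, assuming a made-up dump format of one JSON event per line. `rebuild` regenerates a projection on demand from an events-only dump, while `apply_event` updates an existing projection at publish time.

```python
import json

# Hypothetical events-only dump: one JSON document per line.
dump = "\n".join([
    json.dumps({"type": "created", "id": "p1", "fields": {"title": "Hello"}}),
    json.dumps({"type": "updated", "id": "p1", "fields": {"title": "Hello again"}}),
    json.dumps({"type": "created", "id": "p2", "fields": {"title": "Other"}}),
    json.dumps({"type": "deleted", "id": "p2"}),
])


def load_events(text):
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def apply_event(projection, event):
    """Apply a single event to an existing projection."""
    if event["type"] == "deleted":
        projection.pop(event["id"], None)
    else:
        projection.setdefault(event["id"], {}).update(event["fields"])


def rebuild(events):
    """On demand: regenerate the whole projection from the event log."""
    projection = {}
    for event in events:
        apply_event(projection, event)
    return projection


# Restoring from a dump: only events are loaded, the projection is derived.
events = load_events(dump)
projection = rebuild(events)

# Pre-generating on publish: store the new event, then update incrementally.
new_event = {"type": "created", "id": "p3", "fields": {"title": "Fresh"}}
events.append(new_event)
apply_event(projection, new_event)
```

Because both paths share `apply_event`, an incrementally maintained projection and a full rebuild should always agree, which is a handy property to check in tests.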

The performance of generating projections is also important, as at worst you could architect a funky CQRS/ES system that's simply unusable because creating projections is too slow. This needs to be thought through carefully before getting drunk on the possibilities offered by a complete history since the dawn of time.

Conclusion

So there are definitely lots of opportunities and challenges in using the CQRS/ES pattern as a data store for a content management system. Whether it works for you depends on your use cases and, to put it bluntly, on the pure technical capability to deliver complex application architectures.

For Symfony developers, an ideal opportunity to learn more is SymfonyCon, held in Berlin in early December 2016. There is a session on CQRS/ES from Samuel Roze:

The Command Query Responsibility Segregation pattern, instead of the traditional CRUD, introduce different models for reading and updating the application states. Event Sourcing is the idea that every state of your application can be represented by a sequence of events. Using these two principles as the heart of a system or an application is quite common but can be challenging if we don’t use the right tools or architecture. With a concrete application as example, we’ll go through the architecture, libraries and bundles we can use in a Symfony application in order to apply these patterns.

A CQRS and Event Sourcing approach in a Symfony application

While waiting for the event, there are obviously tons and tons of good resources online that are worth discovering. Here are a few to get you started:

Written by Jani Tarvainen on Tuesday October 4, 2016
