Users and Traction

1 million apps were deployed to Parse.

The largest Parse app had 40M users.

The largest Parse customer only used it for push notifications.

Parse was one of the world’s largest MongoDB users.

Clash of Kings used Parse for push notifications and made up roughly half of all pushes that went through Parse. They never moved any other parts to Parse, due to scalability concerns.

The original reason for Facebook to acquire Parse was to push its mobile SDKs and to create synergies with mobile ads. Parse was often sold as a package deal with Facebook advertising.

The static pricing model, measured in guaranteed requests per second, did not work well.

Business problem: people tended to remain in the free tier.

Technical problem I: complicated rate limiting. If an app exceeded its limit by a factor of 60 for a minute, requests were dropped. Limits were tracked using a shared Memcache instance. Consequence: when developers experienced rate limits in the API, they added retries, and those retries incurred enormous load on the Parse backend.

Technical problem II: the real bottleneck was not the API servers but almost always the shared MongoDB database cluster.
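
The rate-limiting and retry feedback loop above can be sketched as a fixed-window counter. This is an illustrative reconstruction, not Parse's code: the class name, the dict standing in for the shared Memcache instance, and the exact reading of the "factor of 60" threshold are all assumptions.

```python
import time

class FixedWindowLimiter:
    """Sketch of a per-app fixed-window rate limiter backed by a shared
    counter store (Parse used a shared Memcache instance; a plain dict
    stands in here)."""

    def __init__(self, guaranteed_rps):
        self.guaranteed_rps = guaranteed_rps
        self.counters = {}  # (app_id, minute) -> request count

    def allow(self, app_id, now=None):
        now = time.time() if now is None else now
        window = int(now // 60)  # one-minute tracking window
        key = (app_id, window)
        self.counters[key] = self.counters.get(key, 0) + 1
        # Drop once the per-minute count exceeds the guaranteed
        # per-second rate scaled to the 60-second window (one possible
        # reading of the "factor of 60" rule; the exact semantics are
        # an assumption here).
        return self.counters[key] <= self.guaranteed_rps * 60
```

A client that blindly retries every dropped request multiplies the offered load on exactly the windows that are already over the limit, which is the feedback loop described above.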

Parse Server

The server was written in Rails at first (with a max. concurrency of 24 threads) and had very little throughput per server (~15–30 requests per second).

The server was later rewritten in Go. The open-source Parse Server is written in Node.js and lacks many functionalities of the original Go server.

The backend ran completely on Amazon Web Services.

It was planned to migrate Parse to Facebook’s infrastructure (e.g. Haystack, Tao, F4, Extended Apache Giraph, Gorilla), but the project was abandoned.

Roughly 8 developers worked on SDKs, 8 on the server, and 8 on DevOps, plus a few more engineers.

Database

>40 MongoDB Replica Sets with 3 nodes each

Parse went for RocksDB as its primary storage engine.

Storage engine: RocksDB (i.e. MongoRocks), an append-only engine based on log-structured merge trees (similar to e.g. Cassandra, HBase, CouchDB, LevelDB, WiredTiger, TokuDB). Reason: better handling of many collections, in contrast to WiredTiger, which uses one file for each collection. Compression was better by a factor of 2–3 in terms of space. Writes and replication were also more efficient in terms of latency/lag. The move from MMap to MongoRocks was done by adding a replica running MongoRocks that was later promoted to be the new master.
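
The log-structured merge idea behind RocksDB can be illustrated with a toy store. This is a teaching sketch only, assuming nothing about MongoRocks internals; real engines add a write-ahead log, compaction, and bloom filters.

```python
class TinyLSM:
    """Minimal sketch of an LSM-tree store: writes go to an in-memory
    memtable and are flushed as immutable sorted runs ("SSTables");
    reads check the memtable first, then the runs newest-first."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.sstables = []  # sorted key/value runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Append-only: flushed runs are never mutated afterwards.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None
```

Because writes only ever append sorted runs, many small collections do not each pin their own file, which is the contrast to WiredTiger's file-per-collection layout noted above.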

Used only instance storage with SSDs, no EBS.

No sharding: each tenant was mapped statically to exactly one replica set using MongoDB’s primary database logic.
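
The static tenant-to-replica-set assignment might be pictured as a plain routing table; the table contents and helper names below are hypothetical:

```python
# Sketch of static tenant-to-replica-set routing (no sharding): every
# app lives entirely on one replica set, looked up from a fixed table.
TENANT_MAP = {
    "app-1": "rs-03",
    "app-2": "rs-17",
}

def replica_set_for(app_id):
    # A tenant never spans replica sets, so all of its queries and
    # writes hit this single cluster.
    return TENANT_MAP[app_id]

def move_tenant(app_id, target_rs):
    # Rebalancing means editing the table and migrating the data by
    # hand, which is the manual, error-prone process noted below.
    TENANT_MAP[app_id] = target_rs
```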

The Mongo write concern was 1 (!), i.e. writes were acknowledged before they were replicated. Some people complained about lost data and stale reads.
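
Why write concern 1 can lose acknowledged writes is easy to see in a toy model. The class below is a deliberately simplified simulation, not MongoDB's replication protocol:

```python
class Primary:
    """Toy model of write concern w=1: the primary acknowledges a write
    before any secondary has copied it, so a primary failure can lose
    acknowledged writes (and secondaries serve stale reads meanwhile)."""

    def __init__(self):
        self.data = {}
        self.secondary = {}

    def write(self, key, value):
        self.data[key] = value
        return "ok"  # acknowledged immediately (w=1); replication is async

    def replicate(self):
        self.secondary.update(self.data)

primary = Primary()
primary.write("doc", "v1")
primary.replicate()
primary.write("doc", "v2")   # acknowledged, but not yet replicated
# Primary crashes here; failover promotes the secondary, so the
# surviving state predates the acknowledged "v2".
surviving = primary.secondary
```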

Slave reads were allowed for performance reasons.

Partial updates were problematic, as small updates to large documents suffered “write amplification” when being written to the oplog.
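
The oplog write amplification can be made concrete with back-of-envelope arithmetic; the document and update sizes below are illustrative, not measured Parse numbers:

```python
# A small field update on a large document: if the replication log ends
# up carrying (most of) the rewritten document rather than just the
# delta, the bytes logged dwarf the bytes actually changed.
doc_size_bytes = 1_000_000   # a 1 MB document (assumed)
update_size_bytes = 50       # e.g. setting one small field (assumed)

amplification = doc_size_bytes / update_size_bytes  # bytes logged per byte changed
```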

Frequent (daily) master re-elections occurred on AWS EC2. Rollback files were discarded, which led to data loss.

Parse developed a special “flashback” tool that recorded workloads so they could later be replayed for internal load and functional testing.
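
A record-and-replay harness in the spirit of the flashback tool could look like the following sketch; the class and method names are assumptions, and the real tool captured production workloads at far larger scale:

```python
class Flashback:
    """Sketch of a record-and-replay harness: production requests are
    logged, then replayed against a test backend so load and functional
    behavior can be compared before a change ships."""

    def __init__(self):
        self.log = []

    def record(self, method, path, body):
        self.log.append({"method": method, "path": path, "body": body})

    def replay(self, handler):
        # Re-issue every recorded request against the given backend
        # callable and collect the responses for comparison.
        return [handler(r["method"], r["path"], r["body"]) for r in self.log]
```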

JavaScript ran in a forked V8 engine to enforce a 15-second execution limit for user-provided code.
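
The forked-V8 timeout pattern translates to any process-based sandbox: run the untrusted code in a child process and kill it when the limit expires. A Python analogue of the pattern (not Parse's implementation) might look like:

```python
import multiprocessing

def run_with_limit(func, args=(), timeout_seconds=15):
    """Run func in a forked child process and enforce a hard execution
    limit by terminating the child if it is still alive afterwards."""
    proc = multiprocessing.Process(target=func, args=args)
    proc.start()
    proc.join(timeout_seconds)
    if proc.is_alive():
        proc.terminate()  # enforce the limit: kill the runaway fork
        proc.join()
        return "timeout"
    return "finished"
```

Isolating each run in its own process means a hung or runaway script cannot block the server's own event loop or threads.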

No sharding automation: a manual, error-prone process for the largest customers.

Indexing was not exposed: indexes were generated automatically by rules derived from slow query logs. This did not work well for larger apps.

Slow queries were killed by a cron job that polled Mongo’s currentOp and maintained a limit per (API key, query template) combination.
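
The slow-query killer can be sketched as a polling loop over currentOp output; the normalization rule, thresholds, and field names below are assumptions for illustration:

```python
import re

def query_template(query):
    """Normalize a query into a template by stripping concrete values,
    so limits apply per query *shape* rather than per exact query.
    (Illustrative normalization, not Parse's actual rules.)"""
    return re.sub(r"\d+|'[^']*'", "?", query)

def kill_slow_ops(current_ops, max_secs=30, max_killed_per_key=5, killed_counts=None):
    """Sketch of the cron-job killer: scan in-progress operations (as
    reported by currentOp) and kill long-running ones, capped per
    (API key, query template) combination."""
    killed_counts = {} if killed_counts is None else killed_counts
    killed = []
    for op in current_ops:
        key = (op["api_key"], query_template(op["query"]))
        if op["secs_running"] > max_secs and killed_counts.get(key, 0) < max_killed_per_key:
            killed_counts[key] = killed_counts.get(key, 0) + 1
            killed.append(op["opid"])  # a real killer would call db.killOp(opid)
    return killed
```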

Backups: if important customers lost data due to human error, Facebook engineers would manually recover it from periodic backups.

The object-level ACL system was highly inefficient. Numerous indexes were required that could sometimes surpass the actual data size by a factor of 3–4.

As there was no mechanism for concurrency control (except for minimal support for things like counters), applications were often inconsistent.

What Parse should have done differently

Parse did a lot of things right. The documentation was great, the mobile SDKs were solid and the web UIs well-designed. However, they had an unspoken value system of not trusting their users to deal with complex database and architectural problems.

Coming from a database background, we believe that developers should know about details such as schemas and indexes (the Parse engineers strongly agreed in hindsight). We also think that backend services are not limited to mobile apps but are very useful for the web.

I think that providers should be open about their infrastructure and trade-offs, which Parse only became after it had already failed.

If this idea sounds interesting to you, have a look at Baqend. It is a high-performance BaaS that focuses on web performance through transparent caching and scalability through auto-sharding and polyglot persistence.