With Alluxio in place, missing an opportunity to move data from S3 to Spark cost us far less. Alluxio helped in two meaningful ways. First, Alluxio could mount our S3 paths directly, so most of the state changes within a bucket no longer needed to be orchestrated by our cache warming system. Second, reading data from Alluxio rather than S3 was, on average, 10x faster. In effect, if we didn't move data from Alluxio to Spark in a time-sensitive way, it wasn't a big deal, because those tables could be materialized quickly.

At first we didn't do much to optimize our Alluxio usage. But to control cost and avoid caching several TBs of data, we mounted file paths for some of our heaviest workloads, which eased the cognitive and computational overhead of warming caches for users with unpredictable access patterns. For users who touched the platform less often, a cache warming system created Spark tables and warmed the cache when those users were likely to need them, and destroyed the cache otherwise. This change helped us shrink the overall size of our cluster, and over time we gained a comprehensive picture of usage patterns. Additionally, the ease of adding and dropping Alluxio nodes and rebalancing data let us experiment with Alluxio for new use cases without compromising the user experience.
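To make the warm/cool cycle concrete, here is a minimal sketch of what such a cache warming pass can look like. This is an illustration under assumptions, not our production code: the `/mnt/events` mount path, the `alluxio-master:19998` address, and the `warm`/`cool` helpers are hypothetical, and it assumes Spark has the Alluxio client jar on its classpath so it can resolve `alluxio://` URIs.

```python
import subprocess

from pyspark.sql import SparkSession

# One-time setup (outside this job): mount the bucket into the Alluxio
# namespace, e.g. `alluxio fs mount /mnt/events s3://our-bucket/events/`
# (bucket and path are hypothetical).

spark = SparkSession.builder.appName("cache-warmer").getOrCreate()

ALLUXIO = "alluxio://alluxio-master:19998"  # hypothetical master address


def warm(table_name: str, mount_path: str) -> None:
    """Pull a dataset into the Alluxio cache and expose it as a Spark table."""
    df = spark.read.parquet(f"{ALLUXIO}{mount_path}")
    # A full scan forces every block through Alluxio, which caches
    # blocks on first read when the client uses a caching read type.
    df.foreach(lambda _: None)
    df.createOrReplaceTempView(table_name)


def cool(table_name: str, mount_path: str) -> None:
    """Drop the table and evict the cached blocks once users go quiet."""
    spark.catalog.dropTempView(table_name)
    subprocess.run(["alluxio", "fs", "free", mount_path], check=True)


# Warm ahead of the usage window that access patterns predict,
# then free the cache when the window closes.
warm("events", "/mnt/events")
# ... users query the `events` table while it is hot ...
cool("events", "/mnt/events")
```

In a setup like this, the schedule that decides when to call `warm` and `cool` would be driven by the kind of usage patterns we accumulated over time.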

Although it took time to get comfortable with our cache strategy, once we refined it our p99 latencies dropped to 7 seconds. Further, 98% of all queries were served through Alluxio rather than directly from S3. The remaining 2% that hit S3 were against new data sources not yet mounted in Alluxio or covered by our cache warming system.