Karthik Ranganathan

Absolutely, yes. So we are a multi-API database. Our query layer is pluggable, which means we can continue to add more and more access patterns in the future to help users build a richer variety of apps. That was really the vision even from day one.

We picked Cassandra specifically because the Cassandra Query Language uses a very SQL-like dialect: it also has tables, it has columns, it has INSERT and SELECT queries, and so on and so forth. So we used that as a building block, and it has a rich ecosystem. It is good for a certain type of use case, massive scale, massive amounts of data, reads and writes at ultra-low latency, which clearly complements the very relational SQL use case. The thing that we changed from Apache Cassandra is that, unlike Apache Cassandra's CQL, the Yugabyte Cloud Query Language (YCQL) is completely ACID-compliant. So we think of YCQL as the semi-relational use case.

We also worry about some of the dangers of scale-out at massive scale: if you issue a bad or poorly written query, you can really ruin not only your own life but everybody's life in the cluster, because all the nodes end up performing a lot of work, and it takes a while for the whole thing to settle down. That can cause unintended consequences; it's okay at a couple of terabytes, but it's really bad at 10 or 100 terabytes. So the YCQL API restricts you from issuing any of those queries by not even supporting them. YCQL only supports the subset of SQL queries that hit a finite number of nodes, unrelated to the total number of nodes in the cluster. So there are no scatter-gather type operations that do joins across all tables. It's really built for scale and performance. That's the YCQL side.

Now, where do we see these two fit in? If you look at workloads that are tens to hundreds of terabytes or more, that need very low-latency access directly as the serving tier, and that have requirements such as automatic data expiry, which is implemented with a feature called Time To Live (TTL), YCQL perfectly fits the bill. It also supports compound data types, such as lists and maps and sets, inside a single column. On the other end, YSQL, the Postgres-compatible API, does foreign keys, constraints, triggers, the whole nine yards, on the completely relational side.
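To make the YCQL side concrete, here is a minimal sketch with a hypothetical keyspace and table, showing TTL-based expiry, compound column types, and a query pinned to a partition key; exact syntax and units should be checked against the YCQL docs:

```sql
-- Hypothetical YCQL example: automatic expiry plus compound types.
CREATE KEYSPACE IF NOT EXISTS demo;

CREATE TABLE demo.user_events (
    user_id  text,
    event_ts timestamp,
    tags     set<text>,            -- compound types (set/map/list) in one column
    props    map<text, text>,
    PRIMARY KEY ((user_id), event_ts)
) WITH default_time_to_live = 604800;  -- table-wide default expiry
                                       -- (assumed seconds, as in Cassandra)

-- A per-row TTL can override the table default at write time:
INSERT INTO demo.user_events (user_id, event_ts, tags)
VALUES ('u42', '2020-01-01 00:00:00', {'login'})
USING TTL 3600;

-- The partition key bounds this query to a fixed set of nodes regardless of
-- cluster size; YCQL has no joins or cross-table scatter-gather at all.
SELECT * FROM demo.user_events WHERE user_id = 'u42';
```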
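And on the relational end, a minimal YSQL sketch with a hypothetical schema, showing the Postgres-style features mentioned: foreign keys, constraints, and triggers:

```sql
-- Hypothetical YSQL (Postgres-compatible) example.
CREATE TABLE customers (
    id    bigserial PRIMARY KEY,
    email text NOT NULL UNIQUE
);

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id),  -- foreign key
    total_cents bigint NOT NULL CHECK (total_cents >= 0),   -- check constraint
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Triggers work the same way they do in Postgres:
CREATE FUNCTION touch_created_at() RETURNS trigger AS $$
BEGIN
    NEW.created_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_touch
    BEFORE INSERT ON orders
    FOR EACH ROW EXECUTE FUNCTION touch_created_at();
```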
Now, you asked about how we designed the layer below. That was actually an interesting challenge for us. It is document-oriented down there, and what we figured out was that a document database is actually the most amenable to supporting a wide array of access patterns, as long as we can keep enhancing the storage layer. By the way, it's called DocDB, so I'll just use that term from now on. What we realized in the DocDB layer is that there are a number of access patterns that we have to optimize, and we have to leverage those access patterns in the corresponding query layers above. The advantage of a common DocDB layer below is that the advantages of one API start flowing into the other.

For example, on the YCQL API we have the ability to store a lot of data per node; one of our users actually tried loading 20 terabytes of compressed data per node, and then at that density level tried to do hundreds of thousands of operations per second, tried to kill a node, add a node, expand the cluster, and so on. All of that seamlessly flows into the YSQL side. And the YCQL side gained features such as secondary indexes and constraints, which we brought over from the relational world. So developers coming in with Cassandra knowledge and wanting to build those types of apps can actually use secondary indexes, unique constraints, transactions, a document data type (JSONB), and so on (there's a sketch of this below). And the YSQL folks, the Postgres folks, wanting to do scale can actually leverage Cassandra-like scale. So it really marries the two at the layer below.

Another unique advantage that's often overlooked is the fact that we internally distinguish between a single-row access pattern and a distributed access pattern. What this means to the end user is this: if you went to Google Cloud, you would put your most critical transactional workloads on Google Spanner, but Spanner uses atomic clocks, is very expensive, and has a lot of limitations, so you wouldn't put use cases that have a ton of data in Spanner; you'd probably move those to something like Bigtable. Yugabyte brings both into the same database as just two different table types (also sketched below). So that's really another huge advantage that the end user gets.

Now, as far as the challenges, I think that's actually an interesting question. I think the challenge always comes down to two things. The first part is that the addition of so many features into something as core as the lower layer should not destabilize whatever already exists. Especially in something as fundamental as a database, it's almost like a breach of trust if we build a feature that breaks something else and loses data. So the onus on testing is incredibly high. We have a massive, elaborate pipeline to test our product across the entire feature matrix. We in fact go the distance of having a CI/CD pipeline, and we're very proud of this, that bids for spot instances the minute somebody uploads a diff, a code diff for review. So the minute they upload their changes, we automatically bid for spot instances and run Spark-based parallel tests, thousands and thousands of tests in parallel, and sometimes, before the reviewer even gets to the review, the results of running this wide array of tests are already out. We had to invest in ThreadSanitizer and AddressSanitizer; we had to invest in Clang, in Mac and Linux and all sorts of different environments to build in, Kubernetes, Docker, and so on. We do Jepsen-based testing, and we do both deterministic and non-deterministic failure injection. It's a very, very elaborate pipeline. So that's a big onus, but some of us actually enjoy working on that stuff, believe it or not, so it works out as a team. So that's one part.
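Going back to the cross-pollination point, here is a hedged sketch of Postgres-style features surfaced through YCQL, using a hypothetical table; the `transactions` table property and the JSONB operators follow YugabyteDB's documented YCQL syntax:

```sql
-- Hypothetical YCQL example: secondary index, unique constraint, JSONB.
CREATE TABLE demo.products (
    sku     text PRIMARY KEY,
    name    text,
    details jsonb                            -- document data type
) WITH transactions = { 'enabled' : true };  -- needed for consistent indexes

-- A unique secondary index, i.e. a unique constraint on a non-key column:
CREATE UNIQUE INDEX products_by_name ON demo.products (name);

-- Query by the indexed column instead of the primary key:
SELECT * FROM demo.products WHERE name = 'widget';

-- Query into the JSONB document:
SELECT * FROM demo.products WHERE details->>'color' = 'red';
```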
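And a sketch of the "two table types" idea, again in YCQL terms with hypothetical tables: one table that only needs single-row semantics for the Bigtable-style workload, and one that opts into distributed, multi-row ACID transactions for the Spanner-style workload:

```sql
-- High-volume, single-row-access workload: no distributed transactions needed.
CREATE TABLE demo.metrics (
    device_id text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((device_id), ts)
);

-- Critical transactional workload: opt into distributed transactions.
CREATE TABLE demo.accounts (
    id      text PRIMARY KEY,
    balance bigint
) WITH transactions = { 'enabled' : true };

-- Multi-row, multi-shard ACID transaction in YCQL:
BEGIN TRANSACTION
  INSERT INTO demo.accounts (id, balance) VALUES ('a', 900);
  INSERT INTO demo.accounts (id, balance) VALUES ('b', 1100);
END TRANSACTION;
```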

The second part is that people often ask us what we're going to do about compatibility with, for example, Apache Cassandra or Postgres. The way we think about it is slightly different: we will get to fuller compatibility slowly, and that's not a concern for us. What is more important is enabling users to build the type of applications they want to build in the here and now, instead of chasing versions. So we're not going after lift-and-shift of an application; we're going after lift-and-shift of an application developer. A user that's familiar with Apache Cassandra but really wants secondary indexes, "I just wish I had JSON, I just wish I could do a couple of transactions here", those are the folks we're going after, because we're really enabling a new paradigm of apps to get built on a database that is not a new paradigm to them. Similarly, the Postgres folks: "all of this is great, but I just wish I had the scale, or I had the HA." Those are the things we're going after. So yeah, I don't know if that gives a fair idea.