Frank McSherry

Right, great question. So what happens at the moment, and I should say the interfaces are subject to change as we learn that people hate it or love it or want something slightly different, but the way the world works in Materialize at the moment is that we presume you have some well-accepted source-of-truth database, let's say MySQL or something. And at the same time, somewhere nearby, you have, let's say, Kafka: a place to put streamy-ish data that has persistent performance. There's a tool out there that we use, and so recommend people use at the moment, called Debezium, that attaches to various databases, essentially like a read replica, or reads the binlog; it has a few different strategies depending on the database. You turn this thing on, you point it at your database, and it starts emitting a topic for each of the relations that you named when you turned it on. The topics that Debezium produces basically contain before-and-after statements about the rows in each relation. So it says, you know, a change happened: this row used to be this before, and now it is this afterwards, with a timestamp. Then you turn on Materialize, and in Materialize you start typing things like CREATE SOURCE FROM, and you announce the topic there. When you announce a topic there, or a pattern for a class of topics, Materialize will open up all these topics, start reading through each of them, start pulling in these changes, and present each of the topics as a relation that you can query. You can start mixing and matching all of these different queries together. So this is roughly, sorry, this is half an answer to the question.
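[Editor's note: to make the flow concrete, here is a rough sketch of the two pieces the speaker describes. The broker address, schema registry URL, topic name, and table names are hypothetical placeholders, and the exact syntax varies across Materialize versions, so treat this as illustrative rather than authoritative. A Debezium change event carries before and after images of a row, roughly:]

```json
{
  "before": { "id": 42, "status": "pending" },
  "after":  { "id": 42, "status": "shipped" },
  "op": "u",
  "ts_ms": 1612345678901
}
```

[And on the Materialize side, attaching to such a topic and querying it as a relation looks approximately like:]

```sql
-- Sketch: ingest a Debezium-formatted Kafka topic as a queryable relation.
-- Broker, registry, and topic names are placeholders.
CREATE SOURCE orders
FROM KAFKA BROKER 'kafka:9092' TOPIC 'dbserver1.inventory.orders'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://registry:8081'
ENVELOPE DEBEZIUM;

-- The source now behaves like a table you can query and join.
CREATE MATERIALIZED VIEW order_counts AS
SELECT status, COUNT(*) AS n FROM orders GROUP BY status;
```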
I'm going to continue with this one, but this is roughly what you need to do to get started with Materialize: you have a relational database that's holding on to your data, you transform it using a tool like Debezium into a change log with a particular structure in Kafka, and then you point Materialize at the Kafka topics and we'll start slurping it in for you. In terms of how you avoid being an entire replica of the database, and in particular the full history of all the changes to the database: what you can do in Materialize is, you obviously have the ability to select subsets of the relations that you want to bring in. You've also got the ability, as you bring them in, to filter the relations down, based either on predicates or projections, to just the data that you need. So if it turns out that you're only interested in analyzing customer data and sales data, and only five of the columns from it, you're more than welcome to slice those things down, and Materialize, through differential dataflow, which has some compaction technology internally, makes sure that we're not using any more footprint than the resident size of whatever relation you're sitting on. So although the history might go back days, weeks, whatever, unless you ask for it we don't need to keep the days- and weeks-long history around. We'll just give you the answers from now going forward. It's flexible, though: you could in principle say, please load the whole history in, at which point we look a bit more like a temporal data processor. So we'll show you the full history of your query, going back as far as we have history, basically.
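[Editor's note: the predicate-and-projection trimming the speaker describes can be sketched as below. The table and column names are hypothetical, and the syntax is a Materialize-style sketch, not a verbatim recipe. The point is that the maintained footprint tracks the filtered result, not the upstream table or its history:]

```sql
-- Sketch: restrict ingestion to the rows and columns you actually need.
-- 'sales' is assumed to be an existing source; column names are invented.
CREATE MATERIALIZED VIEW us_sales_slim AS
SELECT order_id, customer_id, amount, created_at  -- projection: 4 of many columns
FROM sales
WHERE region = 'US';                              -- predicate: subset of rows
```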
But it is definitely the case that if you have, let's say, a few gigs of data that you're planning on analyzing interactively, we will have a few gigs of data live in memory. If your data is 10 terabytes and you want to just do random access to it and play around with it, we're going to try to pull in 10 terabytes of data, and we might need to tell you about cluster mode at that point, or try to give you some advice on pinning down the records a little bit so that you don't have quite so much of a footprint. But Materialize is going to manage all of its own data; it's not going to return to the core database and dump any of the analytical workload back on it. So we're mirroring this stuff so that we can handle all of our workloads without either interfering with things upstream or finding ourselves off-footed and not actually having the data we need indexed correctly. We'll see how that works, right? At the moment this has been fine. A lot of people who talk to us, when they actually tell us what they need interactive access to, it's a surprisingly smaller volume of data than everything that's kept in their source of truth.