Unknown

Yeah. So we see a number of challenges and other issues come up often, typically when people have moved from a non-data-warehousing world, maybe working with a more traditional warehouse or database like MySQL or SQL Server. I think a lot has changed about how to think about this stuff. This was definitely something that we were very used to at Google. Source data should generally be immutable. When you're reading data from logs, transaction logs or event logs from your website, that data should ideally never change. And similarly, when you're reading data from a database, you ideally want to read it in a way that gives you a full history of everything that has changed.

So when we come to transforming the data, producing derived datasets, which is what you're primarily doing with products like Dataform, you want to be able to build those derived datasets from scratch. If you lost them all, if you accidentally deleted all of those datasets, you want to be able to recreate that entire warehouse state again. And we often see that the first thing people do when they start using our product is actually write something more like SQL scripts, which perform some sequence of operations: creating tables, inserting into those tables, deleting from those tables. This is kind of stateful, and it becomes very hard to reason about. And I believe Drew mentioned this on the dbt podcast, where ideally you want to be a bit more declarative: every dataset should ideally be a SELECT. There are some situations where this doesn't work, GDPR probably being the main issue there. But if you can get to having your datasets declared in that way, and not mutate state or insert and delete rows, it's easier to reason about them and to make sure that your entire data pipeline is reproducible.

So coming on to some other things that we see: I think SQL adoption is growing, and I think the power of modern warehouses is enabling that.
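To make the point about declarative, reproducible datasets concrete, here is a minimal sketch. It is not how Dataform or any other product is implemented. It uses Python's built-in sqlite3 as a stand-in for a warehouse, and the table names, events, and helper functions (`load_source`, `rebuild`, `snapshot`) are all hypothetical. Each derived dataset is declared as a single SELECT over an append-only source, so everything derived can be dropped and recreated from scratch.

```python
import sqlite3

# Hypothetical immutable source: an append-only event log.
EVENTS = [
    ("u1", "page_view", "2020-01-01"),
    ("u1", "purchase", "2020-01-02"),
    ("u2", "page_view", "2020-01-02"),
]

# Each derived dataset is declared as one SELECT over its inputs.
# There are no INSERT/UPDATE/DELETE scripts to replay in order.
DATASETS = {
    "daily_events": """
        SELECT event_date, COUNT(*) AS event_count
        FROM events
        GROUP BY event_date
    """,
    "user_purchases": """
        SELECT user_id, COUNT(*) AS purchases
        FROM events
        WHERE event_type = 'purchase'
        GROUP BY user_id
    """,
}


def load_source(conn):
    """Create the source table and append the events to it."""
    conn.execute(
        "CREATE TABLE events (user_id TEXT, event_type TEXT, event_date TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)


def rebuild(conn):
    """Drop and recreate every derived dataset from its SELECT."""
    for name, select_sql in DATASETS.items():
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {select_sql}")


def snapshot(conn):
    """Return the contents of every derived dataset, for comparison."""
    return {
        name: conn.execute(f"SELECT * FROM {name} ORDER BY 1").fetchall()
        for name in DATASETS
    }
```

Because the derived tables hold no state of their own, dropping one by accident and calling `rebuild` again yields the same contents as before.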
We definitely see some people still kind of falling back to processing data with Python, NumPy, or R. There's a problem with scalability here. That is simply never going to be as fast as doing a comparable thing in SQL. They're usually limited to a single machine or by some RAM limitations, whereas you can comfortably process terabyte datasets with BigQuery or Snowflake. And so that's a habit that I think we see. I know myself, as a software engineer, I want to fall back to what I know and what I understand. But moving this stuff into SQL, using UDFs where you can, actually opens up a lot of possibilities.

When it comes to schema evolution, I think a lot of these challenges go away if you get these first things right. So having immutable source datasets, having reproducible transformations of those datasets, and having all of your code in one place makes it much easier to make massive changes across those systems. Because you remove the state, it's much easier to change a dataset and all its dependencies in a single pass, and schema evolution becomes a bit less of an issue. At some point, scale does become an issue, and you can start looking at approaching those schema changes in a backwards-compatible way. This is similar to how you might approach this problem in software engineering. For example, when you have many different microservices, they might get released on different schedules, and you want to make sure that they can continue to talk to each other as that software rolls out. So, for example, removing a field from a table because it's no longer valid: you typically do that in a software engineering way by marking that field as deprecated, giving those consumers some amount of time to clean up their, well, in this case SQL queries that use that data. And then when you know all consumers are no longer reading it, you can remove it safely.
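The deprecate-then-remove approach to a schema change can be sketched roughly as follows. This is only an illustration under assumed names: the `users_raw` source, the `country` and `country_code` fields, the two consumer queries, and the helpers are all hypothetical, and sqlite3 again stands in for the warehouse. During the deprecation window the dataset exposes both the old and the new field, so old and new readers both keep working. Only the final definition breaks readers that never migrated.

```python
import sqlite3

# Hypothetical dataset definitions for three stages of removing `country`.
BEFORE = "SELECT user_id, country FROM users_raw"
DEPRECATION_WINDOW = """
    SELECT user_id,
           country,                 -- deprecated: kept so old readers keep working
           country AS country_code  -- replacement field
    FROM users_raw
"""
AFTER = "SELECT user_id, country AS country_code FROM users_raw"

# Hypothetical downstream queries: one not yet migrated, one migrated.
OLD_CONSUMER = "SELECT COUNT(*) FROM users WHERE country = 'DE'"
NEW_CONSUMER = "SELECT COUNT(*) FROM users WHERE country_code = 'DE'"


def make_conn():
    """Create an in-memory database with a small source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users_raw (user_id TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO users_raw VALUES (?, ?)", [("u1", "DE"), ("u2", "FR")]
    )
    return conn


def build(conn, definition):
    """Recreate the derived `users` dataset from the given SELECT."""
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute(f"CREATE TABLE users AS {definition}")


def works(conn, query):
    """Return True if the consumer query still runs against `users`."""
    try:
        conn.execute(query).fetchall()
        return True
    except sqlite3.OperationalError:
        return False
```

With `DEPRECATION_WINDOW` built, both consumer queries run. With `AFTER` built, only the migrated one does, which is why the old field is removed only once nothing reads it.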
Tools like Dataform make this kind of action a bit more feasible to accomplish, because you have all that SQL in one place. You can actually analyze it and work out who's reading this field and who you need to notify, and you know when all references to it have been removed.
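As a rough illustration of why having all the SQL in one place helps, the sketch below scans a set of dataset definitions for references to a field. It is deliberately naive and is not how Dataform does it: it matches whole words with a regular expression rather than parsing SQL, and the `PROJECT` contents and the `readers_of` helper are hypothetical.

```python
import re

# Hypothetical project: dataset name -> SQL definition, all in one place.
PROJECT = {
    "orders_by_country": "SELECT country, COUNT(*) AS n FROM users GROUP BY country",
    "active_users": "SELECT user_id FROM users WHERE last_seen >= '2020-01-01'",
    "country_codes": "SELECT DISTINCT country_code FROM users",
}


def readers_of(field, project):
    """Return the names of datasets whose SQL mentions `field` as a whole word."""
    # \b keeps `country` from matching inside `country_code`.
    pattern = re.compile(rf"\b{re.escape(field)}\b", re.IGNORECASE)
    return sorted(name for name, sql in project.items() if pattern.search(sql))


def safe_to_remove(field, project):
    """A field is safe to remove once no dataset definition references it."""
    return not readers_of(field, project)
```

Here `readers_of("country", PROJECT)` names the one dataset whose owner needs notifying, and `safe_to_remove` turns true once that query has been migrated. A real tool would want a proper SQL parser, since plain text matching can be fooled by comments, string literals, and identically named columns in other tables.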