Todd Walter

Well, that's a big question. There are people in the industry who say that the data lake is a replacement for the data warehouse. I don't believe that, and Teradata doesn't believe that. However, we strongly believe that the data lake has a really important role to play in the overall analytic data platform architecture. There are people who make this an either/or conversation, and we just don't believe that at all. We believe that the data lake and the data warehouse should work together symbiotically, each delivering its separate capabilities to the organization. The data lake is really good at collecting very high-volume data and doing big grind-them-up operations over that large volume of data: the big curation steps, the big processing of sensor data, for instance, to put it into a format that analysts can use, to normalize units and normalize time. These are big, heavy-lifting operations on the data, and they are really great things to do in the data lake.

Delivering access to the highly curated data of the organization, on the other hand, is something that data warehouses do very well. And each of them is bad at doing the other thing. Data warehouses are bad at doing the really heavy lifting on the semi-structured or weakly structured raw data that's coming in, and the data lake technologies are weak at providing SLAs on high-concurrency workloads to support a whole organization. So we really think the two should work together, and we think there's a natural flow: as the data becomes more and more curated, it is more and more likely to belong in the data warehouse, to be delivered to a much wider group of people and shared across the organization, rather than to the exploratory users who are doing the initial analysis and initial understanding, and the big heavy-lifting curation processes.

We also think that the data lake is a virtual concept. A file system is a great place to land data that is weakly structured: the text, the IoT data, the web log data, all of those forms of data that are the new big data sources of the world. But when the data is coming in a more row-and-column form, landing it and formatting it as unstructured files and then restructuring it back into a form for widespread use in the organization is an extra hop, extra energy, extra resources used that don't really need to be used.

As for ETL, ELT, and all the other orderings of those letters: at Strata last week I heard a new one, which I really liked, ELE. It's around the concept of a data hub where you extract, load, and then extract again to feed out to a very large number of downstream systems. One of the presenters was bemoaning the fact that his entire life was ELE: he just landed data and then shipped it back out again, and nobody actually ever used it in his data platform. But Teradata has always advocated the use of an ELT kind of model, just because it is the scalable model for the larger data sets. It's easier to do the transform processes with a parallelized, scalable set of operations than to try to push them through a single-threaded server somewhere and process them record by record; that's fine for small data sets, but it doesn't work for large data sets. And the ELT model has of course been highly adopted by the data lake folks, where a lot of the curation is done on-platform, leveraging the tools of the data lake environment, anything from MapReduce, Pig, and Hive all the way up to Spark and R and Python scripts and everything else. But the goal is the same: push the work into the scalable platform so that you can operate on the very large data sets and do the heavy lifting on them in a reasonable amount of time.

Schema-on-read and schema-on-write come right back to the conversation from before about the lifecycle of data in an organization and the lifecycle of data curation. Schema-on-read is really great when it's a small number of users who are exploring the data, trying to understand the structure and the value in the content, and trying to derive the interesting things or the interesting new insights out of that data. That's a great thing to do.
And every organization should provide in their processes a way for people to land the data in a raw or lightly curated form, and make it available in a schema-on-read kind of model to that small set of super users who can deal with the data in that form.
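As a concrete illustration of that schema-on-read exploration pattern, here is a minimal sketch assuming a Spark environment (one of the data lake tools mentioned above); the file path and the status_code column are invented for the example, not taken from the conversation.

```python
# Hypothetical sketch of schema-on-read exploration on a data lake.
# The landing path and column names below are made up for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("raw-data-exploration").getOrCreate()

# Land the weakly structured data as-is (JSON web logs in this sketch) and
# let Spark infer a schema at read time -- no upfront modeling required.
raw_logs = spark.read.json("/landing/weblogs/2024/")

# A small group of super users explores the inferred structure and content.
raw_logs.printSchema()
raw_logs.groupBy("status_code").count().show()
```

The schema lives only in the reading code, which is exactly why this works well for a handful of exploratory users and poorly for a whole organization.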

But when you need to get the data out to 10,000 users in 50 organizations, schema-on-read no longer makes any sense at all. It is a huge waste of resources, because you're doing the work over and over and over again for every use of the data. It introduces all sorts of opportunities for each person, or each application, to curate the data in a different way and get different answers. It introduces a whole bunch of problems that were all the reasons why we did ETL in the data warehouse world in the first place. So the more the data is used across the organization, the more it becomes production data, data as a product, the more curated it needs to be and the more it needs to be modeled, with the curation work done once and the curated data then used many times by the people downstream.
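By contrast, a minimal sketch of the ELT, curate-once pattern described earlier might look like the following, again assuming Spark; the paths, table name, and column names are hypothetical and only illustrate the shape of the flow.

```python
# Hypothetical sketch of the ELT / curate-once pattern: the transform runs
# in parallel on the platform, and the curated result is written once for
# many downstream consumers. Paths, table and column names are invented.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("elt-curation").getOrCreate()

# Extract/Load: the raw sensor readings have already been landed on the platform.
raw = spark.read.json("/landing/sensor_readings/")

# Transform: normalize units and time in one scalable, set-based pass,
# rather than pushing records one by one through a separate ETL server.
curated = (
    raw
    .withColumn("event_ts", F.to_timestamp("event_time"))        # normalize time
    .withColumn("temp_c", (F.col("temp_f") - 32) * 5.0 / 9.0)    # normalize units
    .select("device_id", "event_ts", "temp_c")
)

# The curation work is done once; the curated table is then shared widely.
curated.write.mode("overwrite").saveAsTable("curated.sensor_readings")
```

The point of the sketch is only that the heavy lifting happens once on the scalable platform, and the curated, modeled result is what the wider population of users and applications consumes.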