Unknown

that they want to become data engineers, and provide career paths and opportunities for them to evolve in that role. And I hope other companies would do the same. So, going back to your question about what data mesh is: what are the fundamental organizational and technical constructs that we need to put in place? I'm going to talk about the fundamental principles, and then hopefully I can bring them together in one cohesive sentence to describe it. The first fundamental principle behind data mesh is that data is owned and organized through the domains. For example, let's say you are in a health insurance domain. The claims domain has operational systems, and you probably have many of them, that generate raw data about the claims that members have put through. That raw data should become a first-class concern in your architecture. So the domain data, the data constructed or generated around domain concepts such as claims, members, or clinical visits, becomes a first-class concern in your structure and in your architecture. I call these domain data products, in a way, and that comes from domain-driven, distributed architecture. So what does that mean? It means that the systems at the point of origin, the ones generating the data, represent the facts of the business as we operate it: events around claims, or perhaps historical snapshots of the claims, or the current state of those claims. The teams that are most intimately familiar with that data are responsible for providing it in a self-serve, consumable way to the rest of the organization. So one of the principles is that the ownership of data is now distributed, and it's given to the people who are best suited to know and own that data.
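To make the claims example concrete, here is a minimal sketch, assuming the claims domain publishes its facts as dated events. The `ClaimEvent` record, its status values, and the two helper functions are hypothetical illustrations, not part of any specific data mesh tooling. It shows how the same event stream can serve both the "current state of the claims" and a "historical snapshot" view mentioned above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClaimEvent:
    """One fact emitted by the (hypothetical) claims domain."""
    claim_id: str
    member_id: str
    status: str        # illustrative values: "submitted", "approved", "paid"
    occurred_on: date


def current_state(events):
    """Replay events in date order; the last status seen per claim wins."""
    state = {}
    for event in sorted(events, key=lambda e: e.occurred_on):
        state[event.claim_id] = event.status
    return state


def snapshot_as_of(events, as_of):
    """Historical snapshot: each claim's status as it stood on `as_of` (inclusive)."""
    return current_state(e for e in events if e.occurred_on <= as_of)
```

Keeping the events as the owned, published data set and deriving views from them is one design choice among several; a domain could equally publish snapshots directly.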
So that ownership can happen at multiple places. You might have your source operational systems that now own a new data set or streams of data, in whatever format is most suitable to represent that domain. And I have to clarify that for the raw data those systems of origin generate, we're not talking about their internal operational databases, because internal operational databases and data sets are designed for a different purpose and intent, which is: make my system work really well. They're not designed for other people to get a view of what's happening in the business in that domain and to capture those facts and realities. So it's a different data set. Whether it's a stream, very likely, or a time series, or whatever format it is, it's a native data set that is owned by the operational systems and the people who operate those systems. And then you have data sets that are more aggregate views. For example, again using health insurance, you might want to have predictive points of intervention, or critical points of contact, where you want the care provider to make contact with a member to make sure they are getting the health care they need at the right point in time, so that they don't get sick and make a lot of claims on the insurance at the end of the day. The domain that is responsible for making those contacts, and for making smart, predictive decisions about when and where to contact a member, might produce an aggregate view of the member: the historical records of all the visits the member has made and all their claims, a joined, aggregate view of the data. But that data might be useful not only for their domain, but for other domains too. So that becomes another domain-driven data set that the team provides for the rest of the organization.
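The aggregate member view described above can be sketched as a join on a member identifier. This is a minimal in-memory illustration, assuming hypothetical `Visit` and `Claim` records keyed by `member_id`; a real domain would more likely do this in SQL or a stream processor. Members that appear in either input are kept, so the result behaves like a full outer join.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Visit:
    member_id: str
    visit_date: str  # ISO-8601 date string, so lexical order == date order


@dataclass(frozen=True)
class Claim:
    member_id: str
    claim_id: str
    amount: float


@dataclass
class MemberHistory:
    """Joined, aggregate view of one member across the visits and claims data."""
    member_id: str
    visit_dates: list = field(default_factory=list)
    claim_ids: list = field(default_factory=list)
    total_claimed: float = 0.0


def build_member_history(visits, claims):
    """Join visits and claims on member_id; members appearing in either input are kept."""
    histories = {}

    def history_for(member_id):
        if member_id not in histories:
            histories[member_id] = MemberHistory(member_id)
        return histories[member_id]

    for visit in visits:
        history_for(visit.member_id).visit_dates.append(visit.visit_date)
    for claim in claims:
        history = history_for(claim.member_id)
        history.claim_ids.append(claim.claim_id)
        history.total_claimed += claim.amount
    for history in histories.values():
        history.visit_dates.sort()
    return histories
```

The output is itself a new data set that the care-contact domain would own and could offer to other domains.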
So that's the distributed domain aspect of it. The second principle is that for data to really be treated as an asset, for it to be consumed in a distributed fashion by multiple people and still be joined together, filtered, and aggregated in a meaningful, self-serve way, I think we need to bring product thinking to the world of data. That's why I call these things domain-driven data products, or domain data products. Product thinking in a technology space: what does that mean? It means I need to put all the technology characteristics and tooling in place so that I can provide a delightful experience for people who want to access the data. You might have different types of consumers. They might be data scientists; maybe they just want to analyze the data and run some queries to see what's in it; or they may want to convert that data into some other easy-to-understand form, like a spreadsheet. So you have this diverse set of consumers for your data sets. But for a data set to be considered a product, a data product, and to bring product thinking to it, you need to think about how you are going to provide discoverability. That's the first step: how can somebody find my data product? Then addressability, so they can use it programmatically, along with standards and stability. I need to make sure that I put enough security around it, so that whoever is authorized to use it can use it, and they can see the things they should see and not see the things they shouldn't. And for this to be self-serve as a product, I need really good documentation, maybe example data sets that people can just play with to see what's in the data, and I need to provide the schema so they can see structurally what it looks like. So there's all the tooling around making the data self-describing and supporting the understanding of what the data is. There's a set of practices that you need to apply to data for it to become an asset itself. And finally, the third discipline, I think, that intersects with these and would help data mesh is the platform thinking part of it.
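The product characteristics listed above (discoverability, addressability, security, documentation, sample data, schema) can be gathered into a single descriptor. The following is a minimal sketch under stated assumptions: `DataProduct`, `Catalog`, the `dataproduct://` address scheme, and role-based column restriction are all hypothetical illustrations, not the API of any real catalog or data mesh product.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    domain: str
    address: str                 # hypothetical URI used for programmatic access
    schema: dict                 # column -> type: self-describing structure
    description: str             # human-readable documentation
    sample_rows: list            # example data consumers can explore
    allowed_roles: set           # roles allowed to read the product at all
    restricted_columns: dict = field(default_factory=dict)  # column -> roles that may see it
    tags: set = field(default_factory=set)

    def visible_columns(self, roles):
        """Columns this caller may see; raises if the caller has no access at all."""
        if not roles & self.allowed_roles:
            raise PermissionError(f"no access to data product {self.name!r}")
        return [col for col in self.schema
                if col not in self.restricted_columns
                or roles & self.restricted_columns[col]]

    def sample(self, roles):
        """Example rows with restricted columns removed for this caller."""
        cols = self.visible_columns(roles)
        return [{col: row[col] for col in cols} for row in self.sample_rows]


class Catalog:
    """In-memory catalog: register, search (discoverability), resolve (addressability)."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        self._products[product.address] = product

    def search(self, term):
        # Substring match on name and description; exact (case-insensitive) match on tags.
        term = term.lower()
        return [p for p in self._products.values()
                if term in p.name.lower()
                or term in p.description.lower()
                or term in {t.lower() for t in p.tags}]

    def resolve(self, address):
        return self._products[address]
```

For example, a claims product might restrict a `diagnosis_code` column to a clinical role, so an analyst can find the product and explore its sample rows without ever seeing that column.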
At this point in the conversation, people usually tell me: hold on a minute, you've now asked me to have independent teams, all of my operational teams, own their data and also serve it in a self-serve, easy-to-use way. That's a lot of repetitive work these teams have to do to actually get to the point where they can provide the data: all the pipelines they need to build internally so they can extract data from their databases, or new event sourcing they need to put in place, which leads to transmitting or emitting the events. There's a lot of work that needs to go into documentation and discoverability. How can they do all of this? It's a lot of cost, right? That's where the data infrastructure, or platform thinking, comes into play. I see a very specific role in a data mesh, in this mesh of domain-oriented data sets, for what I call self-serve data infrastructure. So that's all the tooling that we can put in place on top of our raw data infrastructure, the raw data infrastructure being your storage and your