Data integration is one of the most challenging aspects of any data platform, especially as the variety of data sources and formats grows. Enterprise organizations feel this acutely due to the silos that occur naturally across business units. The CluedIn team experienced this issue first-hand in their previous roles, which led them to start a business focused on delivering a managed data fabric for the enterprise. In this episode Tim Ward, CEO of CluedIn, joins me to explain how their platform is architected, how they handle integration with third-party platforms, how they automate entity extraction and master data management, and how they provide multiple views of the same data for different use cases. I highly recommend listening closely to his explanation of how they maintain consistency of the data that they process across different storage backends.



Alluxio provides an open source unified data orchestration layer for hybrid and multi-cloud environments, making data accessible wherever computation and processing are done. By seamlessly pulling data from underlying data silos, Alluxio unlocks the value of data and allows modern data-intensive workloads to become truly elastic and flexible for the cloud.



strongDM enables you to easily manage and audit access to databases and servers. Leading organizations including Hearst, SoFi, and Peloton rely on strongDM to eliminate the heavy manual work required to onboard, offboard, and audit staff's access to everything. Simplify your access control strategy today and schedule a demo to see how much easier your life can be.

Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they've launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage, you've got everything you need to build a bulletproof data pipeline. If you go to dataengineeringpodcast.com/linode today you'll even get a $60 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you're there, don't forget to thank them for being a long-time supporter of the Data Engineering Podcast!