James Campbell

Absolutely. It's amazing how much things have evolved in those two years, and especially in the last six months or so. Just like you said, originally our focus was very much on validation and on supporting exploratory data analysis and that sort of workflow, and we were also very pandas-centric. There's been an evolution along a lot of different dimensions.

The first one is just in terms of the kinds of data that Great Expectations can interact with. From the very beginning of the library, we tried to ensure that expectations were always defined in a way that wasn't specific to pandas. It wasn't about any particular form of the data; it was really about the semantics of the data, how people understand it, what it means in the context of a particular analysis. So we've been able to realize that goal a little bit by expanding to now support SQLAlchemy, and by extension all of the popular big SQL databases. We have users running Great Expectations on Postgres, on Redshift, on BigQuery. We've also expanded into Spark, so whether that's Cloudera clusters or Spark clusters that are managed by teams or by Databricks, we've got users evaluating expectations on all of those. And I think one of the things that's really neat is that it's the same expectations, right? The same expectation suite can now be validated against different manifestations of data. So if you have a small sample of data that you're working with on your development box in pandas, and then you want to see whether the same expectations are met by a very large dataset out on your Spark cluster, you can just seamlessly do that.

The next big area we've pushed is integrations. It was a big pain point for users, I think, to figure out how they could actually stitch Great Expectations into their pipelines. So we've done a lot of work in creating what we call a data context, which manages expectation suites and manages data sources. It can bring batches of data together with expectation suites to validate them, and then store the validation results and put them up on cloud storage. So you could, for example, have all of your validations immediately loaded to S3 and available to your team. That's been another big area of development.

And again, I know I'm just going on and on here, but there are a couple of other big areas. One of them has been the thing that I think has really helped people resonate with Great Expectations and see things move forward, which is what we call data docs: the ability to generate human-readable HTML documentation about the artifacts of Great Expectations, so about expectation suites and about validation results. We basically generate a static site that you can look at. It really helps you get a quick picture of what you have in your data, and it's something you can share with your team.
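To make that cross-backend portability concrete, here is a minimal sketch using the 0.x-era Great Expectations API; the file paths, column names, and the active SparkSession (`spark`) are illustrative assumptions, not details from the conversation:

    import great_expectations as ge
    from great_expectations.dataset import SparkDFDataset

    # Develop expectations interactively against a small local sample.
    sample = ge.read_csv("sample.csv")  # placeholder file name
    sample.expect_column_values_to_not_be_null("user_id")
    sample.expect_column_mean_to_be_between(
        "purchase_amount", min_value=0, max_value=500
    )

    # The suite captures the semantics of the data, nothing pandas-specific.
    suite = sample.get_expectation_suite()

    # Later, wrap a Spark DataFrame and validate the very same suite.
    # Assumes an active SparkSession named `spark`.
    spark_df = spark.read.parquet("s3://bucket/full_dataset/")
    results = SparkDFDataset(spark_df).validate(expectation_suite=suite)
    print(results["success"])

The suite itself serializes to declarative JSON, which is what allows the same expectations to run against pandas, SQLAlchemy, or Spark backends.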
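The data context workflow can be sketched similarly. This assumes a project scaffolded with `great_expectations init` in that same 0.x era; the datasource name, file path, suite name, and validation operator name are placeholders modeled on the default scaffolding:

    import great_expectations as ge

    # Load the project's data context (expects a great_expectations/
    # directory created by `great_expectations init`).
    context = ge.data_context.DataContext()

    # Bring a batch of data together with a stored expectation suite.
    batch = context.get_batch(
        {"datasource": "my_datasource", "path": "data/events.csv"},
        expectation_suite_name="events.warning",
    )

    # Validate and fan the results out to the configured stores,
    # e.g. an S3-backed validations store shared by the team.
    context.run_validation_operator(
        "action_list_operator", assets_to_validate=[batch]
    )

    # Regenerate the static data docs site from suites and results.
    context.build_data_docs()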
And then the last area is profiling. We've done a lot of work to make it so that you can use the Great Expectations library before you've really zeroed in on what the expectations are. So it becomes this iterative process of refinement, where in an initial profiling we basically say hugely broad things: I expect the mean of a column's values, for example, to be between negative infinity and positive infinity. Well, obviously, that's true. But as a result of computing that, we give you the metric, what the actual observed mean is, and you can use that, especially when you're combining it with documentation from the profile, to get a really robust understanding of your data right away. So there's a lot there. There's a lot of innovation and work that we've been able to do, and it's been a really fun thing to get to focus more on the project.

Interviewer

Yeah, the profiling in particular, I imagine, is incredibly valuable for people who are just starting to think about how to actually get a handle on the data they're using and get some sense of what they're working with, particularly if they're either new to the project, or if they've just been running blind for a long time and want to know, how do I even get started with this?

James Campbell

Absolutely. I think one of my favorite things that I see in our Slack channel a lot is when somebody will say, you know, I ran this expectation and it's failing and I don't know why. And then they look into their data, and it's, well, because it's not true. I never cease to love that sense of surprise and excitement that people have when they really encounter their data in a richer way, or in a way that they hadn't seen it before.

Interviewer

What profiling does is it just makes that happen across a whole bunch of dimensions all at the same time.

James Campbell

Exactly right. More and more, what we're finding is that when a user was first getting started with Great Expectations, it was intimidating to sit down at a blank notebook and figure out: where do I go? How do I get started? Now what they can do with profiling is start off with a picture of their dataset. They get to see some of the common values, which columns are of which types, and distributions, and it really gives them a way to dive right in. And then we can actually generate a notebook from that profiling result that becomes the basis of a declarative exploratory process. So we can actually sort of guide you through some of the initial exploration that makes sense, based on the columns and types of data that you have.
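As a rough illustration of the profiling loop described above, here is a sketch using the 0.x-era BasicDatasetProfiler; the file name is a placeholder, and the exact result structure may vary by version:

    import great_expectations as ge
    from great_expectations.profile.basic_dataset_profiler import (
        BasicDatasetProfiler,
    )

    dataset = ge.read_csv("sample.csv")  # placeholder file name

    # Profiling returns a deliberately loose expectation suite plus the
    # validation result that carries the actual observed values.
    suite, validation_result = BasicDatasetProfiler.profile(dataset)

    # Each result pairs a broad expectation (e.g. mean between -inf and
    # +inf) with the observed value, which seeds iterative refinement.
    for r in validation_result["results"]:
        print(
            r["expectation_config"]["expectation_type"],
            r["result"].get("observed_value"),
        )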