James Cunningham

Absolutely. So this is this is my time to shine.

So one of the things that I kind of had to had to make a concession Is that I've never worked with a database that possibly be bound by CPU. It's always been, you know, make sure that your disks are as fast as possible, you know that the data is on the

disks, you got to read from the disk.

And the reason that you know, it very well could be bound by CPU is that, you know, I've seen compression in the compression in the past, and I didn't really understand what compression could actually give you until we returned click house on sort of compression realistically, you know, brings our entire data set, you know, we kind of alluded to it earlier, brings our entire data set from 52 terabytes data, two terabytes, and about 800 gigs of those are surprisingly uncompressible because they're unique, you know, 32 character strings. If anyone can tell me a, an algorithm that helps compress that, I think that we made a TV series around that or something like that, but you know, for the for the right The rest of the data, it's so well compressed that being able to actually like compute across it does so well, you know, we, we run a small amount of servers to supply what is a large amount of a data set? You know, we've, we started, I wouldn't say that, like, if there was any advice to anyone out there, start by sharding. Never Never shard by two, because two is a curse to number in terms of distributed systems. But we really just started with, you know, three shards, three replicas. And you know, with that, with that blessed number of nine, we haven't gone up yet. We kind of have a high watermark of a terabyte per machine. Google gives a certain amount of read and write off that disk based on how much storage you have. And we've kind of unlocked a certain level and one terabyte for a machine on if anyone else is somehow running click house on GCP I guess on GCP that is, you know, we're we're about to apply our fourth shard. But realistically, some of the other things that are operationally sound is That, you know, as as much as we'd all love to, I guess like hammer on or praise XML. It is it is very explicit about about what you have to write in. Its configured via XML. There's no runtime configuration that you're applying. There's no you know, magic distribution of writing into an options store and watching that cascade into a cluster

auto scaling.

Yeah, I'm not I'm not, you know, crunching in any Kubernetes pods or anything like that. One of the things I'd be remiss to not say is that you did mention CloudFlare is running click house and shut out CloudFlare they run real hardware and I'll never do that again in my life. But uh, one of the things that they alluded to and one of their kick ass blogs about click house is that it replicates so fast that they found it more performance that when a disk in a like raid 10 dies, they just wipe all the data, rebuild the disk essentially empty and just have click house refill it itself. It is crazy fast in terms of rough application. Since all that is compressed, it really just sends that across the wire. Some of the other stuff that, you know, we found completely great in terms of operationalize is that since it is CPU bound, it's mostly by reads when you are right heavy company, and you're now bound by reads in terms of cost of goods sold, like, I can throw around a million high fives after that. It's great to just watch, you know, people log in and actually look at their data and watch our graphs tick up, instead of just saying, Well, you know, we spent a lot of spend a lot of money on this, and people are only reading, you know, 1% of their data. One other piece that I'd be remiss to not answer is that some some niceties about click house that kind of separated for a few of the databases I've worked with is that the ability to kind of set some very quick either like throttling or kind of like turbo ng settings that you have on a client side. So some of the things that we might do is that if we know that a query is going to be expensive, we could you know, sacrifice a little bit of resources and Kind of like turn it back fast. So there is just a literal setting that is Max threads where I say, you know what, I really want this to run faster set max threads to eight instead of four. And it does exactly what it says it does, it'll run twice as fast if you have it twice as many threads. So they're pretty easy things that we kind of run around in terms of operational wise, I think that as far as a database goes, you know, one of the hardest things to do is just kind of read all of the settings to figure out what they do. But after you kind of get versed in it, you'll understand you know, what applying this setting might be or at what threshold, you might set something, and it's not very magical, you know, some of these settings, realistically are for very explicit types of queries that you'd only supply from a client side if you really needed them. So fairly, I wouldn't go so far as a simple like the configurations almost like dumb, and then either straightforward, very straightforward. Yeah.