Much of the Internet’s security posture relies on the correct implementation of certificates, or certs. We’ve all been taught to look for the “green lock” on websites, and things such as mixed-content blocking and HSTS are a good push for that.



Certs 101:

Websites, say michaelhendrickx.com, have a leaf certificate. This certificate holds some metadata, including a public key that’s used to set up the encrypted connection (I’m oversimplifying; TLS is explained in depth here). That leaf certificate needs to be trusted by your user agent. Since you can’t individually trust every certificate in the world, you trust whoever issued it instead. A certificate is therefore signed by a trusted party (a certificate authority, or CA), or by a party that is in turn trusted by a root CA. Typically, your computer trusts a set of root CAs, which allow intermediate CAs to issue leaf certs.
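To make the chain idea concrete, here is a toy Python sketch that walks from a leaf up to a trusted root by matching issuer names. It is only an illustration: the `Cert` class, `build_chain` and the CA names are made up for this example, and real validation also checks signatures, validity dates and revocation, none of which happens here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str  # who the certificate is for
    issuer: str   # who signed it


def build_chain(leaf, known_certs, trusted_roots):
    """Follow issuer names upward from the leaf.

    Returns the chain (leaf first, root last) if it ends at a trusted
    root, or None if an issuer is unknown or the names loop.
    Toy model: names only, no signature checks.
    """
    by_subject = {cert.subject: cert for cert in known_certs}
    chain = [leaf]
    current = leaf
    while current.subject not in trusted_roots:
        parent = by_subject.get(current.issuer)
        if parent is None or parent in chain:
            return None  # unknown issuer, or a loop (e.g. untrusted self-signed cert)
        chain.append(parent)
        current = parent
    return chain


# Hypothetical CA names, purely for illustration.
root = Cert("Example Root CA", "Example Root CA")  # roots are self-signed
intermediate = Cert("Example Intermediate CA", "Example Root CA")
leaf = Cert("michaelhendrickx.com", "Example Intermediate CA")

chain = build_chain(leaf, [root, intermediate], {"Example Root CA"})
print(" -> ".join(cert.subject for cert in reversed(chain)))

orphan = Cert("unknown.example", "Some Unknown CA")
print(build_chain(orphan, [root, intermediate], {"Example Root CA"}))
```

The first call finds a path ending at the trusted root; the second returns `None` because nobody in the known set vouches for the orphan's issuer.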



sizeof(CertificateAuthorities)

User agents ship with pre-installed certificates for these root CAs, or delegate that to the operating system’s trust store. There may be more of them than you’d expect. A typical Windows installation trusts a few dozen, if not hundreds, of root CAs.
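If you want a rough count for your own machine, Python's standard `ssl` module can list the CA certificates it loads by default. This is a sketch, not an exact census: the helper names are mine, the result depends on the platform (on Windows the default load also pulls in the intermediate "CA" store, and on systems that keep certs in a directory they are only loaded lazily), so treat the number as a ballpark.

```python
import ssl


def common_name(cert: dict) -> str:
    """Pick a readable name out of a cert dict as returned by get_ca_certs()."""
    # The subject is a tuple of RDNs, each RDN a tuple of (key, value) pairs.
    fields = {key: value for rdn in cert.get("subject", ()) for key, value in rdn}
    return fields.get("commonName") or fields.get("organizationName") or "(unnamed)"


def trusted_ca_names() -> list:
    """Names of the CA certificates Python loads into a default TLS context."""
    context = ssl.create_default_context()  # loads the platform's default CA certs
    return sorted(common_name(cert) for cert in context.get_ca_certs())


names = trusted_ca_names()
print(f"{len(names)} trusted CA certificates loaded")
for name in names[:5]:
    print(" ", name)
```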

Now, each of these root CAs can trust several intermediate CAs, which can issue certs (or trust other CAs, …). So, essentially, you’re trusting hundreds, if not thousands, of entities. These are usually a mix of companies and governments.

Since these leaves create several paths to their roots, and since I’m a sucker for visualizing graphs, I decided to graph out the certificates of Alexa’s top 1M websites. The script crashed after some 67,000 of them, though.

Note that this is a large dataset, which slows down the visualization. You can download the JSON files from GitHub.
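The step from certificate chains to a graph can be sketched in a few lines of Python. This assumes each chain is just a list of names ordered root first, leaf last; the function names and the sample chains are invented for illustration and are not taken from the actual dataset or from the code behind the visualization.

```python
from collections import Counter


def build_edges(chains):
    """Count parent -> child links across all chains.

    The weight of an edge is the number of chains (sites) passing through it.
    """
    edges = Counter()
    for chain in chains:
        for parent, child in zip(chain, chain[1:]):
            edges[(parent, child)] += 1
    return edges


def leaves_per_root(chains):
    """How many leaf certificates ultimately hang off each root."""
    return Counter(chain[0] for chain in chains if chain)


# Hypothetical chains, ordered root -> intermediate -> leaf.
chains = [
    ["Root A", "Intermediate A1", "site-one.example"],
    ["Root A", "Intermediate A1", "site-two.example"],
    ["Root A", "Intermediate A2", "site-three.example"],
    ["Root B", "Intermediate B1", "site-four.example"],
]

edges = build_edges(chains)
for (parent, child), weight in sorted(edges.items()):
    print(f"{parent} -> {child} (x{weight})")
print(leaves_per_root(chains).most_common())
```

Shared intermediates collapse into a single heavily weighted edge, which is what makes the popular CAs stand out once the graph is drawn.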



The Graph

You can play around with the data set at https://michaelhendrickx.com/certgraph/ (warning: graph rendering might be slow on large datasets), but eventually you’ll have something like the chart below.

Distribution of the certificates of the top ~60k popular sites.

Getting the data

The files come with a .NET Core 3 client app which scans a text file of hostnames and stores the certificate chains in a JSON blob. They’re essentially [rootca]->[intermediate]->[…]->leaf. I only opted for name, subject, expiry, serial and thumbprint, but you can get anything that the X509Certificate2 class gives you. For example, it might be cool to see which CAs issue the longest-lived certs, or what key sizes are used, etc.
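For a feel of what such a record could look like, here is a small Python sketch. It is not the .NET client and the exact JSON layout in the download may differ: the field names simply mirror the five properties listed above, the helper names are mine, and the certificate bytes are a placeholder rather than a real certificate. The one firm detail is the thumbprint, which in .NET is the SHA-1 hash of the DER-encoded certificate shown as uppercase hex.

```python
import hashlib
import json


def thumbprint(der_bytes: bytes) -> str:
    """SHA-1 of the DER-encoded certificate, as uppercase hex."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def make_record(name, subject, expiry, serial, der_bytes):
    """One entry of a chain; the field names are an assumed layout."""
    return {
        "name": name,
        "subject": subject,
        "expiry": expiry,
        "serial": serial,
        "thumbprint": thumbprint(der_bytes),
    }


# Placeholder values, purely for illustration (not real certificate data).
chain = [
    make_record("Example Root CA", "CN=Example Root CA", "2035-01-01T00:00:00Z", "01", b"fake root bytes"),
    make_record("Example Intermediate CA", "CN=Example Intermediate CA", "2030-01-01T00:00:00Z", "02", b"fake intermediate bytes"),
    make_record("example.com", "CN=example.com", "2026-01-01T00:00:00Z", "03", b"fake leaf bytes"),
]

blob = json.dumps(chain, indent=2)
print(blob)
```

Keeping each chain ordered root first means the last entry is always the leaf, which makes later questions like "which root does this site hang off" a simple index lookup.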

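Technically, I created a custom RemoteCertificateValidationCallback, which is typically used to perform extra SSL validation (such as checking CRLs, as .NET doesn’t do that out of the box). The callback is handed the server’s certificate along with the chain built for it, which makes it a convenient place to record the chain.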

Again, please play around with https://michaelhendrickx.com/certgraph/, and let me know if you’d like to see something added to it.

Thanks,

Michael