It's that time of year again! I'm really excited to publish the 6th installment of my Alexa Top 1 Million analysis so we can take a look over our progress on securing the web over the last 6 months.

Previous Crawls

It's hard to believe there are now 5 previous crawls available for comparison purposes!

August 2015

February 2016

August 2016

February 2017

August 2017

As I publish more of these reports we start to get a much clearer picture of the progress we're making. If you're interested in doing your own analysis not only do I have the links above but I also publish the data from my crawlers on a daily basis. If you want to get hands on with a large set of data I'd love to see what further analysis you can do.

February 2018

The first report of 2018 and it's looking like a good one. As is tradition, let's start with a quick summary and get a look at what kind of things we have in store.

Similar to Aug 2017, when we saw a huge jump in the number of sites using HPKP, we've seen a continued rise in the use of HPKP and a huge jump in the number of sites using HPKP-RO too. I used to be a big supporter of HPKP, and I even have guidance on how to set it up, but I recently gave up on HPKP and Chrome announced they may deprecate it. This does make it interesting to see continued and strong growth in its usage, and it also makes a trend pretty clear: the larger sites are less likely to use HPKP. This is the reverse of the trend for every other metric.

One of the things I'm always eager to see in these reports is the adoption of HTTPS and whether we're still continuing to encrypt the web at an impressive rate. I'm really glad to say that we are continuing to make outstanding progress on that front!

The line does look a little less smooth in this scan, and checking the daily scans this does seem to have been a trend developing over the last few weeks, but either way, we have seen a 32.2% increase in the number of sites redirecting to and enforcing HTTPS in the Alexa Top 1 Million!

One thing I am sad to say though is that something I predicted back in 2017, and have talked about a few times on Twitter, has come to pass. In previous reports the rate at which we were migrating to HTTPS was not only being maintained but actually increasing, and you can see that in the graph. This, of course, could not be maintained forever. Whilst we are still seeing tremendous growth, and I'm massively excited about that and proud to be a part of it, the graph is starting to show signs of a plateau. From Aug 2017 to Feb 2018 the rate of progress has slowed. We're still going in the right direction, and no doubt will continue to do so, but the Aug 2018 and Feb 2019 reports may show much smaller steps forward.

Security Headers

We can't forget the original reason this whole report started: the use of Security Headers. Powered by the scanning and analysis engine on securityheaders.io, here is the usage of headers and the Security Headers grading in the Alexa Top 1 Million sites.

We're still seeing the same interesting trends that have been present in all previous scans and another one has emerged. Right down towards the bottom of the ranking there is a clear group of sites with a noticeably higher grading. Perhaps an opportunity for someone to grab the data and take a look at why. It could be a large hosting provider or platform doing something new by default, or maybe just an anomaly. Let me know if you figure it out!

Let's Encrypt

It's now 2 years since I started tracking the use of Let's Encrypt certificates in these reports and I'm pretty sure that no one here needs me to tell them what's coming.

Let's Encrypt have continued to see strong growth in their presence in the top 1 million sites on the web. Removing cost and technical barriers really does help increase adoption and this is the proof. Back in Aug 2017 Let's Encrypt were close to becoming the largest issuing CA in the top 1 million sites and they did it by Oct 2017, just 2 months later.

Very soon @letsencrypt will be the largest issuing CA for the top 1 million sites on the web 😎 https://t.co/i9byJj2mR5 pic.twitter.com/mXXHWXjUe6 — Scott Helme (@Scott_Helme) August 30, 2017

They did it!!! 🎉🎉🎉

On the 20th Oct 2017 by 80,352 to 80,062 certificates, Let's Encrypt became the largest issuing CA in the Alexa Top 1 Million! 🍾 pic.twitter.com/Wa4IKsgtDc — Scott Helme (@Scott_Helme) December 3, 2017

EV Certificates

In the Aug 2017 scan I introduced a check for EV certificate usage in the Alexa Top 1 Million and I've left the logic in place to continue to monitor the usage of EV certs. I guess one important thing to point out here is that there has been only one change in the methodology, which allows me to identify more EV certificates than I did previously. Anyone that's tried to do something like this will tell you that identifying EV certs isn't exactly easy!

We're still seeing the same considerably higher adoption at the top end of the ranking but the really interesting thing here is that overall there's very little growth in the use of EV certificates. In Aug 2017 I detected 17,877 sites using an EV certificate but I ran the new logic against my old data (I keep all scan data for historic scans) and identified a new total of 18,552 sites using EV certificates. In the new Feb 2018 scan that number has only increased to 19,803 EV certificates. Whilst HTTPS has seen an increase in adoption of 32.30% compared to the last scan, the number of EV certificates only increased by 6.74%.
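The EV growth between the two scans can be worked out directly from the totals quoted above. Here is a minimal sketch in Python; `pct_change` is just an illustrative helper name, not something from the crawler.

```python
def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100

# EV totals quoted in the text, both produced with the updated detection logic
ev_aug_2017 = 18552
ev_feb_2018 = 19803

ev_growth = pct_change(ev_aug_2017, ev_feb_2018)
print(f"EV certificates: +{ev_feb_2018 - ev_aug_2017} sites ({ev_growth:.2f}%)")
```

Comparing like with like matters here: using the original Aug 2017 figure of 17,877 as the baseline would overstate the growth, because part of the difference comes from the improved detection rather than from new EV deployments.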

Certificate Authority Authorisation

CAA is a brand new DNS record that sites can set to control which CAs they authorise to issue certificates for their domain. I have a great introduction blog on CAA if you want more information, but the good news is that it's now one extra metric that I'm tracking in the daily crawl! I did a brief intro post about CAA usage back in December when I first added the metric and this is the first time it will be included in a report.
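To give a feel for what a CAA policy looks like, here is a rough sketch in Python that parses CAA record data in its usual text form (flags, tag, quoted value) and checks whether a given CA is authorised. The function names, the example records and the simplified matching are all assumptions for illustration; this is not how the crawler decides whether a policy is valid, and it ignores `issuewild` and the lookup up the DNS tree that a real CA performs.

```python
from typing import List, NamedTuple


class CAARecord(NamedTuple):
    flags: int
    tag: str
    value: str


def parse_caa_rdata(rdata: str) -> CAARecord:
    """Parse CAA record data such as '0 issue "letsencrypt.org"'."""
    flags, tag, value = rdata.strip().split(None, 2)
    return CAARecord(int(flags), tag.lower(), value.strip('"'))


def is_ca_authorised(records: List[str], ca_domain: str) -> bool:
    """Simplified check: is ca_domain named in any 'issue' property?

    With no 'issue' properties at all, issuance is not restricted.
    """
    issue_values = [r.value for r in map(parse_caa_rdata, records) if r.tag == "issue"]
    if not issue_values:
        return True
    # Anything after ';' is an optional parameter list, so compare only the domain part
    return any(v.split(";")[0].strip() == ca_domain for v in issue_values)


# Hypothetical policy: only Let's Encrypt may issue, violations reported by email
example_policy = [
    '0 issue "letsencrypt.org"',
    '0 iodef "mailto:security@example.com"',
]
print(is_ca_authorised(example_policy, "letsencrypt.org"))
print(is_ca_authorised(example_policy, "comodoca.com"))
```

A record of `0 issue ";"` names no CA at all, so under this sketch every CA is refused, which matches the intent of that form.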

As is common in these results now we're seeing comparatively huge adoption in the sites higher up the ranking with a quick decline followed by a much steadier decrease. I found a total of 4,064 sites with a valid CAA policy set compared to 3,404 in the first scan in Dec 2017, an increase of 19.39% in roughly 2 months. Let's hope that by the Aug 2018 scan we will continue to see a healthy increase in adoption.

General Stats

The raw crawler data is available but I also like to publish a selection of statistics from the data:

Total Rows: 946719

Security Headers Grades:
A+ 763
A 15258
B 18954
C 26957
D 146633
E 29691
F 708385
R 78

Sites using strict-transport-security: 94116
Sites using content-security-policy: 24044
Sites using content-security-policy-report-only: 4595
Sites using x-webkit-csp: 455
Sites using x-content-security-policy: 1235
Sites using public-key-pins: 6889
Sites using public-key-pins-report-only: 2709
Sites using x-content-type-options: 132085
Sites using x-frame-options: 124835
Sites using x-xss-protection: 105956
Sites using x-download-options: 12021
Sites using x-permitted-cross-domain-policies: 11593
Sites using access-control-allow-origin: 32294
Sites using referrer-policy: 3990
Sites redirecting to HTTPS: 372125
Sites using Let's Encrypt certificate: 108146

Top 10 Server headers:
Apache 221564
nginx 160874
cloudflare 92251
Microsoft-IIS/8.5 35599
nginx/1.12.2 29258
Microsoft-IIS/7.5 24947
LiteSpeed 23226
GSE 23041
openresty 14749
Apache/2 12885

Top 10 TLDs:
.com 443948
.org 45933
.ru 40995
.net 38964
.de 38756
.br 27815
.uk 22215
.pl 17704
.it 14246
.ir 13841

Top 10 Certificate Issuers:
C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3 108146
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Domain Validation Secure Server CA 46220
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO ECC Domain Validation Secure Server CA 2 38537
C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", OU = http://certs.godaddy.com/repository/, CN = Go Daddy Secure Certificate Authority - G2 29436
C = US, O = GeoTrust Inc., CN = RapidSSL SHA256 CA 10741
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA 10662
C = US, O = Amazon, OU = Server CA 1B, CN = Amazon 9380
C = US, ST = TX, L = Houston, O = "cPanel, Inc.", CN = "cPanel, Inc. Certification Authority" 8489
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2 6580
C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA 6441

Top 10 Protocols:
TLSv1.2 350451
TLSv1 7309
TLSv1.1 165

Top 10 Cipher Suites:
ECDHE-RSA-AES256-GCM-SHA384 147985
ECDHE-RSA-AES128-GCM-SHA256 127964
ECDHE-ECDSA-AES128-GCM-SHA256 41043
ECDHE-RSA-AES256-SHA384 15400
DHE-RSA-AES256-GCM-SHA384 4326
ECDHE-RSA-AES256-SHA 3231
DHE-RSA-AES256-SHA 2484
0000 2194
AES256-SHA 2113
AES128-SHA 1855

Top 10 PFS Key Exchange Params:
ECDH, P-256, 256 bits 325059
ECDH, P-384, 384 bits 6822
ECDH, P-521, 521 bits 6267
DH, 1024 bits 6208
DH, 2048 bits 1275
ECDH, B-571, 570 bits 103
ECDH, brainpoolP512r1, 512 bits 18
DH, 4096 bits 5
DH, 3072 bits 2
DH, 768 bits 1

Top Key Sizes:
2048 bit 289141
256 bit 41402
4096 bit 24527
1024 bit 315
3072 bit 231
384 bit 87
8192 bit 7
2432 bit 4
2049 bit 3
512 bit 2

Sites using CAA: 4186

Other Observations

Looking over the data myself there are some other interesting observations that can be made.

Public Keys

We've seen a huge jump in the number of 2,048 bit RSA keys, as you'd expect from a jump in the adoption of HTTPS, but we're also seeing the use of 256 bit ECDSA keys increasing too, up from 32,070 in Aug 2017 to 41,402 in Feb 2018. The majority of the increase in HTTPS was taken up by RSA though.

Not only that but the use of 3,072 bit and 4,096 bit RSA keys has also risen quite sharply. 3,072 bit went from 142 to 231 and 4,096 bit went from 16,942 to 24,527. Those are some pretty sizeable keys and there are a lot of sites using them, which does come as a little bit of a surprise.

Cipher Suites

Given the constant drive towards performance on the web, the public key usage above was fairly interesting and so too is the use of cipher suites. The top cipher suite remains ECDHE-RSA-AES256-GCM-SHA384, rising from 113,309 sites in Aug 2017 to 147,985 sites in Feb 2018. I would have expected that ECDHE-RSA-AES128-GCM-SHA256 would be the most popular suite but that ranked second in both scans with 79,256 sites in Aug 2017 and 127,964 in Feb 2018.

From the graph I guess we can say that the very top sites in the ranking have the highest amount of support for ECDHE-RSA-AES128-GCM-SHA256 which is the faster of the two RSA suites.

Protocol Support

With the pending removal of TLSv1.0 support in PCI DSS coming in June, protocol support will be another interesting thing to keep an eye on. GitHub also did an experiment recently where they disabled TLSv1.0 and TLSv1.1 support on github.com and other services to see what would break. The good news is that protocol support does look pretty good.

To put that another way.

Protocol support looks pretty good in the top 1 million. We have the vast majority on TLSv1.2, a tiny slice on TLSv1.0 and an even tinier slice on TLSv1.1 after that. Once sites do remove TLSv1.0 they may as well remove TLSv1.1 at the same time and just have TLSv1.2 unless TLSv1.3 is here by then.
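To put rough numbers on those slices, here is a small Python sketch that turns the protocol counts from the General Stats section into percentages. The variable names are illustrative only, and the shares are relative to the sites that negotiated one of these three protocols, not to the whole top 1 million.

```python
# Protocol counts taken from the "Top 10 Protocols" figures in General Stats
protocol_counts = {"TLSv1.2": 350451, "TLSv1": 7309, "TLSv1.1": 165}

total = sum(protocol_counts.values())

# Share of each protocol as a percentage, rounded to 2 decimal places
shares = {
    name: round(100 * count / total, 2)
    for name, count in protocol_counts.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

That works out at just under 98% on TLSv1.2, about 2% on TLSv1.0 and well under a tenth of a percent on TLSv1.1.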

Servers

The top 4 servers in use haven't changed and in order are still Apache, nginx, cloudflare and Microsoft-IIS/8.5. Cloudflare have changed their header from cloudflare-nginx to cloudflare and also saw a small loss in the number of sites returning their header but remain 3rd in the ranking. As the 3rd most popular server on the planet I'd imagine removing those 6 bytes from the Server header has actually added up to a fairly significant amount of data over the last few weeks/months!

Report URI

Another cool thing that I wanted to look at was how many sites are using Report URI in the Alexa Top 1 Million.

As of right now that graph is showing 413 sites, which is somewhat short of the real total for two main reasons. One, some of the larger sites that report with us downsample their reports by only injecting the report-uri directive into a subset of responses, and two, not all sites configure reporting via the HTTP response header. It is also possible to enable reporting using Report URI JS and my crawler doesn't analyse the body of the page so it'd miss those too. As with all of the other trends we have a much larger presence in the higher ranked sites and a steady trend once you get out of the top few thousand.
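For anyone wanting to try a similar count on the raw data, here is a hedged sketch in Python of header-based detection: pull the `report-uri` directive out of a Content-Security-Policy header and see whether any of its URLs point at a given reporting service. The function names, the `report-uri.com` domain and the example header are assumptions for illustration, not the crawler's actual logic, and like the crawler this would miss sites that only set the directive on some responses or that report via JavaScript.

```python
from typing import List
from urllib.parse import urlparse


def report_uri_hosts(csp_header: str) -> List[str]:
    """Return the hostnames listed in the report-uri directive of a CSP header."""
    hosts = []
    for directive in csp_header.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "report-uri":
            hosts.extend(urlparse(url).hostname for url in parts[1:])
    # Relative report URLs have no hostname, so drop the None entries
    return [h for h in hosts if h]


def uses_reporting_service(csp_header: str, service_domain: str) -> bool:
    """True if any report-uri host is service_domain or one of its subdomains."""
    return any(
        host == service_domain or host.endswith("." + service_domain)
        for host in report_uri_hosts(csp_header)
    )


# Hypothetical header value; the subdomain and path are made up for the example
header = "default-src 'self'; report-uri https://example.report-uri.com/r/d/csp/enforce"
print(uses_reporting_service(header, "report-uri.com"))
```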

Raw Data

As always, details on how to get hold of the raw data can be found here and I'd love to see any further analysis that other members of the community could contribute!

Details on raw data here.

Raw data download links here.

Google sheet with tables and graphs here.