Analyzing HN Readers' Personal Blogs Part 1

On April 7th 2020, an Ask HN post titled “Ask HN: What is your blog and why should I read it?” bubbled up to the front page of HN: https://news.ycombinator.com/item?id=22800136

It was great reading through all of the comments from HN readers posting about their blogs/personal sites, why they write, and what they write about.

It was really inspiring to read and relate to everyone. It also inspired me to analyze everyone’s personal website. What technologies are HN people using? What does a typical HN personal site look like?

Data Collection

The initial step was straightforward enough. I thought about using the HN API, but just ended up copying and pasting all of the text from the entire post and then using regex on the command line to spit out a list of URLs.
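For readers who prefer a script to a shell one-liner, here is a minimal sketch of the same idea in Python. The pattern and the function name are my own assumptions, not the exact regex used for this post, and a looser or stricter pattern would give a different raw URL count.

```python
import re

# Assumed pattern: http:// or https:// followed by anything that is not
# whitespace or a common closing delimiter. Not the regex used in the post.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return every http(s) URL found in text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        # Drop sentence punctuation that tends to trail a pasted URL.
        urls.append(match.rstrip(".,;:!?"))
    return urls


sample = "My blog is https://example.com/blog. Also see (http://example.org)."
print(extract_urls(sample))
```

Duplicates are kept on purpose at this stage so the raw count stays visible before any cleaning.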

I then did some manual data cleansing, removing HN links, Twitter links, duplicates, etc. until I only had the unique root domains of personal blogs.

547 initial raw URLs -> 382 unique blog URLs
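The cleaning here was done by hand, but part of it could be automated. The sketch below is one possible approach, not what was actually run: it reduces each URL to its hostname, skips an assumed exclusion list, and keeps the first occurrence of each host. The names EXCLUDED_HOSTS and unique_blog_hosts are hypothetical.

```python
from urllib.parse import urlparse

# Assumed exclusion list; the real manual cleanup removed more than this.
EXCLUDED_HOSTS = {"news.ycombinator.com", "twitter.com"}


def unique_blog_hosts(urls):
    """Return hostnames in first-seen order, without excluded or repeated hosts."""
    seen = set()
    hosts = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Treat www.example.com and example.com as the same site.
        if host.startswith("www."):
            host = host[4:]
        if not host or host in EXCLUDED_HOSTS or host in seen:
            continue
        seen.add(host)
        hosts.append(host)
    return hosts


print(unique_blog_hosts([
    "https://example.com/about",
    "http://www.example.com/posts",
    "https://twitter.com/someone",
]))
```

Blogs hosted under a shared domain would collapse into a single host with this approach, so a manual pass would probably still be needed.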

Initial Analysis: Wappalyzer

For this initial analysis I used the open source npm module of Wappalyzer. They do have a paid version on their website https://www.wappalyzer.com/ if you do not want to deal with the CLI. [No relationship]

I made a bash script that went through my list of 382 unique URLs and saved the output for each one to an individual JSON file.

From this process, 370 of the 382 URLs returned status code 200 along with data from Wappalyzer. The other URLs were discarded. I was too lazy to redo them or check them manually.
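The original was a bash script, which is not reproduced here. As a rough Python equivalent, the sketch below assumes a wappalyzer executable is on the PATH, takes the URL as its first argument, and prints a JSON report to stdout. The function names, the file naming scheme, and the timeout are all my own choices.

```python
import re
import subprocess
from pathlib import Path


def output_path(url, out_dir):
    """Build a filesystem-safe JSON file name for a URL."""
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", url).strip("_")
    return Path(out_dir) / f"{safe}.json"


def scan(url, out_dir, command="wappalyzer", timeout=120):
    """Run the CLI for one URL and save its stdout; return the path or None."""
    target = output_path(url, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Assumption: the CLI is invoked as `wappalyzer <url>` and prints JSON.
        result = subprocess.run(
            [command, url], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # CLI missing, not executable, or the site took too long
    if result.returncode != 0 or not result.stdout.strip():
        return None
    target.write_text(result.stdout)
    return target


# Example usage (needs the CLI installed and network access):
# for url in ["https://example.com"]:
#     print(url, scan(url, "wappalyzer_out"))
```

Returning None instead of raising lets a long batch keep going and makes it easy to count how many sites failed afterwards.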

Data Wrangling

I then pulled up a trusty Jupyter notebook instance using pipenv and pulled all the data into a single glorious dataframe with 2315 rows.
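To give an idea of what that flattening step can look like, here is a sketch using only the standard library. The JSON shape it reads (a "url" string plus a "technologies" list whose items have a "name" and a list of "categories" strings) is a simplified, hypothetical layout. Real Wappalyzer output differs between versions, so the key names would need adjusting.

```python
import json
from pathlib import Path


def load_rows(json_dir):
    """Flatten every report in json_dir into one row per (site, technology, category)."""
    rows = []
    for path in sorted(Path(json_dir).glob("*.json")):
        try:
            report = json.loads(path.read_text())
        except json.JSONDecodeError:
            continue  # skip empty or truncated reports
        # Hypothetical keys: "url", "technologies", "name", "categories".
        site = report.get("url", path.stem)
        for tech in report.get("technologies", []):
            for category in tech.get("categories", []):
                rows.append(
                    {"site": site, "technology": tech.get("name"), "category": category}
                )
    return rows
```

A list of dictionaries like this can be passed straight to pandas.DataFrame(rows) to get a long table with one detection per row.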

Disclaimer

I ran this data very quickly.
This is my first time using Wappalyzer in this way.
I’m not sure how accurate the Wappalyzer open source tool is or what its limitations are.
Always take everything with a few grains of salt… life tastes better that way.

And last, but not least… the “Insights”

Analytics Usage (Google is listening…)

224/370 = 61% use some form of analytics or tracking software on their blogs.

174/370 = 47% use Google Analytics on their blogs.

Analytics tool      Count
Google Analytics    174
Parse.ly            13
TrackJs             7
Optimizely          6
Clicky              4
Matomo              4
New Relic           3
Statcounter         3
BugSnag             2
Segment             2
Simple Analytics    2
WP-Statistics       2
Gauges              1
Intercom            1
Grand Total         224
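The counts and percentages in tables like this one come from a simple tally. The sketch below shows one way to do it, assuming rows shaped like {"site", "technology", "category"} (a hypothetical layout, not the actual dataframe). It also counts distinct sites separately, since a site running two tools in the same category would otherwise be counted twice in a grand total.

```python
from collections import Counter


def summarize(rows, category, total_sites):
    """Return (per-technology counts, distinct site count, percent of all sites)."""
    matching = [row for row in rows if row["category"] == category]
    counts = Counter(row["technology"] for row in matching)
    sites = {row["site"] for row in matching}
    share = round(100 * len(sites) / total_sites)
    return counts.most_common(), len(sites), share


# Made-up rows for illustration only.
rows = [
    {"site": "a.example", "technology": "Google Analytics", "category": "Analytics"},
    {"site": "a.example", "technology": "Matomo", "category": "Analytics"},
    {"site": "b.example", "technology": "Google Analytics", "category": "Analytics"},
    {"site": "b.example", "technology": "Nginx", "category": "Web servers"},
]
print(summarize(rows, "Analytics", total_sites=4))
```

With the real numbers, the same arithmetic gives round(100 * 174 / 370) = 47 for Google Analytics.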

Advertising (It’s not a hobby dammit!)

Only a small fraction of the sites (27/370 = 7%) have a detectable advertising framework, with the majority using Google AdSense.

Advertising                       Count
Google AdSense                    17
Carbon Ads                        5
BuySellAds                        4
DoubleClick Ad Exchange (AdX)     1
Grand Total                       27

Content Management Systems (WP is still dominant)

121/370 = 33% use a content management system that was detected by Wappalyzer. I’m sure more people use a CMS on the backend to manage posts locally for their static sites as well.

76/370 = 21% use WordPress.

CMS            Count
WordPress      76
Ghost          17
Medium         13
Blogger        10
Wix            1
Joomla         1
Squarespace    1
Svbtle         1
Tumblr         1
Grand Total    121

Web servers (Mostly Apache and Nginx)

91/370 = 25% use Nginx as a web server or reverse proxy for their site.

OpenGSE (Google Open Source Blog) is the most popular “non-traditional” web server with 10/370.

Web Servers          Count
Nginx                91
Apache               40
OpenGSE              10
Cowboy               7
LiteSpeed            5
OpenResty            5
Caddy                4
Now                  2
lighttpd             1
Phusion Passenger    1
Grand Total          166

Platform as a Service (GitHub Pages and Netlify)

51/370 = 14% of blogs use Netlify for hosting and CDN. [I do too :)]

59/370 = 16% of blogs use GitHub Pages.

PaaS                   Count
GitHub Pages           59
Netlify                51
Automattic             19
Amazon Web Services    18
SiteGround             5
Flywheel               2
WP Engine              1
Grand Total            155

Programming Languages

Programming Languages    Count
PHP                      83
Ruby                     67
Node.js                  45
Python                   13
Java                     11
Erlang                   7
Lua                      5
Go                       4
Perl                     1
Grand Total              236

UI Frameworks

47/370 = 13% use Bootstrap

UI Frameworks           Count
Bootstrap               47
animate.css             5
ZURB Foundation         3
Pure CSS                1
Material Design Lite    1
Bulma                   1
Grand Total             58

Static Site Generators

I personally use and love Jekyll. It’s great that 82/370 = 22% of sites are using some form of static site generation. HTML > everything.

Static Site Generators    Count
Hugo                      41
Jekyll                    24
Gatsby                    12
Hexo                      3
Pelican                   1
VuePress                  1
Grand Total               82

Raw Data

Link to raw CSV file for your own enjoyment.

Download here

What else would be fun?

In future parts of this series it would be fun to look at:

The speed and performance of loading these websites
The average “size” or “weight” of these websites
Number of posts on the blog
Type of content
The frequency of posts
Do they have RSS!??
Automating the data collection using the HN API

Discussion on Hacker News

https://news.ycombinator.com/item?id=22822401

Hope you enjoyed this, Danny

Thank you @stevekemp and @stared for the edits and suggestions