There are countless lists on the internet claiming to be the list of must-read programming books, and it seemed that all those lists always recommended the same books, minus two or three odd choices.

Finding good resources for learning programming is always tricky. Everyone has their own opinion about which book is the best to learn from, and as we say in French, “Colors and tastes should not be argued about”.

However, I thought it would be interesting to trust the wisdom of the crowd and to find the books that appeared most often in those “Best Programming Book” lists.

If you want to jump straight to the results, take a look at the full list below. If you want to learn about the methodology, bear with me.

I simply ran a few Google queries like “Best Programming Books” and variations of it. I then scraped all the resulting pages (using ScrapingBee, a web scraping API I’m working on).

I deduplicated the links and ended up with nearly 150 of them. Using the page titles, I was also able to quickly discard:

lists focused on one particular technology or platform

lists focused on one particular year

lists focused on free books

Quora and Reddit threads
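
For illustration, here is a minimal sketch of what the deduplication and title filtering could look like. The function names and the keyword patterns are hypothetical and only approximate the rules listed above; they are not the exact ones used for this article.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical title patterns, one per discard rule listed above.
EXCLUDE_PATTERNS = [
    re.compile(r"\b(python|java|javascript|android)\b", re.IGNORECASE),  # one technology
    re.compile(r"\b(19|20)\d{2}\b"),                                     # one particular year
    re.compile(r"\bfree\b", re.IGNORECASE),                              # free books
    re.compile(r"\b(quora|reddit)\b", re.IGNORECASE),                    # forum threads
]


def canonical(url: str) -> str:
    """Drop the query string, fragment and trailing slash.

    Assumption: two links that differ only in those parts point to the same page.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"), "", ""))


def dedupe(urls):
    """Keep the first occurrence of each canonical URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


def keep_page(title: str) -> bool:
    """Return False when the page title matches any discard pattern."""
    return not any(pattern.search(title) for pattern in EXCLUDE_PATTERNS)
```

A keyword filter like this is only a first pass. It will let some irrelevant pages through, which is why a manual review of each page is still needed afterwards.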

I ended up with almost 110 HTML files. I then opened each file in my browser, opened the Chrome inspector, and found and wrote down the CSS selector matching the book titles in the article. This took me around 1 hour, roughly 30 seconds per page.

This also allowed me to discard even more irrelevant pages, and I discarded a lot. In the end I compiled around 70 lists into this one.

At this point I had a big JSON file referencing each previously scraped HTML page along with its CSS selector.

Using Python with Beautiful Soup, I extracted the text of every DOM element that matched the CSS selector. I ended up with a huge list of books, not usable without some post-processing.
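
As a rough sketch, the extraction step could look like the code below. The layout of the JSON file (a list of objects with `file` and `selector` keys) and the function names are assumptions made for the example; the real file may be organized differently.

```python
import json
from pathlib import Path

from bs4 import BeautifulSoup


def extract_titles(html: str, selector: str) -> list[str]:
    """Return the text of every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for element in soup.select(selector):
        # Join nested text nodes with a space and trim surrounding whitespace.
        text = element.get_text(" ", strip=True)
        if text:
            titles.append(text)
    return titles


def extract_all(manifest_path: str) -> list[str]:
    """Run extract_titles over every page listed in the JSON file.

    Assumed format: [{"file": "pages/001.html", "selector": "h2.title"}, ...]
    """
    entries = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    raw_titles = []
    for entry in entries:
        html = Path(entry["file"]).read_text(encoding="utf-8", errors="replace")
        raw_titles.extend(extract_titles(html, entry["selector"]))
    return raw_titles
```

Keeping one selector per page in a separate file means the extraction code stays generic, and fixing a bad selector does not require touching the script.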

To find the most quoted programming books I needed to normalize my results.

I had to handle all the different variations, like “{title} by {author}” or “{title} - {author}”.

Or “{title}:{subtitle}” versus “{title}”, or even all the ones containing an edition number.

All of this also required quite a bit of manual cleaning.
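
To give an idea of what this normalization involves, here is a minimal sketch covering the variations mentioned above, followed by the counting step. The regular expressions and function names are illustrative assumptions, not the actual rules used, and heuristics like these will still mangle some titles (for example, a title that itself contains " by ").

```python
import re
from collections import Counter

# Leading list numbering such as "1. " or "#3) ".
LEADING_NUMBER = re.compile(r"^\s*#?\d+[.)]\s+")
# Edition markers such as "(2nd Edition)" or ", Second Edition".
EDITION = re.compile(
    r"\s*[(\[,]?\s*\b(?:\d+(?:st|nd|rd|th)|first|second|third|fourth|fifth)"
    r"\s+edition\b[)\]]?",
    re.IGNORECASE,
)
# "{title} by {author}" or "{title} - {author}" (hyphen, en dash or em dash).
AUTHOR_SEPARATOR = re.compile(r"\s+(?:by|[-\u2013\u2014])\s+")


def normalize(raw: str) -> str:
    """Reduce a scraped entry to a bare, lowercase title."""
    title = LEADING_NUMBER.sub("", raw)
    title = EDITION.sub("", title)
    title = AUTHOR_SEPARATOR.split(title, maxsplit=1)[0]  # drop the author
    title = title.split(":", 1)[0]                        # drop the subtitle
    return " ".join(title.split()).strip(" .,;-").casefold()


def most_recommended(raw_titles, top=10):
    """Count normalized titles and return the most frequent ones."""
    counts = Counter(normalize(title) for title in raw_titles)
    counts.pop("", None)  # ignore entries that normalized to nothing
    return counts.most_common(top)
```

With titles reduced to a common form, counting recommendations is a single `Counter` call. The hard part is the normalization itself, and the remaining edge cases are where the manual cleaning comes in.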

My list now looked like this:

From there it was easy to compute the most recommended books. You can find all the data used to process this list on this repo. Now let’s take a look at the list:
