There are countless lists on the internet claiming to be the list of must-read Python books, and it seemed that all of them recommended the same books, give or take two or three odd choices.

Finding good resources for learning programming is always tricky. Everyone has their own opinion about which book is the best to learn from, and as we say in French, “colors and tastes should not be argued about”.

However, I thought it would be interesting to trust the wisdom of the crowd and find the books that appeared most often in those “Best Python Book” lists.

If you want to jump right to the results, take a look at the full list below. If you want to learn about the methodology, bear with me.

I simply ran a few Google queries like “Best Python Books” and variations of it. I then scraped all the resulting pages (using ScrapingBee, a web scraping API I’m working on).

I deduplicated the links and ended up with nearly 170 of them. Using the page titles, I was also able to quickly discard:

lists focused on one particular technology or platform

lists focused on one particular year

lists focused on free books

Quora and Reddit threads
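The deduplication and title-based filtering described above could be sketched roughly as follows. This is only an illustration under assumptions: the `(url, title)` input shape, the function names, and the keyword patterns are all hypothetical and are not taken from the actual script.

```python
import re
from urllib.parse import urlsplit

# Hypothetical discard rules mirroring the criteria listed above.
DISCARD_PATTERNS = [
    re.compile(r"\b(django|flask|raspberry pi)\b", re.I),  # one technology or platform (illustrative keywords)
    re.compile(r"\b(19|20)\d{2}\b"),                       # one particular year
    re.compile(r"\bfree\b", re.I),                         # free books
]
DISCARD_DOMAINS = ("quora.com", "reddit.com")


def dedupe(pages):
    """Keep the first (url, title) pair per URL, ignoring scheme,
    fragment and trailing slash."""
    seen, unique = set(), []
    for url, title in pages:
        parts = urlsplit(url)
        key = (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)
        if key not in seen:
            seen.add(key)
            unique.append((url, title))
    return unique


def is_relevant(url, title):
    """Return False for pages matching one of the discard rules."""
    host = urlsplit(url).netloc.lower()
    if any(host == d or host.endswith("." + d) for d in DISCARD_DOMAINS):
        return False
    return not any(p.search(title) for p in DISCARD_PATTERNS)


def filter_pages(pages):
    return [(u, t) for u, t in dedupe(pages) if is_relevant(u, t)]
```

Keyword rules like these are blunt, which is one reason a manual pass over the remaining pages is still needed afterwards.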

I ended up with almost 130 HTML files. I then opened each file in my browser, opened the Chrome inspector, and found and wrote down the CSS selector matching the book titles in the article. This took me around 1 hour, almost 30 seconds per page.

This also allowed me to discard even more irrelevant pages, and I discarded a lot. In the end, I compiled around 70 lists into this one.

Book titles were then extracted with a mix of manual extraction and some web scraping.
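To give an idea of what the scripted part of that extraction can look like, here is a minimal sketch using only the standard library. It handles a single simple selector of the form `tag.class` (for example `h3.book-title`, a made-up selector); the class and function names are hypothetical, and a real CSS selector engine would cover far more cases than this.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of every <tag class="... css_class ..."> element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements with the same tag name.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace and store the finished title.
                title = " ".join("".join(self.buffer).split())
                if title:
                    self.titles.append(title)
                self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_titles(html, tag, css_class):
    parser = TitleExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.titles
```

With one selector recorded per page, the same function can be run over every saved HTML file.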

I ended up with a huge list of books, not usable without some post-processing.

To find the most frequently mentioned Python books, I needed to normalize my results.

I had to deal with all the different variations, like “{title} by {author}” or “{title} - {author}”.

Or “{title}:{subtitle}” and “{title}”, or even all the ones containing an edition number.

And after that, there was still quite a bit of manual cleaning.
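As a rough sketch of that normalization, and of the counting that follows, something like the code below would work. The regular expressions and function names are my own illustration, not the exact rules used here, and counting each book at most once per list is an assumption.

```python
import re
from collections import Counter

# Matches things like ", 2nd Edition", "(3rd Edition)" or "Second Edition".
EDITION = re.compile(
    r"[\s,(]*\b(\d+(st|nd|rd|th)|first|second|third|fourth|fifth)\s+edition\b\)?",
    re.I,
)


def normalize(raw):
    """Reduce a raw entry to a bare, case-insensitive title."""
    title = raw.strip()
    # Drop the author in "{title} by {author}" or "{title} - {author}".
    # Naive: this also clips titles that themselves contain " by ".
    title = re.split(r"\s+by\s+|\s+[-–]\s+", title, maxsplit=1)[0]
    # Drop the edition number.
    title = EDITION.sub("", title)
    # Drop the subtitle in "{title}:{subtitle}".
    title = title.split(":", 1)[0]
    return " ".join(title.split()).casefold()


def count_recommendations(lists):
    """Count in how many lists each normalized title appears."""
    counts = Counter()
    for titles in lists:
        # A set ensures a book is counted at most once per list.
        counts.update({normalize(t) for t in titles})
    return counts
```

Calling `count_recommendations(lists).most_common()` then gives the ranking. Rules this simple will mangle some titles, which is why a manual cleaning pass remains necessary.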

My list now looked like this:

From there, it was easy to compute the most recommended books. You can find all the data used to build this list in this repo. Now let’s take a look at the list:
