When hyperlinks go dead, returning 404 or 500 HTTP status codes or redirecting to spam websites, we are dealing with the dreaded phenomenon known as “link rot”. Link rot is a widespread problem; in fact, research shows that the average link survives only about four years.

In this blog post, we will look at how link rot affects user experience, using Full Stack Python as our example. We’ll build a Python script that detects link rot in Markdown and HTML files so we can quickly find and fix broken links.

A Link Rot Example

fullstackpython.com is a website created by Twilio employee Matt Makai in 2012. The site has helped many folks, including me, learn how to best use Python and the tools within its ecosystem.

The site now has over 145,000 words and 150 pages, including:

2400+ links in the repository

300+ HTML files

150+ Markdown files

More links and files are expected in the future. With 2400+ links on the site, it is really difficult to spot dead links right away. At best, users report them via issues or pull requests; at worst, they may not know what to do and simply leave the site. On the maintainer’s side, checking all the URLs by hand is not a sustainable solution: assuming each link takes 10 seconds to check, going through all of them in one sitting would take at least 24,000 seconds, or about 6.7 hours.

There must be an automated solution to handle all of the link rot madness!

Python to the Rescue

Our approach will be to aggregate all the links from the site and check each URL with a Python script. Since the site’s content is all available in a GitHub repository, we can clone the repository and run our script from the base folder.
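Before building the full script, here is a minimal sketch of that approach. It assumes the third-party requests library is installed and that a simple regular expression is good enough to pull absolute URLs out of the source files; the function and pattern names below are illustrative, not taken from the final script:

```python
import pathlib
import re

import requests

# Illustrative pattern: grabs absolute http(s) URLs from Markdown and HTML source.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def collect_urls(base_dir="."):
    """Gather every http(s) URL found in Markdown and HTML files under base_dir."""
    urls = set()
    for path in pathlib.Path(base_dir).rglob("*"):
        if path.is_file() and path.suffix in {".md", ".markdown", ".html"}:
            urls.update(URL_PATTERN.findall(path.read_text(errors="ignore")))
    return urls


def check_url(url):
    """Return the final status code for url, or None if the request failed entirely."""
    try:
        # HEAD keeps the check cheap; some servers reject it, so fall back to GET.
        response = requests.head(url, allow_redirects=True, timeout=10)
        if response.status_code >= 400:
            response = requests.get(url, allow_redirects=True, timeout=10)
        return response.status_code
    except requests.RequestException:
        return None


if __name__ == "__main__":
    for url in sorted(collect_urls()):
        status = check_url(url)
        if status is None or status >= 400:
            print(f"DEAD ({status}): {url}")
```

Even automated, a sequential check like this is slow for thousands of URLs, which is worth keeping in mind as we flesh out the script in the rest of the post.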

The first step is to clone the repository: