After the previous article on getting my React app rendered on the server side with Rendertron and Kubernetes, I needed a way of letting search engines know about the content (namely the constantly updated interpretation of Ethereum blocks).

As the content on http://inkl.in is updated constantly whenever a new block appears on the chain, I needed to make sure crawlers could find things a bit more easily than through the organic internal linking that takes place.

Enter sitemaps.

A sitemap is an XML file that will hint to a Search Engine that there is content it should look at. I say hint, as it does not force the likes of Google or Bing to index the content, but it’s better than nothing.
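For reference, a minimal sitemap is just a `urlset` of URLs. The block-page path below is illustrative, not necessarily the exact URL scheme the site uses:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://inkl.in/block/6000000</loc>
  </url>
</urlset>
```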

Since pushing the inkl.in sitemap to Google I can see about 5,000 pages being indexed per day, and there is organic search traffic hitting the site.

The problem with having as much content as there is on the site (over 6 million blocks = over 6 million pages, plus visualisation of addresses, which is well over 300 million) is that it will not fit into a single sitemap XML file.

To get around this, enter sitemap indexes. An index is another XML file used to reference all of the other sitemaps. There are limits on how much data you can have in a single sitemap file: 50,000 URLs, and a maximum size of 50MB.
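A sitemap index looks much like a sitemap, except it lists other sitemap files instead of pages (the filename here is just an example):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://inkl.in/sitemaps/sitemap-0.xml</loc>
  </sitemap>
</sitemapindex>
```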

So two things need to happen: I need to generate sitemaps covering every block in inklin, and a sitemap index that links to them. I’m not going to be too intelligent about this, as I don’t mind a few thousand 404 errors for missing blocks.

The inklin API has a “live” endpoint which gives me back the data about the latest block (including the block number).

Once I have this, all I need to do is decrement from there down to Block 1.

This will allow me to generate a sitemap entry per block, which Google or Bing can decide to crawl. I also need to make sure that a new sitemap file is generated for every 50,000 entries.
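The chunking logic can be sketched like this. This is a sketch, not the original script: the `/block/{n}` URL path and the shape of the “live” endpoint response are assumptions on my part.

```python
SITEMAP_LIMIT = 50_000  # maximum URLs allowed per sitemap file


def block_chunks(latest_block, limit=SITEMAP_LIMIT):
    """Split blocks latest..1 into descending chunks of at most `limit` entries."""
    for start in range(latest_block, 0, -limit):
        end = max(start - limit + 1, 1)
        yield range(start, end - 1, -1)


def sitemap_xml(blocks, base_url="https://inkl.in"):
    """Render one sitemap file for a chunk of block numbers.

    The /block/{n} path is a guess at the site's URL scheme.
    """
    urls = "\n".join(
        f"  <url><loc>{base_url}/block/{n}</loc></url>" for n in blocks
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{urls}\n</urlset>\n"
    )


# The latest block number would come from the API's "live" endpoint,
# e.g. something like: latest = requests.get(API + "/live").json()["number"]
```

Each chunk can then be written out as `sitemap-0.xml`, `sitemap-1.xml`, and so on.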

And then finally once I have a directory of sitemaps I need to generate the index.
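Building the index is then just a directory listing turned into XML. Again a sketch; the `/sitemaps` URL prefix and `index.xml` filename are assumptions:

```python
import os


def sitemap_index(directory, base_url="https://inkl.in/sitemaps"):
    """Build a sitemap index referencing every sitemap file in `directory`."""
    entries = "\n".join(
        f"  <sitemap><loc>{base_url}/{name}</loc></sitemap>"
        for name in sorted(os.listdir(directory))
        if name.endswith(".xml") and name != "index.xml"  # skip the index itself
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n</sitemapindex>\n"
    )
```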

Once I have the full code, I need to construct a Kubernetes Job (a one-off Pod that runs to completion) to run and regenerate this on a daily basis.
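The Job spec would look something along these lines. The image, secret, and share names here are hypothetical placeholders, and the `azureFile` volume matches the CIFS share described below:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sitemaps
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: sitemaps
          image: inklin/sitemaps:latest    # hypothetical image name
          volumeMounts:
            - name: sitemaps
              mountPath: /sitemaps         # the generator writes files here
      volumes:
        - name: sitemaps
          azureFile:                       # Azure Files (CIFS) share
            secretName: azure-files-secret # hypothetical secret holding the share credentials
            shareName: sitemaps
            readOnly: false
```

To get the daily cadence, the same Pod template could be wrapped in a CronJob with `schedule: "@daily"` instead of re-creating the Job by hand.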

You’ll notice I have a volume defined. I’m using Azure Files (a CIFS fileshare) so that the Sitemaps Job can generate and save the data. This is important, because your sitemaps need to be hosted and served from the same domain as your site.

I’ve modified the Frontend Pod spec to mount the same CIFS share under /sitemaps in the NGINX root. I know, I know, I could have also set up an Ingress controller to separate this, but I hacked this together in an hour!

My Helm template now looks like this:
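I don’t have the exact template to hand, but the relevant excerpt of the frontend Deployment would look roughly like this, with the same share mounted read-only into the NGINX document root (values and secret names are illustrative):

```yaml
# Hypothetical excerpt of the frontend Deployment template
containers:
  - name: frontend
    image: "{{ .Values.frontend.image }}"
    volumeMounts:
      - name: sitemaps
        mountPath: /usr/share/nginx/html/sitemaps  # served as https://inkl.in/sitemaps/...
        readOnly: true
volumes:
  - name: sitemaps
    azureFile:
      secretName: azure-files-secret               # hypothetical secret name
      shareName: sitemaps
      readOnly: true
```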

Once that’s done, whenever the Sitemaps Job runs, the data will be refreshed for whenever Google or Bing decides to re-crawl the Sitemap Index. And here’s the final result…