Search engine crawlers — Author: Seobility — License: CC BY-SA 4.0

Have you ever been almost all of the way home with your client-side rendered JavaScript web application, only to realize that many web crawler bots don't execute JavaScript? Well, I know I have!

To follow along with my examples, I recommend a basic knowledge of JavaScript and client-server architecture.

When the issue arose, it was almost too late to convert the whole application to a server-side rendered app. So, like a normal developer, I Googled. I wasn't surprised to see many fellow developers suggesting that I should have started with server-side rendering in the first place, but my problem was that the client had suddenly made us aware that at least the public pages should be search engine optimized. In my case, I was stuck with keeping the existing code base as-is, since the app was due for release in a month's time.

A service called prerender.io caught my attention in one of the online forums I was reading. It was a brilliant service that served web crawlers static HTML content so that they would have no trouble reading it. All we had to do was check whether a request to our server was coming from a bot; if so, we had to proxy the request to Prerender's own server. But there was one problem: the service costs a considerable amount of money to operate. So I needed to find a cheaper way of doing it. I thought to myself: why not replicate what Prerender is doing on one of my own servers?

I looked for ways to create a simple server and a good headless browser to pair with it. After a bit of research, I decided on the technologies I was going to use:

- ExpressJS will serve our web renderer.
- Puppeteer will be our headless browser.

ExpressJS will help us build the web server that handles incoming requests, and Puppeteer will help us render our JavaScript application and send back plain HTML content in the response.

Let’s dive straight to our server code!
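The embedded snippet from my repo isn't reproduced here, so below is a minimal sketch of the idea, assuming Express and Puppeteer. The target origin, port, and function names are placeholders of my own, not taken from the actual code base:

```javascript
// Sketch of the prerender server. TARGET_ORIGIN and PORT are
// placeholder values; point them at your own site and setup.
const TARGET_ORIGIN = 'https://example.com'; // hypothetical site to render
const PORT = 3000;

// Map the incoming request path onto the site we want to render.
function targetUrl(path) {
  return `${TARGET_ORIGIN}${path}`;
}

async function startServer() {
  // Requires are deferred so targetUrl() stays usable on its own.
  const express = require('express');
  const puppeteer = require('puppeteer');

  // Launch a single shared browser instance at startup, rather than
  // one per request, to keep memory usage under control.
  const browser = await puppeteer.launch();
  const app = express();

  app.get('*', async (req, res) => {
    const page = await browser.newPage();
    try {
      // Wait until the network is idle, so pages that fetch extra
      // data after load are fully populated before we read them.
      await page.goto(targetUrl(req.originalUrl), { waitUntil: 'networkidle0' });
      const html = await page.content();
      res.send(html);
    } finally {
      // Always clean up the page, even if rendering failed.
      await page.close();
    }
  });

  app.listen(PORT, () => console.log(`Renderer listening on ${PORT}`));
}

// Set START_RENDERER=1 in the environment to launch the server.
if (process.env.START_RENDERER) {
  startServer();
}
```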

The code above creates an Express server and listens for all incoming GET requests. When the server first starts, it launches a single instance of our headless browser. We avoid launching a new browser instance for every request because that would quickly exhaust the memory on our server.

For every incoming request, we create a new Page in the browser and ask it to navigate to the URL we derive by parsing the request URL. We tell the page to wait until all network requests have completed, because our JavaScript application may have pages that need a network request to finish before additional data is populated.

After the page has finished evaluating, we grab the raw HTML content and send it back as the response. Remember to clean up the Page object you created and make sure it is closed after each request. Clone my GitHub repo and try it out yourself! See if you can improve the server's performance by implementing a Redis cache for the rendered HTML.
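To show the shape of that caching idea, here is a minimal sketch using an in-memory Map with a TTL as a stand-in; in production you would back this with Redis as suggested. The TTL value and function names are arbitrary choices of mine for illustration:

```javascript
// Stand-in for a Redis render cache: rendered HTML keyed by URL,
// with a time-to-live so stale pages eventually get re-rendered.
const CACHE_TTL_MS = 10 * 60 * 1000; // 10 minutes (illustrative)
const cache = new Map(); // url -> { html, expires }

function getCachedHtml(url, now = Date.now()) {
  const entry = cache.get(url);
  if (!entry) return null;
  if (entry.expires <= now) {
    cache.delete(url); // expired: drop it so the page is re-rendered
    return null;
  }
  return entry.html;
}

function setCachedHtml(url, html, now = Date.now()) {
  cache.set(url, { html, expires: now + CACHE_TTL_MS });
}

// In the request handler, check the cache before rendering:
//
//   const hit = getCachedHtml(url);
//   if (hit) return res.send(hit);
//   const html = /* render via Puppeteer as in the server code */;
//   setCachedHtml(url, html);
//   res.send(html);
```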

Now that our server is up and running, ready to serve requests, we need to single out the crawler bots and proxy their requests to the server we just created. In my case, I was serving my website with NGINX, so I can only help you with setting up an NGINX config to proxy bot requests to our server. Your default NGINX config file is usually located at `/etc/nginx/sites-available/default`.
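The config itself isn't shown inline, so here is a sketch of the pattern, assuming the renderer listens on `127.0.0.1:3000` (a placeholder, like the domain and the bot list, which you should adjust to your own setup):

```nginx
server {
    listen 80;
    server_name example.com;  # placeholder domain

    location / {
        set $prerender 0;
        # Match common crawler user agents.
        if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|linkedinbot|slackbot") {
            set $prerender 1;
        }
        if ($prerender = 1) {
            # Send bots to the Express/Puppeteer renderer.
            proxy_pass http://127.0.0.1:3000;
        }
        # Everyone else gets the normal client-side app.
        try_files $uri $uri/ /index.html;
    }
}
```

Reload NGINX afterwards (e.g. `sudo nginx -s reload`). If you prefer the command line over Postman, you can also fake a bot with something like `curl -A "googlebot" http://example.com/` and check that you get back rendered HTML.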

After changing your config, you will need to restart your NGINX instance. If all went well, any of the bots listed in the user-agents section of the config will be redirected to our Express server above. A good way to test this is with Postman: modify the user-agent header on a request and send it through.

This might not be as fast as true server-side rendering, but it was definitely a feasible option in my experience. My production website has been running for over four months now, and I have never had issues with the performance of the rendering server. The average response time for a bot request is around five seconds, but that can definitely be improved by introducing caching on the server.

If my story helped you in any way please give me some 👏👏👏. If you think this could be improved, please share it here so everyone can see.