Jordan Makes His HTTP Requests More Consistent

The goal of this post is to figure out why a lot of the requests in my link checker were returning inaccurate status codes. When a link checker's whole job is to determine whether links are valid, a high accuracy rate is pretty important.

The library I have been using for pretty much all of my HTTP web scraping is request (through request-promise), and I have gotten pretty comfortable with it. This unreliability forced me to look elsewhere to see whether I would get similar results. The other library I had heard a lot about was axios, so that is what I tried first.

axios vs request-promise… FIGHT

// const url = 'https://www.sugarandcharm.com'; // axios 404, rp 404
// const url = `http://bethmichelle.com`; // axios 200, rp 403
// const url = 'https://www.courtneyssweets.com/'; // axios 200, rp 403
// const url = 'https://www.closetcooking.com/'; // axios 200, rp 403
// const url = 'https://www.confessionsofaconfectionista.com/'; // axios 200, rp 200
// const url = 'http://nutsforcooking.com/'; // axios 200, rp 200
// const url = 'http://javascriptwebscrapingguy.com/'; // axios 200, rp 200
// const url = 'http://citadelpackaging.com/'; // axios 200, rp 200
// const url = 'http://reddit.com/'; // axios 200, rp 200
// const url = 'https://amazon.com/'; // axios 200, rp 200
// const url = 'https://audible.com/'; // axios 200, rp 200
// const url = 'https://google.com/'; // axios 200, rp 200
// const url = 'https://reddit.com/r/funny'; // axios 200, rp 200
// const url = 'https://facebook.com/'; // axios 200, rp 200

Above I have listed the status codes returned for requests made to various websites. Each was tried at least five times with each library, often more than that. The surprise for me was the results from the first four sites. Three of these smallish websites (compared to giants like reddit, amazon, audible, google, and javascriptwebscrapingguy.com 😉 ) returned 403 to requests made by the request library, whereas axios got 200s.
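The repeated trials described above could be automated with a small harness along these lines. This is only a sketch: `tallyStatusCodes` is a hypothetical helper, and the two commented-out adapters assume axios and request-promise are installed.

```javascript
// Calls `fetchStatus(url)` `attempts` times and counts how often each
// status code (or "error" for a request that failed outright) comes back.
async function tallyStatusCodes(fetchStatus, url, attempts = 5) {
  const tally = {};
  for (let i = 0; i < attempts; i++) {
    let outcome;
    try {
      outcome = await fetchStatus(url);
    } catch (err) {
      outcome = 'error'; // DNS failure, timeout, connection reset, etc.
    }
    tally[outcome] = (tally[outcome] || 0) + 1;
  }
  return tally;
}

// Possible adapters for the two libraries (assumes both are installed):
// const viaAxios = (url) =>
//   require('axios')
//     .get(url, { validateStatus: () => true }) // resolve on any status
//     .then((res) => res.status);
// const viaRp = (url) =>
//   require('request-promise')({ uri: url, simple: false, resolveWithFullResponse: true })
//     .then((res) => res.statusCode);
//
// tallyStatusCodes(viaAxios, 'https://www.closetcooking.com/').then(console.log);
```

Passing the fetch function in as a parameter keeps the counting logic identical for both libraries, so any difference in the tallies comes from the library and not from the harness.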

Axios wasn’t perfect (as you can see, https://www.sugarandcharm.com returned a 404 with both libraries) but it was a significant improvement. I really thought that under the covers both libraries would be using the built-in HTTP support of Node.js and were really more like thin wrappers around it. Something is obviously different between the two, but I’m not exactly sure what.
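One way to investigate what differs would be to point each library at a throwaway local server and compare the request headers it receives. This assumes the difference lies in the request itself, which the tests above do not confirm. `captureRequestHeaders` and `plainNodeGet` are hypothetical helper names, and the sketch uses only Node's built-in http module.

```javascript
const http = require('http');

// Starts a one-shot local server, lets `sendRequest(url)` hit it, and
// resolves with the headers the server saw on that request.
function captureRequestHeaders(sendRequest) {
  return new Promise((resolve, reject) => {
    const server = http.createServer((req, res) => {
      res.setHeader('Connection', 'close'); // do not keep the socket alive
      res.end('ok');
      server.close();
      resolve(req.headers);
    });
    server.on('error', reject);
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      Promise.resolve(sendRequest(`http://127.0.0.1:${port}/`)).catch(reject);
    });
  });
}

// Baseline client using plain Node, with optional extra headers.
function plainNodeGet(url, headers = {}) {
  return new Promise((resolve, reject) => {
    http
      .get(url, { headers }, (res) => {
        res.resume(); // drain the body so 'end' fires
        res.on('end', resolve);
      })
      .on('error', reject);
  });
}

// Usage: swap in a function that calls axios or request-promise instead.
// captureRequestHeaders((url) => plainNodeGet(url)).then(console.log);
```

Running the same capture once per library and diffing the two header objects would show whether one of them sends something the other omits.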

Client side vs Server side http requests

After my tests showed that requests were a lot more consistent with axios, I went to use it in the Electron app that I had built to give a face to the link checker. Immediately it failed with a CORS error. CORS errors come from the same-origin policy, which browsers enforce to keep a page's scripts from reading data from other sites without permission. If I call a domain other than the one the page came from and the receiving server doesn’t say it allows that origin, the browser blocks the request or withholds the response from my code.

It’s important to realize that this only happens with client-side requests, because browsers are the ones that enforce it. When using the request library, I never had this problem because the request was handed to the Node.js script, which executed it as a server-side request without going through the browser.
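As a rough model of the check a browser performs, the sketch below compares origins and then looks at the `Access-Control-Allow-Origin` response header. It is deliberately simplified: real browsers also handle preflight requests and credentials, and the function names and the localhost page URL are made up for illustration.

```javascript
// Two URLs share an origin only if scheme, host, and port all match.
function isCrossOrigin(pageUrl, requestUrl) {
  return new URL(pageUrl).origin !== new URL(requestUrl).origin;
}

// Decides whether the page's script may read the response, given the value
// of the Access-Control-Allow-Origin header (undefined when it is absent).
function browserLetsScriptRead(pageUrl, requestUrl, allowOriginHeader) {
  if (!isCrossOrigin(pageUrl, requestUrl)) return true;
  const pageOrigin = new URL(pageUrl).origin;
  return allowOriginHeader === '*' || allowOriginHeader === pageOrigin;
}

const page = 'http://localhost:4200/index.html'; // hypothetical app page

console.log(browserLetsScriptRead(page, 'http://localhost:4200/api', undefined)); // true
console.log(browserLetsScriptRead(page, 'https://www.closetcooking.com/', undefined)); // false
console.log(browserLetsScriptRead(page, 'https://www.closetcooking.com/', '*')); // true
```

A server-side request never runs this check, which is why the same URL can work from a Node.js script and fail from a page.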

Axios apparently defaults to using the browser's XMLHttpRequest if it’s available, whereas request does not. Since I knew what was happening, I just disabled web security in the Electron app, and then the requests started going through without a problem.

win = new BrowserWindow({
  x: 0,
  y: 0,
  width: size.width,
  height: size.height,
  webPreferences: {
    nodeIntegration: true,
    webSecurity: false, // This is what I added to the electron set up
  },
});
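The "use the browser if it's available" behavior can be pictured with a small model like the one below. This is an illustration of the idea only, not axios's actual source; `pickTransport` and the two environment objects are hypothetical.

```javascript
// Simplified model: prefer the browser transport when the environment
// exposes XMLHttpRequest, otherwise fall back to Node's http module.
function pickTransport(env) {
  if (typeof env.XMLHttpRequest !== 'undefined') return 'xhr';
  if (typeof env.process !== 'undefined') return 'http';
  throw new Error('no known transport available');
}

// An Electron window with nodeIntegration exposes both globals.
const electronRenderer = { XMLHttpRequest: function XMLHttpRequest() {}, process: {} };
// A plain Node.js script only has `process`.
const plainNode = { process: {} };

console.log(pickTransport(electronRenderer)); // xhr
console.log(pickTransport(plainNode)); // http
```

Under this model, the same library goes through the browser inside the Electron window and is therefore subject to CORS, while a plain Node.js script is not.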

Maybe this difference between browser-side and server-side requests is also what allows axios to make successful calls where request does not. Whatever the reason, axios has proved more reliable and it is what I will use in the future.
