Here I am, blabbing on about Ruby’s Mechanize gem again. I decided to compile a short list of the most common problems I’ve seen in my long (not that long), storied career as a renegade data miner.


HTTPS/SSL Errors

This error commonly presents itself with something like:

SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)

Which looks incredibly ugly and scary to anyone new to Mechanize. But there’s a simple fix:

I didn’t say it was secure.

Ok, ok... the truth is you don’t want to use this in production code that depends on secure communication. But if you just wanna get all that juicy data in a pinch, try it out.

User Agent Blocked

This one is harder to diagnose. Some servers refuse requests whose User-Agent header gives away that they come from a script. So if you find yourself getting empty response bodies or strange errors back from the server, do yourself a favor and punch this in:

Literally chooses a random UA. Roll the dice.

You can, of course, choose your own UA as well. This is just more fun.

Following Redirects

This one is weird as well. Sometimes you get a 404 error, and sometimes just a chunk of HTML saying something like “This page has moved to...”. Another obvious indicator is a 301 (Moved Permanently) status code in the response. I wrote a previous article about using Burp Suite for web scraping, which you should totally check out if you wanna really analyze that browser traffic.

Oh yeah, the fix(es):

I’m just here for the ride, Mr. Webserver.

Oh no! My Request Timed Out...

Self-explanatory, probably: Mechanize requests a resource but doesn’t get a response within the allotted time. So we just need to increase that allotted time:

Easy peasy, my-wife-won’t-let-me-squeezy

And that’s pretty much it for that one.

Are we speaking the same language?

This one is important, and I guarantee you will run into it over and over again. You’re speaking JSON, but the server only wants URL-encoded crap. If you’re sending a request that you just know should be understood, but the server keeps erroring out or rejecting it, try a different content type (the Content-Type header on your request)!

And yes, there are many types to try, but I’ve only seen a few in the wild.

Conclusion

The article is over; that’s the conclusion. I’m sure there are many more errors I could list here, but these are the ones that hurt me so often I memorized them.

Happy coding and whatnot.