Simply put

The goal of this project is to build a web scraper that searches for flight prices to a particular destination with flexible dates (up to 3 days before and after the dates you first select). It saves the results to an Excel file and emails you a quick summary of the stats. Obviously, the objective is to help us find the best deals!
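To make the “flexible dates” window concrete: for each date you pick, the search covers the three days before and the three days after it, seven days in total. Presumably the site's own flexible-date option does this work, so the snippet below only illustrates which dates end up covered. `flexible_window` is a hypothetical helper, not part of the scraper itself.

```python
from datetime import date, timedelta

def flexible_window(selected: date, days: int = 3) -> list[date]:
    """Return every date from `days` before to `days` after `selected`, inclusive."""
    return [selected + timedelta(days=offset) for offset in range(-days, days + 1)]

# Example: a departure picked for 21 August 2019 covers 18 to 24 August
for d in flexible_window(date(2019, 8, 21)):
    print(d.isoformat())
```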

If you get lost at any point, have a look at my article about the Instagram bot, as it also uses Selenium.

The real-life application for this is up to you. I’ve used it to search for holidays and, more recently, for some short trips to my hometown!

If you’re serious about it, you can run the script on a server (a simple Raspberry Pi will do) and have it start once or twice a day. The results will be emailed to you, and I suggest saving the Excel file to a Dropbox folder so you can access it from anywhere, anytime.
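The “quick stats” email needs nothing beyond Python's standard library. Here is a minimal sketch, assuming the prices have already been scraped into a list of numbers. The function name, addresses, route label and prices are placeholders, and the actual sending step is left out so the example runs offline.

```python
from email.message import EmailMessage
from statistics import mean

def build_summary_email(prices, sender, recipient, route):
    """Compose (but do not send) an email with quick price stats for one route."""
    msg = EmailMessage()
    msg["Subject"] = f"Flight prices for {route}"
    msg["From"] = sender
    msg["To"] = recipient
    body = (
        f"Results found: {len(prices)}\n"
        f"Cheapest: {min(prices):.2f}\n"
        f"Average: {mean(prices):.2f}\n"
        f"Most expensive: {max(prices):.2f}\n"
    )
    msg.set_content(body)  # plain-text body
    return msg

# Hypothetical scraped prices and placeholder addresses
msg = build_summary_email([412.0, 389.5, 455.0], "bot@example.com", "me@example.com", "LIS-SIN")
print(msg)
```

To actually send it, open a connection to your mail provider with `smtplib.SMTP_SSL(host)`, call `login()` with your credentials and then `send_message(msg)`.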

I haven’t found any error fares yet, but I suppose it’s possible!

Although the script works on one pair of destinations at a time, you can easily adapt it to search several pairs inside each loop. You might even end up finding some error fares… which would be awesome!
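Running several routes per loop mostly comes down to building one search URL per pair of cities. The URL layout below is an assumption modelled on Kayak's search pages (a `-flexible` suffix on each date for the ±3-day search); check it against the address bar of a real search before relying on it, since sites change their URLs. `build_search_url` and `build_search_urls` are hypothetical helpers, and the airport codes and dates are only examples.

```python
# ASSUMPTION: this layout mirrors the address bar of a Kayak flexible-date search;
# confirm it against a real search in your browser before using it.
BASE = "https://www.kayak.com/flights"

def build_search_url(origin, destination, depart, ret):
    """Build one round-trip search URL; dates are ISO strings like '2019-08-21'."""
    return f"{BASE}/{origin}-{destination}/{depart}-flexible/{ret}-flexible"

def build_search_urls(routes, depart, ret):
    """One URL per (origin, destination) pair, all for the same travel dates."""
    return [build_search_url(o, d, depart, ret) for o, d in routes]

routes = [("LIS", "SIN"), ("LIS", "NYC"), ("OPO", "LON")]
for url in build_search_urls(routes, "2019-08-21", "2019-09-07"):
    # In the real script, the Selenium driver would load `url` here and scrape the results.
    print(url)
```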

Yet another scraper

When I first started web scraping, I was not particularly interested in the topic. There… I said it! I wanted to do more projects with predictive modeling, financial analysis and maybe some sentiment analysis, but it turned out that figuring out how to build my first web crawler was a lot of fun. As I kept learning, I realized that web scraping is what makes the internet “work”.

Yep… Just like Larry and Sergey, you can hit the jacuzzi after you initiate the scraper! (image: wired.com)

You might think that’s a bold claim, but what if I told you that Google started out with a web scraper Larry Page built with Java and Python? It crawled (and still crawls) the whole internet, trying to give you the best possible answers to your questions. There are endless applications for web scraping, and even if you prefer other areas of Data Science, you’ll still need some scraping skills to get your data.

Some of the techniques I use here come from an awesome book I recently bought that covers everything related to web scraping. It has plenty of simple examples and lots of practical applications. There’s even a very interesting chapter about solving reCaptcha checks which blew my mind: I was not aware of the existing tools, or even services, to deal with them! (Disclaimer: if you purchase the book through my link, I receive a small fee at no extra cost to you. So if you feel like buying me a coffee by the end of this article, I’d appreciate it!)

“Do you like traveling?!”

This simple and innocuous question is often met with a positive answer and a story or two about a previous adventure. Most of us would agree that traveling is a great way to experience new cultures and broaden our perspectives. But if the question were “Do you like the process of searching for plane tickets?”, I’m sure the reaction would be a lot less enthusiastic…

Python to the rescue.

The first challenge was choosing which platform to scrape the information from. It was not easy, but I settled on Kayak. I tried Momondo, Skyscanner, Expedia and a few more, but the reCaptchas on those websites were ruthless. After a few attempts at selecting traffic lights, crosswalks and bicycles in those “are you human” checks, I concluded that Kayak was my best alternative, even though it throws up a security check if you load too many pages in a short period of time. I managed to keep the bot querying the website at 4- to 6-hour intervals, and that worked fine. There may be an occasional hiccup here and there, but if you start getting reCaptcha checks, either solve them manually and then restart the bot, or wait a few hours and it should reset. Feel free to adapt the code to another platform, and you’re welcome to share it in the comments section!
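One way to stay within that 4 to 6 hour rhythm is to pick a random pause inside the range after each round, so the requests don’t arrive on a rigid schedule. This is a sketch under my own assumptions: `run_one_round` stands for a hypothetical function wrapping one complete search, save and email cycle, and the randomized pause is my choice rather than something the original script necessarily does.

```python
import random
import time

MIN_HOURS, MAX_HOURS = 4, 6

def next_wait_seconds(rng=random):
    """Pick a random pause between query rounds, somewhere in the 4-6 hour range."""
    return rng.uniform(MIN_HOURS * 3600, MAX_HOURS * 3600)

def run_forever(run_one_round):
    """Call the hypothetical `run_one_round` repeatedly, sleeping 4-6 hours in between."""
    while True:
        run_one_round()
        wait = next_wait_seconds()
        print(f"Sleeping {wait / 3600:.2f} hours before the next round")
        time.sleep(wait)
```

Passing a seeded `random.Random` as `rng` makes the pauses reproducible, which is handy when testing.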

If you’re new to web scraping, or if you don’t know why some websites go to great lengths to prevent it, please do yourself a big favor before writing the first line of code for your scraper: Google “web scraping etiquette”. Your endeavour might end much sooner than you think if you just start scraping like a madman.
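One concrete piece of that etiquette is checking a site’s robots.txt before scraping it, and Python’s standard library ships a parser for it in `urllib.robotparser`. The rules below are a made-up sample, not any real site’s file, and `my-flight-bot` is a hypothetical user-agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, NOT any real site's actual rules
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS.splitlines())

for url in ("https://example.com/flights/LIS-SIN", "https://example.com/private/account"):
    allowed = parser.can_fetch("my-flight-bot", url)
    print(url, "->", "allowed" if allowed else "disallowed")

# Seconds the site asks crawlers to wait between requests (None if not specified)
print("Crawl-delay:", parser.crawl_delay("my-flight-bot"))
```

Against a live site, you would call `set_url()` with the site’s robots.txt address and then `read()`, instead of `parse()`.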

Fasten your seatbelts…

Pun intended

After importing the libraries and opening a Chrome tab, we’ll define some functions that will be used inside a loop. The structure is roughly as follows: