FreshDirect and FoodKick do send their own sale notification emails, but we only cared whether a few “special” items were on sale… margarita mix.

Everything else was just noise.

The Goal

Nicole and I agreed that the bot would (1) scrape the two websites every morning at 9 AM, (2) search for sales of “specific” items (margarita mix), and (3) text her whichever items from our watch list were on sale.

Her goal was to fold this info into her regular online grocery shopping workflow.

I built the bot prototype fairly quickly on my local machine, but realized it would only be effective if it was remotely hosted.

The bot also needed to run in the background; otherwise, I would need to keep a logged-in terminal session open to keep the application running.

Hosting on Heroku

Getting it to run on Heroku was quite a feat and involved a lot more technical gymnastics than getting it to run locally.

I created a Procfile in the root of my project telling Heroku what to do when it launches my app.

In this case, it specifies a background worker process instead of a web process:

worker: node index.js

To tell Heroku from the command line that we want only a background worker and no web process:

heroku ps:scale web=0 worker=1

I began coding the bot using the latest version of Node (8), taking full advantage of arrow functions and async/await.

Reading a guide on running Selenium Webdriver on Heroku, I discovered that Selenium Webdriver was only compatible with Node 6.10.3 on Heroku and I’d need additional setup to get it to work.

Node 6.10.3 + Babel

Now that I was running Node 6.10.3, much of my fancy ES6/7 JavaScript wouldn’t run: Node 6 handles arrow functions, but async/await didn’t arrive until Node 7.6.

First, I installed babel-cli. This let me use the babel-node command to transpile my newfangled JavaScript into the ancient variety that works in Node 6.

npm install --save babel-cli

Then I added the babel-preset-latest package, which bundles the es2015, es2016, and es2017 presets so Babel can transpile all of the syntax added to JavaScript through ES2017.

npm install --save babel-preset-latest

Last year, you would have had to install 10 different Babel packages to transpile all the additions to JavaScript.

Finally, I added the preset to a new .babelrc file:

{ "presets": ["latest"] }

Locally, my app ran properly, but when I deployed to Heroku, I ran into a number of new issues.

Heroku + Babel

I updated my Procfile to run the app through babel-node:

worker: babel-node --presets latest index.js

Since Heroku installs dependencies in production mode by default, none of my Babel devDependencies were installed.

I reinstalled each Babel dev dependency as a regular project dependency, and Heroku could finally run my app.

Alternatively, this command turns off production mode so that devDependencies are installed too:

heroku config:set NPM_CONFIG_PRODUCTION=false

In my final package.json, I specified the Node version in the engines key so that Heroku installs 6.10.3 instead of the latest version.

PhantomJS

I then discovered that Heroku had trouble with Chromedriver and came across some threads talking about using PhantomJS instead.

In order to use PhantomJS with Heroku, I’d need to add the PhantomJS Heroku buildpack:

heroku buildpacks:set https://github.com/heroku/heroku-buildpack-nodejs.git
heroku buildpacks:add --index 1 https://github.com/stomita/heroku-buildpack-phantomjs.git

Finally, I added PhantomJS to the PATH:

heroku config:set PATH=$PATH:vendor/phantomjs/bin

Heroku still had trouble finding the path to PhantomJS, but installing the pre-built version of PhantomJS resolved this:

npm install --save phantomjs-prebuilt

All was well and my script was working properly, but I noticed that the function that checks for an “On Sale” menu option by class name, which worked fine with Chromedriver, was now failing with PhantomJS.

I’m still not sure why there was a discrepancy between the two, but I ended up getting it to work by rewriting the function using a selector that PhantomJS could find.

To run the bot on a daily timer, I installed node-schedule:

npm install --save node-schedule

and imported it into my project:

var schedule = require("node-schedule");

Building the Browser Bot with Selenium Webdriver

The app begins by using node-schedule to trigger the scrape sequence every day at 9 AM on the dot.
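A minimal sketch of this wiring, assuming node-schedule’s scheduleJob(rule, callback) function. The scheduler module is passed in as a parameter so the function can be tried without node-schedule installed; startDailyScrape and runScrapeSequence are hypothetical names, not the post’s actual code.

```javascript
// Cron rule: minute 0 of hour 9, every day (in the server's local time zone).
const DAILY_AT_NINE = "0 9 * * *";

// `scheduler` is expected to be the node-schedule module: require("node-schedule").
// `runScrapeSequence` is a hypothetical function that checks every watched item.
function startDailyScrape(scheduler, runScrapeSequence) {
  return scheduler.scheduleJob(DAILY_AT_NINE, () => {
    console.log("Starting scrape at", new Date().toISOString());
    // Log failures instead of letting an unhandled rejection kill the worker.
    Promise.resolve()
      .then(runScrapeSequence)
      .catch((err) => console.error("Scrape failed:", err));
  });
}
```

In index.js this would be called once at startup, e.g. startDailyScrape(require("node-schedule"), runScrapeSequence). One thing worth checking: Heroku dynos typically run on UTC, so a bare "0 9 * * *" rule may fire at 9 AM UTC rather than 9 AM local time.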

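The cron format for setting the timer consists of up to six space-separated fields: an optional second, then minute, hour, day of month, month, and day of week. An asterisk matches any value, so a rule like "0 9 * * *" fires at minute 0 of hour 9 every day.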
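The helper below is a hypothetical illustration (it is not part of node-schedule) that labels each field of a cron string in node-schedule’s order, treating a six-field string as starting with a seconds field.

```javascript
// Field names in node-schedule's cron order; the seconds field is optional.
const FIVE_FIELDS = ["minute", "hour", "dayOfMonth", "month", "dayOfWeek"];
const SIX_FIELDS = ["second"].concat(FIVE_FIELDS);

// Label each space-separated field of a cron string (illustration only).
function describeCron(expression) {
  const parts = expression.trim().split(/\s+/);
  let names;
  if (parts.length === 5) {
    names = FIVE_FIELDS;
  } else if (parts.length === 6) {
    names = SIX_FIELDS;
  } else {
    throw new Error(`Expected 5 or 6 fields, got ${parts.length}`);
  }
  const fields = {};
  names.forEach((name, i) => {
    fields[name] = parts[i];
  });
  return fields;
}

console.log(describeCron("0 9 * * *"));
```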

After defining the schedule, I begin the web scraping sequence by iterating through each item in my watched items array.
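One way that loop might look, as a sketch assuming a hypothetical checkItem(title) function that resolves to an array of sale strings for one item. It uses async/await, so on Node 6.10.3 it would need to run through babel-node. Awaiting each item in turn (rather than Promise.all) keeps only one headless browser open at a time; that is a design choice for this sketch, not necessarily what the original code did.

```javascript
// Hypothetical watch list; the bot's real items aren't listed in the post.
const watchedItems = ["margarita mix", "blackberries", "scallops"];

// Check each watched item one after another and collect every sale found.
// `checkItem` is a hypothetical async function returning an array of strings.
async function scrapeWatchList(items, checkItem) {
  const saleItems = [];
  for (const title of items) {
    try {
      const found = await checkItem(title);
      saleItems.push(...found);
    } catch (err) {
      // One failed search shouldn't abort the rest of the list.
      console.error(`Search for "${title}" failed:`, err.message);
    }
  }
  return saleItems;
}
```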

In order to create and control the PhantomJS browser instance, I needed to install Selenium Webdriver:

npm install --save selenium-webdriver

Then I imported Selenium Webdriver and its helpers By and until into my project:

const webdriver = require("selenium-webdriver"),
  By = webdriver.By,
  until = webdriver.until;

I call the getOnSaleItemsFK function for each item title.

A new PhantomJS headless browser instance is created, and a search URL is generated using a query string that should return only sale items.
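The post doesn’t show the actual search URL, so the base URL and the q/onSale parameter names below are placeholders, not FoodKick’s real ones. This sketch uses Node’s built-in querystring module (available in Node 6) so the item title is percent-encoded correctly.

```javascript
const querystring = require("querystring");

// Placeholder endpoint; substitute the site's real search URL.
const SEARCH_BASE = "https://example.com/search";

// Build a search URL for one item with a (hypothetical) sale-only filter.
function buildSaleSearchUrl(itemTitle) {
  // stringify percent-encodes values: "margarita mix" becomes "margarita%20mix".
  const query = querystring.stringify({ q: itemTitle, onSale: "true" });
  return `${SEARCH_BASE}?${query}`;
}

console.log(buildSaleSearchUrl("margarita mix"));
```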

The browser opens the generated URL.

At this URL, Selenium Webdriver looks for elements with the class name product--wrapper, since that is the container that holds all of a product’s text.

It finds the text associated with each product and adds the string to the productDetailsFK array.
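A sketch of this step, assuming selenium-webdriver’s promise-based API (driver.get, driver.findElements, By.className, and WebElement getText). The driver and By are passed in as parameters so the function can be tried against stubs, and it returns the productDetailsFK array instead of filling a shared one, which differs from the post’s description.

```javascript
// Open the search page and return the text of every product container on it.
// `driver` is a selenium-webdriver WebDriver; `By` is require("selenium-webdriver").By.
async function collectProductText(driver, By, searchUrl) {
  await driver.get(searchUrl);
  const products = await driver.findElements(By.className("product--wrapper"));
  // getText() resolves to the visible text of each product container.
  const productDetailsFK = await Promise.all(products.map((el) => el.getText()));
  return productDetailsFK;
}
```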

When an item exists but isn’t on sale, it still loads in the carousel, producing a false positive.

Not on Sale

I tried to figure out the difference between real sale screens and the non-sale ones.

On Sale

The difference seemed to be that the “Show Me Only” filter group didn’t exist when an item was listed but not on sale.

This is where PhantomJS couldn’t find the element with the className selector that Chromedriver had found, but it was able to find it with a name selector.

I simply had the bot check how many child elements the filter group had.

If there were none, then the item wasn’t on sale and shouldn’t be sent via text to my wife.
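The filter group’s name attribute isn’t given in the post, so SHOW_ME_ONLY_NAME below is a placeholder. This sketch relies on driver.findElements(By.name(...)), which resolves to an empty array instead of throwing when nothing matches, and then counts the group’s direct children with the XPath "./*".

```javascript
// Placeholder: the real name attribute of the "Show Me Only" group isn't shown in the post.
const SHOW_ME_ONLY_NAME = "showMeOnly";

// Resolves true only if the "Show Me Only" filter group exists and has children.
async function isReallyOnSale(driver, By) {
  // findElements (plural) resolves to [] when nothing matches, so no try/catch is needed.
  const groups = await driver.findElements(By.name(SHOW_ME_ONLY_NAME));
  if (groups.length === 0) {
    return false; // Item is listed but not on sale: don't text it.
  }
  // Count the direct child elements of the first matching group.
  const children = await groups[0].findElements(By.xpath("./*"));
  return children.length > 0;
}
```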

Finally, if the item is on sale, the bot uses Twilio Programmable SMS to text the sale products to my wife. I installed the Twilio package:

npm install --save twilio

To set up Twilio, I created an account to get an accountSid and authToken. I also purchased credits for sending SMS messages.
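Rather than hard-coding the accountSid and authToken, one option is to store them as Heroku config vars (set with heroku config:set, the same command used for NPM_CONFIG_PRODUCTION) and read them from process.env. The variable names below are hypothetical; the returned values would then be passed to require("twilio")(accountSid, authToken).

```javascript
// Hypothetical config var names; set them with `heroku config:set NAME=value`.
const REQUIRED_VARS = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_FROM_NUMBER",
  "SMS_TO_NUMBER",
];

// Read Twilio settings from an environment object, failing fast if any are missing.
function loadTwilioConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing config vars: ${missing.join(", ")}`);
  }
  return {
    accountSid: env.TWILIO_ACCOUNT_SID,
    authToken: env.TWILIO_AUTH_TOKEN,
    from: env.TWILIO_FROM_NUMBER,
    to: env.SMS_TO_NUMBER,
  };
}

// In the bot itself:
// const config = loadTwilioConfig(process.env);
// const client = require("twilio")(config.accountSid, config.authToken);
```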

To actually send the SMS, you create a new Twilio message.

Once the SMS has been sent, a callback logs the sid of the sent message.
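A minimal sketch of sending the text, assuming the twilio package’s client.messages.create({ body, from, to }), which returns a promise resolving to the created message. Here a .then handler plays the role of the callback and logs the message sid. The client is passed in so the function can be tried with a stub, and formatSaleMessage is a hypothetical formatting choice.

```javascript
// Join the sale items into one SMS body, one item per line.
function formatSaleMessage(saleItems) {
  return ["On sale today:"].concat(saleItems).join("\n");
}

// `client` is expected to be require("twilio")(accountSid, authToken).
function textSaleItems(client, saleItems, from, to) {
  if (saleItems.length === 0) {
    return Promise.resolve(null); // Nothing on sale: don't send an empty text.
  }
  return client.messages
    .create({ body: formatSaleMessage(saleItems), from, to })
    .then((message) => {
      console.log("Sent SMS", message.sid); // Log the sid of the sent message.
      return message.sid;
    });
}
```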

After Heroku finishes building and deploying, it starts the app with the command in the Procfile.

To see what the background worker is doing and view the console logs:

heroku logs

Here’s what the logs look like after the script runs at 9am:

Here’s what the texts look like on my phone:

SMSs with FoodKick + FreshDirect Sale Items


Conclusion

My wife was pretty happy, and I get to eat more discounted blackberries and scallops.

This project is an awesome template for future bots, since it includes many of the critical elements that most bots need.

Critical Elements

Scrape data from a website using a webdriver and headless browser
Host on an external server and run the script in the background
Configure the script to run on a timer, scheduling the bot to work
Use an API to programmatically send SMS messages

Let me know in the comments if you found this guide helpful!