Notification-type products have slowly started to rise in popularity as we get closer to the “Internet of Everything”. By the year 2025 we’ll get cars that drive you to QFC because they noticed you’re running low on milk and have bought it regularly for the last 30 weeks. Getting your milk on time correlates with you performing better at work and boosting your career growth by 25% over 10 years, or so your refrigerator will report to you every week.

But before that becomes a reality, I’ve got to write my own programs by hand to give me alerts. Companies like Dataminr and IFTTT have capitalized on the notification market, but in different industries (Dataminr mainly alerts Wall Street people, IFTTT is for the common folk). Customized scraping has some awesome benefits of its own, though, and this is where I think Kimono has some upside.

The NBA playoffs are coming up and my Blazers are playing the Grizzlies. Unless Damian Lillard becomes an absolute killer and goes off, my interest will probably stagnate after a first-round exit, given that the Blazers have gone 0 for 4 against them this year. That being said, I know there are a lot of people who are interested in watching all NBA games regardless of who they’re rooting for, especially when a game is getting close. So I decided a cool demonstration of a scraping mechanism would be a notification system for when games are close enough to be worth watching.

NBA real-time scores already have an API, but I thought I would make one anyway using Kimono’s really easy scraping Chrome extension. (Note: I don’t work for them, they just have a cool product.) First we have to grab the values we think are important on the ESPN NBA home page: the home team name, away team name, scores, and the game time. Full GitHub code is here.

Once the API is created, all we need to do is call it to receive the results as JSON. Since we can set the API to scrape the site every fifteen minutes, we’ll probably get a couple of hits during the fourth quarter of a game. If that’s not frequent enough, create two more APIs and stagger their schedules for an update roughly every five minutes. But NBA games usually have a lot of time-outs, especially when the games are close, so we can probably live with an update every 15 minutes.
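As a rough sketch of that call (written for Python 3), the snippet below pulls the list of games out of a Kimono-style JSON response. The `results` / `collection1` layout and the helper names `parse_games` and `fetch_games` are assumptions for illustration, not something shown in this post, so adjust them to whatever your API actually returns.

```python
import json
from urllib.request import urlopen


def parse_games(payload, collection="collection1"):
    """Return the list of scraped rows from a JSON response body.

    Assumes the rows live under results -> collection1, which is an
    assumption about the response layout rather than a documented fact.
    """
    data = json.loads(payload)
    return data.get("results", {}).get(collection, [])


def fetch_games(url):
    """Download the API response and hand the body to parse_games."""
    with urlopen(url) as response:
        return parse_games(response.read().decode("utf-8"))


# Offline example: a hand-written payload stands in for a live request.
sample = '{"results": {"collection1": [{"time": {"text": "5:32 - 4th"}}]}}'
games = parse_games(sample)
print(games[0]["time"]["text"])
```

Keeping the parsing separate from the download makes it easy to try the logic on a saved response before pointing it at the live API.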

    def wiregame(games):
        wire = []
        for game in games:
            gt = game['time']['text']
            # Find out if the game is actually being played
            if ':' in gt and 'PT' not in gt:
                clock, quarter = str(gt).split('-')
                home = int(game['home_score']['text'])
                away = int(game['away_score']['text'])
                # Parse everything before the colon so "10:32" reads as 10 minutes, not 1
                minutes = int(clock.split(':')[0])
                # Check if it's the 4th quarter, under 6 minutes to go, and score differential under 7 points
                if quarter.strip() == '4th' and minutes < 6 and abs(home - away) < 7:
                    wire.append(gt + ": " + game['home_team']['text'] + " " +
                                str(home) + " - " + game['away_team']['text'] + " " +
                                str(away) + ". GAME IS COMING DOWN TO THE WIRE.")
        return wire

The reason each value is followed by another nested dictionary key, [‘text’], is that each value also has a link. The team names and scores on the ESPN page all link to other pages, so Kimono creates a nested dictionary with an ‘href’ value and a ‘text’ value in case you want both.
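To make that nesting concrete, here is a small sketch with a made-up row. The URLs, scores, and the `text_only` helper are hypothetical and only meant to show the shape of the data, not the exact output of the scraper.

```python
# Hypothetical scraped row; the URLs and values are made up for illustration.
game = {
    "time": {"text": "5:32 - 4th"},
    "home_team": {"href": "http://example.com/grizzlies", "text": "Grizzlies"},
    "away_team": {"href": "http://example.com/blazers", "text": "Blazers"},
    "home_score": {"href": "http://example.com/boxscore", "text": "88"},
    "away_score": {"href": "http://example.com/boxscore", "text": "85"},
}


def text_only(row):
    """Collapse each nested {'href': ..., 'text': ...} value to its text."""
    return {field: value["text"] for field, value in row.items()}


flat = text_only(game)
print(flat["home_team"], flat["home_score"])  # text values only
print(game["home_team"]["href"])              # the link is still on the original row
```

If you never need the links, flattening each row once like this saves repeating `['text']` on every lookup.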

Awesome, now that we have this, we can set up a Twitter account to tweet the games when they’re getting real close! I use Tweepy to authenticate and wrap the Twitter account.

    import json
    import urllib
    import tweepy

    # Authenticate the login
    def login():
        CONSUMER_KEY = 'YOUR_CONSUMER_KEY'
        CONSUMER_SECRET = 'YOUR_CONSUMER_SECRET'
        oauth_token = 'YOUR_OAUTH_TOKEN'
        oauth_token_secret = 'YOUR_OAUTH_TOKEN_SECRET'
        auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(oauth_token, oauth_token_secret)
        return auth

    if __name__ == '__main__':
        url = "https://www.kimonolabs.com/api/5c8gx236?apikey=[YOUR API KEY]"
        # callApi is not shown in this post; it returns the scraped games from the Kimono API
        games = callApi(url)
        msgs = wiregame(games)
        api = tweepy.API(login())
        # If there's a game going on, update the status
        if len(msgs) > 0:
            print(msgs)
            for msg in msgs:
                api.update_status(status=msg)
        else:
            print('NO GAMES')

You can follow the tweets right here.

So it’s a pretty cool application of turning a scraper into an API. I’ve actually mainly used Kimono for one-time scrapes that I save as CSVs, but I think there are a lot of different use cases for it. Kimono’s a little buggy sometimes, but overall I think it’s pretty good. It doesn’t work on some websites that have scraping restrictions, or on websites with lots of client-side JavaScript. Currently it seems to throw application errors on Craigslist (possibly a legal restriction?). Anyhow, that doesn’t exactly help me with my future apartment search.

IFTTT has a couple of nifty applications for automating things across platforms. Its recipe book has hundreds of recipes that post, update, tweet, etc. when another app posts, updates, or tweets. They have one for finding 2br apartments in SF, but it would be nicer to customize the requirements a bit more. I don’t think I want to live in a hacker house no matter how cheap it is, and I definitely want to see some pictures on the postings.

I previously wrote a scraper with Scrapy to parse through Craigslist postings and save them into a database here. This time I’m not too interested in modifying my original code, though, and want to try out some more BeautifulSoup instead. I grabbed some of the code from Daniel Forsyth’s blog on finding tickets on Craigslist. What I want is any kind of place that costs under $1,800 per bedroom in the SOMA or Potrero Hill district. I don’t care about the number of rooms as long as it’s within reason, and the postings have to have pictures and a latitude and longitude location.

    def apartments(num, soup, seen):
        results = []
        # Find the list of craigslist postings
        for listing in soup.find_all('p', {'class': 'row'}):
            # Check to see if they have pictures and coordinate locations as well as if they've been sent out before
            if listing.find('span', {'class': 'p'}).text == ' pic map' and listing.get("data-pid") not in seen:
                text = listing.text.split(" $")[1]
                price, beds = text.split(" ")[0], text.split(" ")[2][0]
                # Divide to check if it's within the proper range on a per bedroom basis
                per_bed = float(price) / float(beds)
                if float(num) / 2 < per_bed < num:
                    results.append(listing.text)
                    # Add ID to the set to make sure it doesn't get sent out twice
                    seen.add(listing.get("data-pid"))
        return results
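The per-bedroom check in the function above is easy to try on its own. Here is a minimal sketch of the same band (more than half the cap, less than the cap); the name `per_bedroom_ok` and the sample prices are hypothetical.

```python
def per_bedroom_ok(price, beds, cap):
    """True when price per bedroom falls strictly between cap / 2 and cap."""
    if beds <= 0:
        # Avoid dividing by zero when a listing reports no bedrooms.
        return False
    per_bed = float(price) / beds
    return cap / 2.0 < per_bed < cap


print(per_bedroom_ok(3400, 2, 1800))  # 1700 per bedroom -> True
print(per_bedroom_ok(1, 2, 1800))     # 0.50 per bedroom -> False
print(per_bedroom_ok(4000, 2, 1800))  # 2000 per bedroom -> False
```

The lower half of the band is what throws out placeholder prices, while the upper half enforces the budget.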

Cool stuff. Here’s the rest of the code, set up with an SMTP server to email the results directly to my email address. In the apartments function I also set a lower bound on the price, just in case people are putting up the $1 bullshit listings that show up on Craigslist some of the time.

    import requests
    from bs4 import BeautifulSoup
    import smtplib
    import csv

    def connect(email, password):
        server = smtplib.SMTP('smtp.gmail.com', 587)
        server.ehlo()
        server.starttls()
        server.login(email, password)
        return server

    def email(sender, receivers, listings, server):
        for listing in listings:
            # apartments() returns each listing's text, so it can be appended directly
            message = ("From: From Person <from@fromdomain.com>\n"
                       "To: To Person <to@todomain.com>\n"
                       "Subject: NEW CRAIGSLIST POSTING\n\n" + listing)
            try:
                server.sendmail(sender, receivers, message)
                print("Successfully sent email")
            except Exception:
                print("Error: unable to send email")

    def getSoup(url):
        response = requests.get(url)
        return BeautifulSoup(response.content, 'html.parser')

    # Open a csv file containing a set of craigslist IDs previously sent out
    def openSet(path):
        with open(path) as f:
            return set(list(csv.reader(f))[0])

    if __name__ == '__main__':
        URL = 'http://sfbay.craigslist.org/search/sfc/apa?hasPic=1&nh=25&nh=1&bedrooms=2'
        path = 'set1.csv'
        soup = getSoup(URL)
        seen = openSet(path)
        listings = apartments(1800, soup, seen)
        if len(listings) > 0:
            server = connect("YOUR_EMAIL", "YOUR_PASSWORD")
            email("YOUR_EMAIL", ['jayfeng1@uw.edu'], listings, server)
            # Save the updated set of IDs (not the listing text) so openSet can read it back next run
            with open(path, 'w') as f:
                csv.writer(f).writerow(list(seen))
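The script above relies on a CSV file of posting IDs that have already been emailed. As a hedged sketch of that round trip (Python 3), the helpers below read and write the set as a single CSV row; `load_seen`, `save_seen`, the temporary path, and the sample IDs are all hypothetical names and values.

```python
import csv
import os
import tempfile


def load_seen(path):
    """Read the first CSV row as a set of IDs; empty set if nothing is stored."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    return set(rows[0]) if rows else set()


def save_seen(path, seen):
    """Write every ID on one CSV row, sorted so the file is stable between runs."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerow(sorted(seen))


# Round trip in a temporary directory with made-up posting IDs.
path = os.path.join(tempfile.mkdtemp(), "seen.csv")
seen = load_seen(path)                      # no file yet, so this is empty
seen.update(["4975123456", "4975654321"])
save_seen(path, seen)
print(load_seen(path) == seen)
```

Handling the missing-file case means the very first run does not need a hand-made CSV.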

Hopefully this will get some people started on the right track to customizing their scrapers a little more. I love scraping; it’s absolutely the best and lets people pursue their dreams and goals. Finding cheaper housing is good. That said, IFTTT has done a pretty good job on Craigslist. If you look more into Craigslist’s filters for housing, you can actually specify pictures or not, specific titles, and neighborhoods that you might want to live in. Once you have it drilled down, all you need is an email blast for each new listing. So I like where IFTTT is heading. But I also like where Kimono is going too, because they can get into this notification marketplace as well with their customized APIs. If I had the money to invest, I think I would.