With the Chicago Blackhawks breaking records for the longest point streak in the NHL this season (currently 21 games from the start of the season) and my favourite team, the Pittsburgh Penguins, playing like a rollercoaster, I decided to take a look at my favourite data of them all: NHL stats.

The visualisation in its most current state can be found here. The most recent code can be found on BitBucket.

NHL.com’s standings page shows only the latest streak, but since I was interested in all streaks during the season, I did some digging and found this page, which lists all the games in chronological order in a consistent and easy-to-parse table format.

First, to parse the data, I used the following code:

    import json
    from collections import defaultdict
    from urllib.request import urlopen

    # Assumed import: the post only shows the alias `bs`.
    from bs4 import BeautifulSoup as bs

    # PAGENUMBER (number of result pages) and baseurl are defined
    # elsewhere in the script.

    def readData(loadFromWeb=False):
        teams = defaultdict(list)
        if not loadFromWeb:
            # Reuse previously saved results instead of scraping again.
            with open('nhl.json') as jsonfile:
                teams = json.load(jsonfile)
        else:
            for i in range(1, PAGENUMBER + 1):
                pageurl = "%s%s" % (baseurl, i)
                soup = bs(urlopen(pageurl), 'html.parser')
                tbody = soup.findAll('table', {'class': 'data stats'})[0].find('tbody')
                for tr in tbody.findAll('tr'):
                    tds = tr.findAll('td')
                    home_team = tds[1].string
                    away_team = tds[3].string
                    home_goals = tds[2].string
                    away_goals = tds[4].string
                    # Every game has a winner, so one 'W' and one 'L' per row.
                    home_win = int(home_goals) > int(away_goals)
                    if home_win:
                        teams[home_team].append('W')
                        teams[away_team].append('L')
                    else:
                        teams[home_team].append('L')
                        teams[away_team].append('W')
        # Sanity check: the most games played by any team so far.
        print(max(len(matches) for matches in teams.values()))
        return teams

The function can read the data either from a saved JSON file or from the actual page, and it creates a dict with team names as keys and lists of Ws (wins) and Ls (losses) as values. After that, each list is transformed so that only the wins and losses that are part of a streak are kept; every other game becomes an empty string:

    from collections import defaultdict

    def transform(teams):
        """Blank out every game that is not part of a streak of two or more."""
        transformed = defaultdict(list)
        for team, games in teams.items():
            for i, result in enumerate(games):
                # Compare with the previous and the next game. Both ends are
                # guarded so that index -1 does not wrap around to the last game.
                same_as_prev = i > 0 and games[i - 1] == result
                same_as_next = i < len(games) - 1 and games[i + 1] == result
                if same_as_prev or same_as_next:
                    transformed[team].append(result)
                else:
                    transformed[team].append('')
        return transformed
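To make the effect of the transformation concrete, here is a minimal sketch of the same idea written with `itertools.groupby`, run on a made-up list of results. The names `mark_streaks` and `longest_streak` and the sample data are my assumptions for illustration; they are not part of the original script.

```python
from itertools import groupby


def mark_streaks(games):
    """Keep results that belong to a run of two or more; blank out the rest."""
    marked = []
    for result, run in groupby(games):
        length = len(list(run))
        # A single isolated result is not a streak, so it becomes ''.
        marked.extend([result if length > 1 else ''] * length)
    return marked


def longest_streak(games, result='W'):
    """Length of the longest consecutive run of `result` (0 if there is none)."""
    return max((len(list(run)) for r, run in groupby(games) if r == result),
               default=0)


# Hypothetical sample, not real NHL results.
sample = ['W', 'W', 'L', 'W', 'L', 'L', 'L', 'W']
print(mark_streaks(sample))         # ['W', 'W', '', '', 'L', 'L', 'L', '']
print(longest_streak(sample, 'L'))  # 3
```

Grouping consecutive equal results first means there are no index comparisons at the ends of the list, so the first and last games need no special handling.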

The data can then be written to either JSON or HTML. The Git repo also contains files for the HTML head and the HTML tail, which I combine with the script-written HTML to create the website. The visualisation can be found at my website.
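As a rough sketch of that output step, the snippet below dumps the dict to JSON and renders one table row per team, giving each cell a CSS class so that wins and losses can get different background colours. The function names, the class names `win`, `loss` and `none`, and the sample data are assumptions for illustration; the actual script in the repo may well look different.

```python
import json
from html import escape

# Hypothetical CSS class names; '' marks a game outside any streak.
CSS_CLASS = {'W': 'win', 'L': 'loss', '': 'none'}


def write_json(teams, path):
    """Save the team -> results mapping so later runs can skip the scraping."""
    with open(path, 'w') as jsonfile:
        json.dump(teams, jsonfile)


def render_rows(transformed):
    """Return one <tr> per team, with one class-tagged <td> per game."""
    rows = []
    for team in sorted(transformed):
        cells = ''.join('<td class="%s"></td>' % CSS_CLASS[game]
                        for game in transformed[team])
        rows.append('<tr><th>%s</th>%s</tr>' % (escape(team), cells))
    return '\n'.join(rows)


def build_page(head, rows, tail):
    """Sandwich the generated rows between the static head and tail HTML."""
    return head + rows + tail


# Hypothetical data, not real results.
example = {'PIT': ['W', 'W', ''], 'CHI': ['', 'L', 'L']}
print(render_rows(example))
```

The colouring itself would live in the stylesheet in the HTML head, for example one `background-color` rule per class.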

Originally I was going to do the visualisation with D3.js or ggplot2, but after I prototyped it with HTML/CSS it actually looked quite good. I decided to leave it like that for now, as a personal reminder that you can do quite a lot with just background-colored table cells.

The whole code can be found here.