This is the second of a four-part blog series detailing my experience applying/interviewing to be a Hockey Analyst with the Toronto Maple Leafs. You can find the first part here. In this part I go through the phone interview and my preparation for that. I hope this series sheds some light on the process of how interviewing for an analytics position with an NHL team works (or at least how one such process worked).

The Preparation

I had about 5 days between when I learned of the interview and when it took place. During this period I spent the majority of my free time preparing. The email describing the interview details specifically mentioned that it would be conducted by Darryl Metcalfe, the Director of Hockey Research and Development (R&D), and Cam Charron, an analyst in the same department. It also mentioned that it would cover “hockey, programming, and other general aspects of the position.” With this in mind I focused on three areas for my preparation:

Blogs/websites about hockey analytics

Rob Vollman’s “Stat Shot”

Articles from academic journals and conferences

I’d been aware of Rob Vollman’s magnum opus, Stat Shot, for a while. It seemed like a natural starting point so I borrowed a copy from the public library (I later bought a signed copy of the book — thanks Rob!) and powered through it. It served as a useful foundation for understanding how an existing analytics framework operated.

Of the two people interviewing me, Cam had a number of publicly available blog posts, so I familiarized myself with his work. His posts generally focus on using statistics to contextualize then-recent happenings in the NHL and are easily found with a simple Google search. I also trawled Hayden Speak’s prospect-stats.com, Dominic Galamini Jr’s HERO charts, and David Johnson’s now-defunct hockeyanalysis.com, to name a few.

Finally, I turned to peer-reviewed journal articles and conference papers. These tend to be extremely dense and difficult to digest so I generally had to read a paper two to three times before I was confident that I both understood the work and would retain that understanding.

I focused largely on papers presented at MIT’s Sloan Sports Analytics Conference. Of these, Michael Schucker’s work on draft value is one that I still consistently refer back to. I also covered Timothy Chan’s cluster analysis of NHL player types and Ryan Stimson and Matt Cane’s use of the Passing Project data for analysis of defensive ability. In retrospect, work from hockey-specific conferences like VanHAC or OttHAC may have been more useful.

Peer-reviewed journal articles, sports analytics articles in particular, tend to be paywalled. Luckily, as a recent graduate of the University of Toronto (U of T), I had institutional access to various journals. But even with that I was not able to access the Journal of Quantitative Analysis in Sports, which I felt would have been an asset.

The Interview

The alternative way to make a conference call.

D-Day. After pacing anxiously around my room for 20 minutes, it was time. I dialed into the conference call where Darryl and Cam were waiting. The interview itself consisted of four parts: general questions regarding my background, questions involving analytics and hockey knowledge, technical questions, and then some general behavioral questions. I use headers below to denote each of these sections in case you’d like to skip to a particular one, and I indicate questions as bullet points in case you don’t care about my summarized answers, which are the short paragraphs that follow each question.

General Background Questions

“What are you up to right now?”

At the time I was a Research Assistant at U of T. My role primarily involved putting the finishing touches on the projects I had begun during my MSc. I mentioned that I’d just finished submitting two papers for publication: one describing the original research I had conducted, and the other describing a program I had written to automate a labour-intensive data analysis routinely conducted by my lab (those papers have since been published and you can find them here and here respectively).

“Tell me about your MSc research. What did you discover regarding salinity tolerance in rice?”

In response I gave my standard elevator pitch about my research. Developing countries tend to depend on rice for the majority of their calories, and rice yields are threatened as these countries are forced onto more marginal farmland to increase production. This marginal land often contains excess salt, which hurts crop yields by accumulating excessively in the plant (sola dosis facit venenum). My research at the time highlighted a “salt-elevator” in rice, by which salt, once it enters the leaves, is transported back out of the plant through the roots (this conclusion has since been amended, with the published work finding that larger leaf sizes seem to be one of the factors driving decreased salt toxicity).

“Tell us about the problem that vaCATE [the program I had written to automate data analysis during my MSc] solved?”

The original data analysis for the technique involved (called Compartmental Analysis by Tracer Efflux, or CATE) had been partially automated as a Visual Basic macro embedded in an Excel spreadsheet. The major problems with this were that 1) the analysis was partial, with only the most relevant computations conducted, and 2) the macro had to be rerun for each replicate in the experiment, with experiments routinely having over 30 replicates.

To address these problems I ended up writing vaCATE, which imported the data for all the replicates at once and conducted a full analysis of each replicate — among other features. This solved the original problems and decreased the amount of time required for such analyses by over 90%.

“Tell me about UltimateStatisticsAnalysis. What insights did you gain?”

This question caught me a bit off-guard as I hadn’t mentioned UltimateStatisticsAnalysis (USA) in my application, though I had referenced my GitHub page wherein USA was one of the available repositories. The hiring team had obviously done their homework!

I discussed USA and the fact that it had been written to gather information about the pass-completion rates of the ultimate team I was co-coaching/captaining. Through its use I was able to identify which players tended to turn the disc over. I then worked with these players on either their catching (if they tended to drop the disc) or their decision making (if they tended to attempt low-percentage throws). Interestingly, Darryl’s and Cam’s interest in USA actually stemmed from the fact they both play ultimate.

Hockey knowledge and analytics

“Hypothetical situation: your playoff opponent has the best penalty kill in the last ten years. Your coach comes to you and asks for a solution to this in the next two days. What do you do?”

I initially discussed referee bias, suggesting an analysis of referee tendencies with the aim of incorporating them into game strategy: perhaps referees called penalties at different rates against different teams/players. However, I realized as I was talking that my answer better suited the opposite problem, wherein the opponent had the best power play in the league and my team was trying to stay out of the penalty box. I think Darryl mercifully recognized this and further constrained the situation by saying there would be no prior knowledge of who the referees would be.

From here I actually answered the question at hand. Different teams will have had differing success rates against the opponent’s penalty kill over the regular season, so I could identify the teams with the highest success rates against it. From there I’d examine video of those teams’ power plays to identify the strategies that worked best against this hypothetically great penalty kill.

“Another hypothetical situation: say you are the only member of a Hockey Operations staff for an expansion team that has currently no contracts signed and is drafting 3rd overall. Nico Hischier goes first; Nolan Patrick goes second. What do you do with your third overall pick?”

My initial reaction was to leverage the fact that I’d have lots of contract room and try trading down for more picks. In this situation I would be able to draft, sign, and develop far more players than an established NHL team, which usually runs up against the 50-contract limit. I referenced Michael Schucker’s work on equivalent draft pick value (see the Preparation section above) to ensure I received favorable value in return.

“What if no one is willing to trade with you or offer favorable returns?”

I indicated that I felt it would be hard to go wrong with a third overall pick. There tends to be a general consensus among the different draft rankings regarding the top ten-or-so prospects. In a worst-case scenario where I had no staff to assist me and had to make a decision, I could use these rankings to help make my selection.

“What if you had two days to prepare? How would you inform your decision?”

I structured my answer around what I would do the first day and then what I would logically follow that up with on the second day. I allocated the first day to using analytics to identify the top five-to-ten players I’m interested in, and then stated that I would devote the second day to watching video of these players before deciding. Analytics for amateur players tends to be fairly limited, so I’d likely be using points per game (taking into account prospect age) or some other statistic with similarly limited scope.

“Are you worried about discriminating against prospects in European professional leagues or defensemen using this approach?”

I acknowledged that this was definitely a risk, outlining strategies I could employ to try and avoid these areas of potential neglect while recognizing the limitations of such strategies.

For European leagues there is the additional barrier of having to translate web pages, so if I were extremely time-limited I would cut my losses and focus on North American leagues. If I did have enough time to look at European leagues, I would compare prospects to past players in those leagues, limiting the search space to younger players (i.e., prospects).

For defensemen I conceded that I likely wouldn’t be able to draft a defensive defenseman, as most statistics for amateur leagues are points- or shots-based. However, if I partitioned players by position I could then identify defensemen who were exceptional within their own cohort.

“Who are going to be the two best players in the NHL over the next two years?”

A super cool question. I immediately spoke about Connor McDavid, given that he’d come into the league and almost immediately won a Hart Trophy. I also cited his statistics from the OHL, noting that he was such an outlier he’d “broken” a number of tools used to forecast prospect development; as an example I pointed to Hayden Speak’s Draft Expected Value. For my second pick I went with Erik Karlsson (R.I.P. Ottawa), as he had also had a statistically outlying season, most notably scoring an ungodly number of points as a defenseman while also somehow leading in blocked shots. I did voice some reservation with this pick, as his recent season had almost been “too” good, and it appeared possible that he had statistically peaked.

“What happens after an icing call in the NHL?”

For this I explained my knowledge of the situation from having watched it occur hundreds of times, as any hockey fan has. Specifically, the face-off comes back into the defensive zone of the team that iced the puck, and that team is unable to change lines (I referenced the shenanigans the Leafs had tried against the Washington Capitals in their 2016–2017 playoff series). As well, this stoppage is the one exception to the media time-out: commercial breaks normally occur at the first stoppage after every five minutes of game time (this has since changed), but icings are excluded.

“What happens if there is another icing immediately after?”

This question took me a bit aback. I was fairly certain nothing changed from my answer above, though the question was phrased as if something might. I stuck to my guns.

Technical Questions

“What was the biggest challenge you faced making your CHL scraper?”

I had mentioned my CHL scraper in the cover letter I submitted with my job application. The CHL scraper was the 4th or 5th project of this nature I had built, so I actually hadn’t had a lot of trouble with it. Instead I pivoted to the troubles I had had with my initial NHL scraper, whose problems I had alluded to in my cover letter along with why they had necessitated a rewrite. My answer went into more detail on what the NHL had specifically done to break my initial scraper and how I had adapted my follow-up scraper to address these challenges.

“What code editor do you use?”

Sublime Text was the editor I used when I started programming. I had subsequently begun using PyCharm as it was a requirement for the computer science classes I was taking at U of T. I stuck with the latter because I learned some of its more advanced features over the course of my studies. I followed this up by emphasizing that I tried not to tie myself to any particular technology; if something better came along I’d very happily switch to it.

“What is the difference between a list and a tuple in Python?”

The obvious difference that came to mind was mutability. Darryl followed up by asking if there were any others. I improvised an answer involving memory allocation: I hypothesized that the block of memory assigned to a tuple could be exactly the amount required (because a tuple’s contents can’t change after assignment), whereas the block of memory assigned to a list has to be larger than its current contents, because a list can grow after its creation.
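That improvised hypothesis roughly matches how CPython behaves: a tuple's size is fixed at creation, while a list over-allocates to support cheap appends. A minimal sketch (the exact byte counts vary by Python version and platform, so none are asserted here):

```python
import sys

nums_list = [1, 2, 3]
nums_tuple = (1, 2, 3)

# Lists are mutable: contents can change after creation.
nums_list.append(4)

# Tuples are immutable: item assignment raises a TypeError.
try:
    nums_tuple[0] = 99
except TypeError as err:
    print(f"tuples are immutable: {err}")

# A tuple's allocation can be exactly sized; a list carries extra
# capacity (and bookkeeping) so it can grow dynamically.
print(sys.getsizeof((1, 2, 3)))  # smaller
print(sys.getsizeof([1, 2, 3]))  # larger
```

Note that `sys.getsizeof` reports only the shallow size of the container, not the elements it references.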

“Given two lists of five-to-six jersey numbers, how can you determine if a number occurs in both lists?”

A pretty simple algorithm question. Given the small sizes of the lists, I responded that I could just nest two loops without worrying about run time. The outer loop would iterate through the numbers in the first list, and the inner loop through the numbers in the second. As I iterated I’d compare the numbers, adding any match to a tracking list.
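The nested-loop answer can be sketched in a few lines of Python (the function name and jersey numbers are mine, purely for illustration):

```python
def shared_numbers(list_a, list_b):
    """Return jersey numbers appearing in both lists (naive O(n*m) scan)."""
    matches = []
    for a in list_a:          # outer loop: first roster
        for b in list_b:      # inner loop: second roster
            if a == b and a not in matches:
                matches.append(a)
    return matches

print(shared_numbers([16, 34, 91, 44], [29, 44, 16, 88]))  # → [16, 44]
```

For five or six numbers per list, the quadratic scan is perfectly fine.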

“What if the two lists were of hundreds or thousands of jersey numbers?”

Now run time begins to matter more. I clarified with them that the lists were unordered. An efficient sorting algorithm could sort each list in O(n log n) time; after that, a single pairwise pass through the two sorted lists could find the common numbers in O(n) time.
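The sort-then-scan idea looks like this (a sketch assuming jersey numbers are unique within a roster, so duplicates aren't a concern; the function name is mine):

```python
def shared_numbers_sorted(list_a, list_b):
    """Sort both lists (O(n log n)), then walk them pairwise in O(n)."""
    a, b = sorted(list_a), sorted(list_b)
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:        # match: record and advance both pointers
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:       # a is behind: advance its pointer
            i += 1
        else:                   # b is behind: advance its pointer
            j += 1
    return matches

print(shared_numbers_sorted([91, 16, 44, 34], [88, 16, 29, 44]))  # → [16, 44]
```

(In practice a Python set intersection would be simpler still and O(n) on average, but the two-pointer walk is the answer that follows from sorting.)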

“Tabs or spaces?”

Ah, the quintessential computer science debate. I had actually just watched the episode of Silicon Valley parodying this. They’re largely the same, but I use tabs; I mentioned that there’s a slight memory advantage to tabs (something I actually learned from Silicon Valley). Darryl followed up by noting that either was fine as long as you were consistent.

“How do you go about testing a program or a tool?”

I explained that I tried to make my code as modular as possible, so that I could create unit tests for each “module” of code. These tests should include general cases, edge cases, and expected behavior given invalid input. For my hockey scraper I had grabbed data from both the play-by-play pages and the game summary pages, which could be used to cross-validate the output of the respective scrapers (i.e., the number of shots summed from a game’s play-by-play report should equal the number of shots registered in the game’s summary report). In vaCATE I had used nose and nose-parameterized to run suites of the same test with different inputs.
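nose has since been retired, but the parameterized idea carries over to the standard library: `unittest`'s `subTest` runs the same assertion over a table of cases. A sketch in that spirit, using a made-up shot-counting helper to mimic the play-by-play vs. summary cross-validation (this is not the actual scraper code):

```python
import unittest

def count_shots(play_by_play):
    # Hypothetical helper: tally shot events from a play-by-play event list.
    return sum(1 for event in play_by_play if event == "SHOT")

class TestScraperCrossValidation(unittest.TestCase):
    # Each case pairs a (fake) play-by-play feed with the shot total
    # the game-summary page would be expected to report.
    CASES = [
        (["SHOT", "HIT", "SHOT"], 2),   # general case
        (["FACEOFF"], 0),               # edge case: no shots
        (["SHOT"] * 30, 30),            # larger input
    ]

    def test_shot_totals_match_summary(self):
        for play_by_play, summary_total in self.CASES:
            # subTest reports each failing case individually,
            # much like a parameterized test decorator would.
            with self.subTest(expected=summary_total):
                self.assertEqual(count_shots(play_by_play), summary_total)

if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestScraperCrossValidation)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Each tuple in `CASES` becomes its own sub-test, so one table drives the whole suite.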

General Questions

“Why do you think you would be a good fit for this position?”

I had been performing the job requirements listed in the application for fun and felt this made me a great fit. The only things that would change if I got the job were that I would be getting paid and could devote more of my time to programming with hockey analytics in mind. I followed this up by admitting that I was a huge Leafs fan (it hadn’t come up yet), adding that I had used ExtraSkater.com before it went offline and that I had closely followed Darryl’s hiring by the Leafs. Having the opportunity to provide information to decision-makers on the Leafs would transcend just being a job for me, and it was something I would take extremely seriously.

“It is said that the three great virtues of any developer are laziness, hubris, and impatience. Which do you identify with?”

After some deliberation, I landed on laziness. I wrote vaCATE because doing a six-to-seven-hour data analysis after an 18-hour experiment was slowly eroding my soul. I just wanted to finish the experiment, press a button, and have the analysis pop out. To be fair, I don’t think that’s “lazy” so much as “not wanting to do an onerous amount of work.”

“Say hypothetically you get the job; what would you want to accomplish in the first month?”

I surmised that the Leafs had their own internal analytics tool, so my goal would be to get to know it inside and out. Furthermore, I’d always wanted to delve further into the salary-cap rules, the waiver process, and the collective bargaining agreement. Once these topics were within the scope of my job description I could examine them in greater detail.

To illustrate how cost-effective an analyst in this kind of role could be, especially given the millions of dollars teams often spend on buying out players or other mistakes, I discussed the Ryan O’Reilly debacle. Briefly, the Calgary Flames tendered an offer sheet to Ryan O’Reilly, who was then a restricted free agent with the Avalanche. However, he would have had to pass through waivers to play for the Flames, which wasn’t going to happen: what team is going to pass up a top-six, cost-controlled center as a waiver pick-up?

Retrospective

To finish the interview, Darryl asked if I had any questions for him or Cam. I gained some insight into how Darryl learned the skills to make ExtraSkater.com (largely self-taught through internet resources) and the difference between solo-authoring blog posts and being an analyst on a larger team (people hold you accountable for your work), and, just like that, the interview was over.

I hung up the phone on a bit of a high. A large part of me was just happy the interview was over and I could move past the stress and anxiety of having to prepare and deal with the interview process. I had no expectations (good or bad) as to whether or not I would be moving to the next stage of the screening process. As far as I was concerned, I was playing with house money.

I felt that Darryl’s pattern of asking a question, getting an answer, and then constraining the question was an extremely effective interview style. It both forced me to adapt to new scenarios and allowed me to move past things I was struggling with, all the while letting Darryl gain insight into my thought process and how I problem solve. I actually had a lot of fun during the interview, especially with the hockey-related questions. Yes — fun — in a job interview. I mean; I got to talk about hockey with the Director of Hockey R&D for the Toronto Maple Leafs for over an hour. What more could I ask for?

Well, that’s what I thought until I received the following email:

Which was followed up by these details:

To be continued…