May 2, 2014; Portland, OR, USA; Portland Trail Blazers guard Damian Lillard (0) makes a three pointer at the buzzer over Houston Rockets forward Chandler Parsons (25) to win the game during the fourth quarter in game six of the first round of the 2014 NBA Playoffs at the Moda Center. Mandatory Credit: Craig Mitchelldyer-USA TODAY Sports

I was watching an NBA game when I heard Jeff Van Gundy comment on the 2 for 1 strategy in basketball. He said the analytics guys push to take the 2 for 1 shot in every situation, but asked: if it's a tied fourth quarter in Game 7 of the NBA Finals, would it still be smart to go for the 2 for 1?

For those who don't watch basketball, the two for one is a strategic play at the end of a quarter. Each possession in basketball is at most 24 seconds long, enforced by the shot clock, to prevent the game from stalling (in college it's 35 seconds). The two for one strategy calls for the team in possession to shoot the ball with between 26 and 39 seconds on the game clock so that it gets the ball back for another shot at the end of the quarter, giving itself two possessions to the other team's one.
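The timing logic above can be sketched as a tiny check. This is my own illustration (the function name is mine, not from the post's scraper), using the 26–39 second window the strategy describes:

```python
SHOT_CLOCK = 24  # NBA shot clock, in seconds

def in_two_for_one_window(game_clock):
    """True if shooting now still guarantees the ball comes back.

    Shooting with 26-39 seconds left means even a full 24-second
    opposing possession leaves time for one more shot at the buzzer.
    """
    return 26 <= game_clock <= 39

print(in_two_for_one_window(35))  # True: shooting here gets the ball back
print(in_two_for_one_window(20))  # False: the opponent can run out the clock
```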

I thought Jeff Van Gundy raised an interesting question, and I am also a self-proclaimed analytics guy. It's a hard data science question because there aren't many past Game 7s in the NBA to begin with, much less ones decided by one or two possessions. Still, the 2 for 1 makes intuitive sense at the very least: two chances to score are better than one. But how much better?

The only preliminary analysis I could see was to identify all of the 2 for 1 data I could find in the play-by-play logs of the NBA. So I went ahead and scraped the play-by-play data on Basketball Reference, flagging every sequence that I would count as part of a two for one.

Note that I do have the code on Github, but the messiness is unreal. All I wanted was a scraper that would identify a 2 for 1 opportunity in every game from 2001 to 2015. It did the job, and now I feel like I should destroy it to rid the world of its misery. But if anyone wants to collaborate on an unofficial API for Basketball Reference or any other play-by-play log, I would welcome it. Email or Twitter.

Anyway, combing through each play-by-play log, I selected individual shots that occurred with between 25 and 39 seconds on the game clock. Even though it's not entirely plausible to get a 2 for 1 with only a couple of seconds on the game clock, there are time-outs and running miracle shots. By the end of the scraping (which took a surprisingly long time to run), I had more than 70,000 rows of data on individual two for one plays since 2001 (there are no play-by-play logs before then).
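The filtering step can be sketched with pandas. The rows and column names here are hypothetical stand-ins for the scraped Basketball Reference data, just to show the clock-window filter:

```python
import pandas as pd

# Toy play-by-play rows; the real data came from the Basketball Reference scrape.
pbp = pd.DataFrame({
    'game_clock': [45, 38, 31, 24, 12],  # seconds left in the quarter
    'event': ['shot', 'shot', 'shot', 'shot', 'shot'],
})

# Keep shots launched with 25-39 seconds on the game clock,
# i.e. inside the 2 for 1 window.
two_for_one = pbp[(pbp.game_clock <= 39) & (pbp.game_clock >= 25)]
print(len(two_for_one))  # 2 rows: the shots at 38s and 31s
```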

The main metric to look at is the differential gain from the score right before the two for one shot to the score at the end of the quarter. For example: Blazers vs. Rockets, 2nd quarter, the clock shows 35 seconds and the score is POR 50 – HOU 50 as Damian Lillard launches a 3-pointer. Say he makes it, Houston scores at the other end, and then Portland scores again at the end of the quarter. The quarter ends POR 55 – HOU 52, and the differential gain is +3 points. Another way to put it is the expected value of taking the 2 for 1.
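The metric is just the change in score margin over the window. Here is the Lillard example above worked out in code (the variable names are mine):

```python
# Score when the 2 for 1 shot goes up (35 seconds on the clock)
por_before, hou_before = 50, 50
# Score at the end of the 2nd quarter
por_end, hou_end = 55, 52

# Differential gain from Portland's perspective:
# change in the score margin across the 2 for 1 window.
diff_gain = (por_end - hou_end) - (por_before - hou_before)
print(diff_gain)  # +3 points for taking the 2 for 1
```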

But obviously that's not going to happen every time; Damian Lillard makes less than 40 percent of the threes he takes. Any number of situations can unfold with 35 seconds left on the clock, but we can look across most of them and average out a gain. ALSO NOTE: all of the shots and statistics are measured from the initial shot attempt. A two for one ultimately depends on the initial shot, the last shot, and the defense, but I am analyzing everything that happens from the moment a player even attempts a two for one.

From the histogram above, a 0-point difference is the most common outcome of the two for one, and the next most common is a two-point gain at the end of the quarter. The overall average is around +0.74 points per two for one shot, which equates to almost 3 points over four quarters. The standard deviation is 2.05, so the distribution of the gain is skewed to the right yet still centered close to 0; the mean difference gain should therefore be read with some discretion. We can also subset the data to look for factors that produce a higher or lower average gain. Let's look at fouls to see if getting fouled while attempting the shot averages a better gain.
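Computing those summary statistics is a one-liner in numpy. The array below is a toy stand-in for the ~70,000 scraped diff_gain values (the real numbers reported above are a mean of about +0.74 and a standard deviation of 2.05):

```python
import numpy as np

# Toy stand-in for the scraped diff_gain column
diff_gain = np.array([0, 0, 2, -2, 3, 0, 2, 1])

print(diff_gain.mean())       # average points gained per 2 for 1 attempt
print(diff_gain.std(ddof=1))  # sample standard deviation around that mean
```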

sample = no_foul[np.random.rand(len(no_foul)) < .3]
pyplot.hist(list(sample.diff_gain), diff_bins, label='Shot')
pyplot.hist(list(foul.diff_gain), diff_bins, label='Foul')
pyplot.legend(loc='upper right')

In the case of a foul, the most frequent expected value gain for the two for one increases to 2 points. The mean gain goes up as well, to +1.14 points per two for one attempt, and a t-test on the two distributions shows a significant difference. This makes sense because foul shots are almost guaranteed baskets for good free-throw shooters. If we plot a histogram of the distances players most frequently shoot from, we get the unsurprising result of shots close to the rim and three-pointers. The plot next to it shows initial shot distance vs. average expected value from two for one opportunities. As you can see, the curves are close, and players are optimizing for the best-value shots.
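The post doesn't show the t-test code, so here is one way it could look. The samples below are synthetic draws built only to mirror the reported means (+1.14 fouled vs. +0.74 unfouled, spread near 2), and I compute Welch's t statistic by hand in numpy rather than pulling in scipy:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic samples mirroring the reported means and spreads (assumed, not the real data)
no_foul = rng.normal(0.74, 2.05, size=5000)
foul = rng.normal(1.14, 2.00, size=1000)

# Welch's t statistic: unequal variances and unequal sample sizes
se = np.sqrt(foul.var(ddof=1) / len(foul) + no_foul.var(ddof=1) / len(no_foul))
t_stat = (foul.mean() - no_foul.mean()) / se
print(abs(t_stat) > 1.96)  # comfortably past the 5% critical value
```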

The Case For An Unconventional Early Foul

Lebron James is super dominant. Every time I watch him go one-on-one against a team at the end of the quarter, I feel like he either scores or gets fouled driving to the rim. So why should he be allowed to burn off precious time when a team could foul him early and get the ball back with a chance at a 2 for 1, or at least more time to shoot?

Lebron shot 71% from the free-throw line in 2014-2015. That gives him an expected value of 1.42 points every time he steps to the stripe to shoot two. But what is his average differential gain when he shoots the 2-for-1?

In [54]: mean(lebron.diff_gain)
Out[54]: 0.78531073446327682

In [55]: mean(lebron[lebron.quarter != 'End of 4th quarter']['diff_gain'])
Out[55]: 0.83855421686746989

So Lebron has a slightly better two for one expected value when it's not the fourth quarter (heh heh). Lebron's team has gained an average of 0.79 points per two-for-one opportunity over his career, across 525 shots. Nothing too spectacular or out of the ordinary. But why not foul him before he gets a chance to shoot the two-for-one? Looking at average points gained in the 2014-2015 season, a number of teams average higher 2-for-1 gains.

Philadelphia leads in average points gained per 2 for 1 situation, though Portland and Dallas lead in attempts (and have exactly equal average gains?!?!). But say Portland is playing Cleveland, and Portland fouls Lebron at 38 seconds, before he can take a shot.

If Lebron averages 1.42 points per two-shot trip to the line and Portland gets the ball back for the 2 for 1 opportunity, Cleveland will average a gain of:

1.42 (Lebron's free-throw expected value)

– 0.799 (Portland's average 2 for 1 expected value gain)

1.42 – 0.799 = +0.62 points from the foul to the end of the quarter.

Alternatively, if Lebron is allowed to go for the 2 for 1 on his own and Portland doesn't foul, Cleveland will average a gain of +0.79 points by the end of the quarter. Therefore, Portland can minimize its differential loss by fouling and preventing Lebron from scoring more.
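The foul-early arithmetic above can be wrapped in a tiny helper (the function name is mine, for illustration):

```python
def net_gain_if_foul(ft_pct, opp_two_for_one_ev):
    """Fouling team's expected points conceded at the line,
    minus the expected points of its own 2 for 1 afterwards.
    Positive values favor the fouled team."""
    ft_ev = 2 * ft_pct  # expected points from two free throws
    return ft_ev - opp_two_for_one_ev

# Cleveland's expected edge if Portland fouls Lebron at 38 seconds:
# 1.42 (71% shooter, two shots) minus Portland's 0.799 per 2 for 1.
print(round(net_gain_if_foul(0.71, 0.799), 2))  # 0.62
```

Compare that +0.62 with the +0.79 Cleveland averages when Lebron runs the 2 for 1 himself, and the foul comes out ahead for Portland.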

While 0.17 of a point does seem insignificant, it shows how optimizing for the two for one can compound an advantage. Take the Clippers vs. Rockets playoff series. Dwight Howard has an even worse free-throw percentage, 52 percent this past season. Teams run the Hack-a-Howard when they believe each of their own possessions is worth more than the 1.04 expected points from his two free throws. In a two-for-one situation, the Clippers could minimize their loss and come out ahead by fouling Dwight Howard into a two for one opportunity.

0.835 (Expected Two For One Gain for the LA Clippers)

1.04 (Expected Value of Dwight Howard's free throws)

0.835 – 1.04 = –0.21 points lost by the Clippers, versus an average deficit of –0.88 points from allowing the Rockets to run their own two for one. That's a difference of over half a point, and potentially almost 2 points over an entire game. Of course, this only works if the Clippers bench Deandre Jordan, because Houston could then foul right back to swing the odds in their favor. That's what happens when a strategy is universal.

BUT WAIT. I know what you're thinking. You saw the histogram earlier of players getting fouled and averaging a +1.14 difference gain. So what's that all about? I would argue that in those cases the other team probably did not go for a two for one after the foul shots. Those fouls are mostly shooting fouls that occurred with under 39 seconds on the game clock, and some may be completed shots plus a foul, after which the other team would not feel the need to rush a bad two for one shot. YETTTT…

Is There a Bad Two For One Shot?

The theory is that a rushed or bad two for one attempt is worse than one good shot attempt. That's probably valid, except the data might disprove it. We can look at a couple of heat-maps to gain some initial insight. I used Seaborn to create a heat-map matrix of the most frequent shot attempts, with the shot clock on the horizontal axis and the game clock on the vertical axis. Imagine it as a two-way histogram, if that makes any sense.

We can see a cluster of the most frequent 2 for 1 shot attempts happening at around 29 to 34 seconds on the game clock and 14 to 20 seconds left on the shot clock. But let’s see if the cluster necessarily gives the most efficient two for one gain.

sns.set_context("poster")
plt.figure(figsize=(8, 6))
sns.heatmap(closeMatrix, annot=True, fmt="d")

# Filter for quick 2 for 1 shots
urgent = data[(data.clock_start < 36) &
              (data['shot_clock'] < 19) &
              (data.quarter != 'End of 4th quarter')]

I cut off the first four seconds in the heat-map matrix because of the lack of data, and to filter out catch-and-shoots off of timeouts and the like. The average gain is also multiplied by 100 in the heat map for visualization purposes. Ultimately there isn't a reliable pattern across the board, except for a possibly higher gain earlier in both the game clock and the shot clock.

Taking the mean of the urgent difference gain, I got +0.6 points per two for one opportunity. I defined urgent as the possession starting with fewer than 36 seconds on the game clock, the ball being shot after at least six seconds had run off the shot clock, and any quarter besides the fourth. So even when a player has just enough time to push the ball up and rush a two for one, there is still an advantage to it.
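Here is the "urgent" filter and its mean on a small toy frame. The rows are made up; the column names (`clock_start`, `shot_clock`, `quarter`, `diff_gain`) follow the post's own snippet above:

```python
import pandas as pd

# Toy rows standing in for the scraped data
data = pd.DataFrame({
    'clock_start': [35, 30, 40, 33],
    'shot_clock':  [17, 15, 20, 18],
    'quarter':     ['2nd', '3rd', '2nd', 'End of 4th quarter'],
    'diff_gain':   [1, 0, 2, 3],
})

# Urgent 2 for 1s: possession starts under 36 seconds, shot goes up with
# fewer than 19 seconds left on the shot clock, fourth quarter excluded.
urgent = data[(data.clock_start < 36) &
              (data.shot_clock < 19) &
              (data.quarter != 'End of 4th quarter')]
print(urgent.diff_gain.mean())  # +0.5 on this toy data; +0.6 in the real set
```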

Factors For Success

Then I got interested in what happens when filtering for close games in the fourth quarter.

The left map shows higher gains clustered when shots are taken earlier in the shot clock, while the right map shows shots clustered when the shot clock is winding down. So are the ideals fundamentally flawed?

Ultimately it may be that NBA teams keep looking for a better shot until the shot clock runs down too far and the opposing defense gets set. Players also cannot control the exact timing of when they get the ball relative to the game clock.

The line graph below also displays a downward trend in the effectiveness of the two for one as a team gets further ahead in score.

Conclusion

I usually try to integrate some sort of prediction into my analysis but will skip it here. I ran a logistic regression in R to try to predict a positive or negative gain, until I realized why it would never work: basketball is, and always will be, unpredictable. If it could be modeled with statistical software, no one would be watching the playoffs right now. No matter how many factors I put into a regression or machine learning model, Damian Lillard will always be streaky, Lebron James will miss a wide-open dunk, and Dwight Howard could hit all 20 of his free throws. But that's what makes the game interesting! Still, there is room for prediction in some capacity, however small.

There are, however, a few takeaways from this analysis, chiefly the potential benefit of taking quicker shots in a possession. While there is statistical variability from points off turnovers and outlet dunks, there is a slight skew toward catching the defense off guard and not fully set when shooting a quicker shot. Shooting earlier also leaves a team more time for a good final shot after the opposing team scores.

Overall, basketball analysis can take forever. If I'm here creating more than 10 graphs on a single strategic play that may be worth only a couple of points here and there, the future of analytics looks bright. There is a lot more I want to show that I don't have time for, so expect a post on end-of-game analysis sometime in the future. I am also thinking about creating an API for querying basketball play-by-play data, though maybe not from Basketball Reference, as I said before.

I also looked at whether making the two for one shot increased a team's overall level of play in the next quarter, but got statistically insignificant results. Ultimately it may be the smallest increment that boosts a team's points just a little, averaged over many games. But many games are decided by one point, so any kind of edge matters.

Email me at jayfeng1@uw.edu for any comments or thoughts. Also any personal insights or corrections.

LinkedIn, Twitter