Python and soccer… who knew?

In this post we will make a video summary of this soccer game, using the fact that supporters (and commentators) tend to be louder when something interesting happens.

The next lines open the video file with Python and compute the audio volume of each second of the match:

    import numpy as np # for numerical operations
    from moviepy.editor import VideoFileClip, concatenate

    clip = VideoFileClip("soccer_game.mp4")
    cut = lambda i: clip.audio.subclip(i, i+1).to_soundarray(fps=22000)
    volume = lambda array: np.sqrt(((1.0*array)**2).mean())
    volumes = [volume(cut(i)) for i in range(0, int(clip.duration-1))]
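The volume used here is the root mean square (RMS) of the samples in each one-second slice. To see what that number means without a video file at hand, here is a minimal sketch on synthetic data. It assumes the sound array has one row per sample and one column per channel; the `rms_volume` helper and the two test tones are made up for illustration.

```python
import numpy as np

def rms_volume(samples):
    """Root-mean-square amplitude of an array of audio samples."""
    samples = np.asarray(samples, dtype=float)
    return np.sqrt((samples ** 2).mean())

fps = 22000               # same sampling rate as in the post
t = np.arange(fps) / fps  # timestamps for one second of audio

# Hypothetical stereo signals: a quiet and a loud 440 Hz tone.
quiet = 0.1 * np.sin(2 * np.pi * 440 * t)
loud = 0.8 * np.sin(2 * np.pi * 440 * t)
quiet_stereo = np.column_stack([quiet, quiet])  # shape (22000, 2)
loud_stereo = np.column_stack([loud, loud])

print(rms_volume(quiet_stereo))
print(rms_volume(loud_stereo))
```

The RMS of a pure tone of amplitude A is A/√2, so the louder tone gets a value eight times larger than the quiet one, which is the kind of contrast we hope to see between a cheering crowd and a quiet one.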

If we plot the obtained volumes we see that each goal is followed by a few seconds of loudness:

It is much clearer if we compute the average volumes over periods of 10 seconds:

    averaged_volumes = np.array([sum(volumes[i:i+10])/10
                                 for i in range(len(volumes)-10)])
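The same kind of sliding average can also be written as a convolution with a flat kernel. The sketch below is an alternative formulation, not the code used for the results in this post; `moving_average` and the toy data are illustrative names. Note that `mode="valid"` returns one more value than the list comprehension above, because it also includes the last full window.

```python
import numpy as np

def moving_average(values, window=10):
    """Average of each run of `window` consecutive values."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # "valid" keeps only the positions where the window fits entirely.
    return np.convolve(values, kernel, mode="valid")

# Made-up volumes: a single loud second in an otherwise silent stretch.
toy_volumes = np.array([0, 0, 0, 10, 0, 0, 0, 0], dtype=float)
smoothed = moving_average(toy_volumes, window=4)

# The loop from the post, applied to the same data with a window of 4.
looped = [sum(toy_volumes[i:i + 4]) / 4 for i in range(len(toy_volumes) - 4)]

print(smoothed)
print(looped)
```

The single spike is spread over every window that contains it, which is why isolated noises matter less after averaging while sustained cheering stands out.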

The five highest peaks in the above graph give us the times of the five goals of the game, but other peaks may also indicate interesting events. In the next lines, we find all the local maxima and keep the times of the 10% loudest ones:

    increases = np.diff(averaged_volumes)[:-1] >= 0
    decreases = np.diff(averaged_volumes)[1:] <= 0
    peaks_times = (increases * decreases).nonzero()[0]
    peaks_vols = averaged_volumes[peaks_times]
    peaks_times = peaks_times[peaks_vols > np.percentile(peaks_vols, 90)]
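To make the peak selection easier to follow, here is a small sketch of the same idea on a toy array. The helper `local_maxima` is hypothetical, and it differs from the lines above in one detail: it adds 1 to the indices so that they point at the maximum itself, whereas the code in this post reports the sample just before it (a one-second shift that hardly matters on 10-second averages). The example keeps the top half of the peaks rather than the top 10%, simply because the toy array only has three.

```python
import numpy as np

def local_maxima(values):
    """Indices i with values[i-1] <= values[i] and values[i] >= values[i+1]."""
    diffs = np.diff(np.asarray(values, dtype=float))
    rising = diffs[:-1] >= 0   # the step into the candidate goes up (or is flat)
    falling = diffs[1:] <= 0   # the step out of the candidate goes down (or is flat)
    # Position k of the masks describes values[k + 1], hence the + 1.
    return np.nonzero(rising & falling)[0] + 1

toy = np.array([1, 3, 2, 5, 9, 4, 4, 6, 1], dtype=float)  # made-up volumes
peaks = local_maxima(toy)
peak_values = toy[peaks]

# Keep only the peaks strictly above the median peak height.
top_peaks = peaks[peak_values > np.percentile(peak_values, 50)]

print(peaks, peak_values)
print(top_peaks)
```

Because the comparison with the percentile is strict, a peak that sits exactly on the threshold is dropped.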

As a refinement, we group together the times that are less than one minute apart, as they almost certainly correspond to the same event, and we keep only the loudest time of each group:

    final_times = [peaks_times[0]]
    for t in peaks_times:
        if (t - final_times[-1]) < 60:
            if averaged_volumes[t] > averaged_volumes[final_times[-1]]:
                final_times[-1] = t
        else:
            final_times.append(t)
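The grouping loop is easier to check when wrapped in a function and run on a handful of invented times. This is only a sketch: `merge_close_peaks`, its `min_gap` parameter, and the toy values are illustrative, and the function assumes at least one peak was found.

```python
def merge_close_peaks(times, loudness, min_gap=60):
    """Keep the loudest peak in each run of peaks closer than min_gap.

    `times` must be sorted and non-empty; `loudness[t]` gives the
    volume at time t (a list, an array, or a dict all work).
    """
    merged = [times[0]]
    for t in times[1:]:
        if t - merged[-1] < min_gap:
            # Same event: keep whichever of the two is louder.
            if loudness[t] > loudness[merged[-1]]:
                merged[-1] = t
        else:
            # Far enough from the last kept peak: start a new event.
            merged.append(t)
    return merged

# Made-up peak times (in seconds) and their volumes.
toy_times = [100, 130, 170, 400, 430]
toy_loudness = {100: 0.2, 130: 0.5, 170: 0.3, 400: 0.4, 430: 0.1}

events = merge_close_peaks(toy_times, toy_loudness)
print(events)
```

Note that the gap is measured from the peak currently kept for the event, so a long series of peaks, each within a minute of the kept one, collapses into a single event.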

Now final_times contains the times (in seconds) of 21 events, from which we can cut our video. For each event we will start five seconds before its time and stop five seconds after:

    final = concatenate([clip.subclip(max(t-5, 0), min(t+5, clip.duration))
                         for t in final_times])
    final.to_videofile('soccer_cuts.mp4') # low quality is the default
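Rendering takes a while, so it can help to look at the cut windows before writing any video. The sketch below computes the same clamped (start, end) pairs in plain Python; `highlight_windows` and the example times are hypothetical, and the 5400-second duration is just a stand-in for a 90-minute recording.

```python
def highlight_windows(times, duration, before=5, after=5):
    """(start, end) pairs around each time, clamped to [0, duration]."""
    return [(max(t - before, 0), min(t + after, duration)) for t in times]

# Made-up event times, including two that sit near the edges of the video.
windows = highlight_windows([2, 130, 5398], duration=5400)
total_seconds = sum(end - start for start, end in windows)

print(windows)
print(total_seconds)
```

Windows that are not clipped by the edges last 10 seconds each, so 21 events give at most 210 seconds of summary, that is 3 minutes 30.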

Results

We obtain the following 3:30 video summary (sorry for the external links, these videos can’t be embedded).

Nicely enough, the same 25 lines of code can be used to cut this other summary of this other match. The limitations of the method appear in yet another summary, which captured only 8 of the 9 goals of the match, one or two of them badly cut. The algorithm can be confused by broadcasters who show lots of replays or lower the sound of the crowd after goals, and it may miscut some penalty goals, because the crowd starts whistling long before the shot. So large-scale applications would require a less naive model.

If you want to try it at home, here is the whole script. It would be interesting to see how the method works on other sports, or how it could be generalized to other uses, like spotting action scenes in movies.