Digital television pictures are sampled in three dimensions: two spatial dimensions and time. This paper investigates the effect of treating video capture as a traditional temporal sampling problem, in which the frame rate is at least double the highest temporal frequency in the video or, conversely, the video signal is temporally band-limited to below half the frame rate. A significant contribution of this work is to find the fastest motion that is of interest, from which a maximum temporal frequency, and hence a minimum frame rate, can be calculated. To find the fastest motion of interest, a model of the human spatio-temporal contrast sensitivity function is used. For each spatial frequency, the model is used to find the velocity at which humans resolve moving detail as well as they resolve static detail at the highest spatial frequency that a particular spatial format can represent. This subjective matching procedure can be interpreted as finding the minimum frame rate that does justice to a specified spatial resolution, assuming that classical sampling theory is adhered to. A model of human eye tracking is then included, to take account of the fact that humans resolve detail on a moving object more easily when their eyes are following the object than when their eyes are static. Incorporating the eye tracking model results in minimum frame rate requirements that are many times higher than those in use today. However, this does not take account of all the effects of eye movements: following a moving object can also reduce the visibility of aliasing artefacts. The paper therefore concludes with a discussion of why a degree of aliasing can be permitted, so that traditional sampling theory does not necessarily need to be applied in television sampling.
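The sampling argument above can be illustrated with a minimal sketch. It assumes the standard relation that detail of spatial frequency f_s (cycles per degree) moving across the retina at velocity v (degrees per second) produces a temporal frequency of f_s × v (Hz), and it applies the classical bound of twice that frequency. The function names and the example figures are hypothetical and chosen only for illustration; they are not values from the paper.

```python
def temporal_frequency_hz(spatial_freq_cpd: float, velocity_deg_per_s: float) -> float:
    """Temporal frequency produced by moving detail.

    Assumes cycles/degree multiplied by degrees/second gives cycles/second (Hz).
    """
    return spatial_freq_cpd * velocity_deg_per_s


def min_frame_rate_hz(spatial_freq_cpd: float, velocity_deg_per_s: float) -> float:
    """Classical sampling bound: twice the highest temporal frequency."""
    return 2.0 * temporal_frequency_hz(spatial_freq_cpd, velocity_deg_per_s)


if __name__ == "__main__":
    # Illustrative inputs only (not taken from the paper):
    # 30 cycles/degree detail moving at 2 degrees/second.
    print(min_frame_rate_hz(30.0, 2.0))
```

In this sketch the fastest motion of interest enters as the velocity argument, so doubling either the spatial resolution or the velocity doubles the frame rate that classical sampling theory would call for.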