Social cues are one way young children determine that a situation is pedagogical in nature, that is, that it contains information to be learned and generalized. However, some social cues (e.g., contingent gaze and responsiveness) are missing from prerecorded video, a potential reason why toddlers’ language learning from video can be inefficient compared with their learning directly from a person. This study explored two methods for supporting children’s word learning from video by adding social-communicative cues. A sample of 88 children aged 30 months began their participation with a video training phase. In one manipulation, an on-screen actress either responded contingently to children through a live video feed (similar to Skype or FaceTime “video chat”) or appeared in a prerecorded demonstration. In the other manipulation, parents either modeled responsiveness to the actress’s on-screen bids for participation or sat out of their children’s view. Children then viewed a labeling demonstration on video, and their knowledge of the label was tested with three-dimensional objects. Results indicated that both on-screen contingency and parent modeling increased children’s engagement with the actress during training. However, only parent modeling increased children’s subsequent word learning, perhaps by revealing the symbolic (representational) intentions underlying this video. This study highlights the importance of adult co-viewing in helping toddlers to interpret communicative cues from video.