







Cornell Movie-Dialogs Corpus











Distributed together with: Chameleons in Imagined Conversations.











ZIP File











Related corpus: Cornell Movie-Quotes Corpus











DESCRIPTION:



This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts:



- 220,579 conversational exchanges between 10,292 pairs of movie characters

- involves 9,035 characters from 617 movies

- in total 304,713 utterances

- movie metadata included:

    - genres

    - release year

    - IMDB rating

    - number of IMDB votes

- character metadata included:

    - gender (for 3,774 characters)

    - position on movie credits (3,321 characters)

- see README.txt (included) for details
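
As a rough illustration of how the utterance data might be read, here is a minimal Python sketch. It assumes, and this should be checked against README.txt, that each record is one line of text whose fields are separated by the string " +++$+++ ", and that an utterance record carries five fields (line ID, character ID, movie ID, character name, utterance text). The field names, the helper `parse_records`, and the sample rows are hypothetical and are not taken from the corpus.

```python
from typing import Dict, Iterable, List

# Assumed field separator; confirm against README.txt before relying on it.
FIELD_SEP = " +++$+++ "

# Hypothetical names for the assumed five fields of an utterance record.
LINE_FIELDS = ["line_id", "character_id", "movie_id", "character_name", "text"]


def parse_records(rows: Iterable[str], field_names: List[str]) -> List[Dict[str, str]]:
    """Split each row on FIELD_SEP and map the pieces to field_names.

    Rows with fewer fields than expected are skipped rather than guessed at.
    """
    records = []
    for row in rows:
        row = row.rstrip("\n")
        if not row:
            continue
        # Limit the number of splits so a separator inside the last field survives.
        parts = row.split(FIELD_SEP, len(field_names) - 1)
        if len(parts) != len(field_names):
            continue
        records.append(dict(zip(field_names, parts)))
    return records


# Made-up rows in the assumed format, for illustration only.
sample_rows = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Hello there.\n",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BOB +++$+++ Hi.\n",
    "malformed row without separators\n",
]

records = parse_records(sample_rows, LINE_FIELDS)
for record in records:
    print(record["character_name"], "says:", record["text"])
```

To read a real file, the same function can be given an open file handle instead of the sample list. The file's text encoding is not stated here, so it would need to be taken from README.txt or determined by inspection.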



















BibTeX ENTRY:



@InProceedings{Danescu-Niculescu-Mizil+Lee:11a,
  author={Cristian Danescu-Niculescu-Mizil and Lillian Lee},
  title={Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs.},
  booktitle={Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, ACL 2011},
  year={2011}
}























This material is based upon work supported in part by the National Science Foundation under grant IIS-0910664.



Any opinions, findings, and conclusions or recommendations expressed above are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.