WIT3

Web Inventory of Transcribed and Translated Talks

WIT3, an acronym for Web Inventory of Transcribed and Translated Talks, is a ready-to-use version, for research purposes, of the multilingual transcriptions of TED talks. Since 2007, the TED Conference has been posting on its website video recordings of all its talks, along with English subtitles and their translations into more than one hundred languages. To make this collection of talks more readily usable by the research community, the original textual contents are redistributed here, together with MT benchmarks and processing tools.

For a detailed description of this corpus, read:



M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy.

Please cite the paper if you use this corpus in your work.





▸ Latest version of XML files of the TED Talks (April 2016):

▸ Releases

▸ Note on Transcripts/Translations

TED transcripts and translations were generated following these guidelines: How to Tackle a Transcript and How to Tackle a Translation. WIT3 redistributes original TED texts in their original format, so the information in the TED guidelines mostly applies to WIT3 texts as well. An important difference concerns the texts of the development and evaluation sets, from which metadata, for example sound information, are removed.

▸ Terms of Use

TED makes its collection of video recordings and transcripts of talks available under the Creative Commons BY-NC-ND license. WIT3 acknowledges the authorship of TED talks (BY condition) and does not redistribute transcripts for commercial purposes (NC). As regards the integrity of the work (ND), WIT3 only changes the format of the container, while preserving the original contents. WIT3 aims to support research on human language processing as well as the diffusion of TED Talks!

▸ Acknowledgments

The work was partially supported by the EU-BRIDGE and CRACKER projects, funded by the European Commission.

▸ Related resources

▸ Contact person

Mauro Cettolo (cettolofbk.eu)