The Oxford Corpus of Old Japanese (abbreviated OCOJ) is a long-term research project which aims to develop a comprehensive annotated digital corpus of all extant texts in Japanese from the Old Japanese period. Old Japanese is the earliest attested stage of the Japanese language, largely the language of the Asuka and Nara periods of Japanese history (7th and 8th centuries AD). This is the formative literate period upon which the development of Japanese civilization is based, and these texts are of paramount importance for the study and understanding of the origins and development of that civilization, including its language, writing, literature, religion, history, and culture.

The OCOJ will contain:

Texts: The corpus will contain all extant texts in Japanese from the Old Japanese period, presented in original script and phonemic transcription. See here for the texts and here for display conventions.

Annotation: A large amount of information about the texts will be encoded and made searchable. This will include linguistic information (orthographic, phonological, morphological, syntactic, semantic, and lexical), as well as literary, biographical, historical, geographical, and other information. Because the corpus is digital, further information of any kind can be added at any time. See here for tagging conventions.

Translations: The texts will be supplied with English translations, which will be linked to the texts.

Dictionary: A bilingual Old Japanese-English dictionary will be developed alongside the corpus and as an integrated part of it. The dictionary will be linked to the texts, making cross-reference possible in both directions.

The OCOJ will provide a research and reference resource of value to specialist scholars and students of early Japan, and it will also give anyone interested in Japanese language, history, and culture broad and easy access to a large body of important texts and materials. The OCOJ will be published online and continuously updated, and it will be accessible and searchable through a web-based interface. At present, we provide a simple digital text which gives the original script and a phonemic transcription of the texts.