WELCOME TO the Script Encoding Initiative

The Script Encoding Initiative (SEI), established in the UC Berkeley Department of Linguistics in April 2002, is a project devoted to preparing formal proposals for encoding scripts and script elements not yet supported in Unicode (ISO/IEC 10646).

Unicode is the universal computing standard that specifies how text is represented in modern software. To date, Unicode has largely focused on the major modern scripts, particularly those most widely used in business. Some minority and historic scripts have already been encoded, as have historic characters of the major modern scripts.


Over 100 scripts remain to be encoded. Minority scripts are still used in parts of South and Southeast Asia, Africa, and the Middle East; unencoded scripts include Kpelle and Loma. Unencoded scripts of historical significance include Book Pahlavi, Large Khitan, and Jurchen. Even for major modern scripts, many difficult historical issues remain to be addressed: for example, the encoding model for Chinese (written continuously for nearly 3,000 years) is still being refined.

Because proposals for encoding minority and historic scripts often entail significant research, and because their user communities have little economic or political voice, such proposals have not been submitted to the Unicode Technical Committee (UTC) in any regular manner. It has been estimated that, at the current slow pace of encoding, many scripts will still be unencoded in ten years. In effect, many linguistic minorities and scholarly communities could be left permanently behind in the information age. Scholars who make do with obsolete computing technologies risk seeing their valuable data consigned to the electronic dustbin unless they move resolutely toward modern computing standards.