USCII ("you-ski") stands for Universal Semiotic Coding for Information Interchange. It is a system for embedding pictures inside the numbers agreed upon to represent symbols and control codes. I was inspired to create it by the famous Arecibo Message, which attempted to convey humanity's physics knowledge without assuming a cultural context other than math.

For instance, instead of ASCII's encoding of 65 for "A" and 66 for "B", we might consider using the number 15621226033 for "A" and 32801506878 for "B". To see the bitmaps, you must first convert these values into binary, padded with leading zeros to 35 bits:

15621226033 (base 10) = 01110100011000110001111111000110001 (base 2)

32801506878 (base 10) = 11110100011000111110100011000111110 (base 2)

When transmitted in a medium which hints at the significance of a 35-bit pattern, the semiprime nature of 35 suggests decomposing it into the factors 7 and 5. Laying the bits out as 7 rows of 5 produces (small) images of an A and a B. Larger prime factor choices could be used to get more coverage in Unicode, such as a 23x23 font for Chinese.
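The decomposition can be tried out with a short Python sketch. The function name `render_glyph` and the choice of `#` and `.` as pixel characters are assumptions made for illustration, not part of USCII; the sketch also assumes the most significant bit is the top-left pixel, which matches the binary strings shown above.

```python
def render_glyph(value: int, width: int = 5, height: int = 7) -> str:
    """Draw an integer as a width x height bitmap, most significant bit first."""
    size = width * height
    if value < 0 or value >= 1 << size:
        raise ValueError("value does not fit in the bitmap")
    # Zero-pad to the full bit count so leading blank pixels are kept.
    bits = format(value, f"0{size}b")
    # Cut the bit string into rows of `width` bits each.
    rows = [bits[r * width:(r + 1) * width] for r in range(height)]
    return "\n".join(row.replace("1", "#").replace("0", ".") for row in rows)


if __name__ == "__main__":
    print(render_glyph(15621226033))
```

Run on 15621226033, this prints a row `.###.`, three rows of `#...#`, a crossbar `#####`, and two more rows of `#...#`, which reads as a capital A.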

Overview Video

Live Demo

There is an online encoder as well as a decoder (currently only for the standard "USCII-5x7-ENGLISH-C0"). The encoder walks you through how the standard works, so just try typing something in the input box and read the explanation there.

The decoder is descriptive as well, and explains the steps. But if you send a message to a friend who hasn't heard of USCII, I'd be interested to know how many quickly figure it out without using the decoder. Any stories about that are welcome.

Arecibo ASCII

I've developed a draft specification of the USCII variation "5x7-ENGLISH-C0". This uses 35 bits per character, and includes printable characters as well as the "C0 control codes". You can read the script that generates it, which contains comments on why I picked the bit patterns.

I've informally named this variant "Arecibo ASCII". That's because it is possible to losslessly convert a stream of conventional ASCII characters into USCII-5x7-ENGLISH-C0 (and back again). It's still a work in progress, but here's the table as it currently stands:

ASCII  Character                 Arecibo ASCII (35-bit binary)
0      Null character            10101010101010101010101010101010101
1      Start of Header           10101101111010110111101011011110101
2      Start of Text             11011111111101111111110111111111011
3      End of Text               11011110111101111011110111111111011
4      End of Transmission       11111111111111111111111111001110011
5      Enquiry                   11111111010000011101101110000010111
6      Acknowledgment            11111101011111111111011101000111111
7      Bell                      11011100011000110001000001111111011
8      Backspace                 11111110111011100000101111101111111
9      Horizontal Tab            00000000000000111101000010000000000
10     Line Feed                 11100001000010000100111110111000100
11     Vertical Tab              00100001000010000100001000000001110
12     Form feed                 11111011100010000000111110111000100
13     Carriage return           00001000010010101101111110110000100
14     Shift Out                 00100101111101111011110111110111100
15     Shift In                  11100111011101111011110111011100100
16     Data Link Escape          11111111110010001110001001111111111
17     Device Control 1          11111101111001110001100111011111111
18     Device Control 2          11011110110101001010011100111010001
19     Device Control 3          11111101011010110101101011010111111
20     Device Control 4          11111100011000110001100011000111111
21     Negative Acknowledgement  11111101011111111111100010111011111
22     Synchronous Idle          11111111111111111111111110101011111
23     End of Trans. Block       11111000000111001010011100000011111
24     Cancel                    10001000000101000100010100000010001
25     End of Medium             11111100010110001010001101000111111
26     Substitute                10001011101111011101110111111111011
27     Escape                    00011001100101011110011100111010001
28     File Separator            10101101011010110101101011010110101
29     Group Separator           11011110111101111011110111101111011
30     Record Separator          11110111101101010010000001001111011
31     Unit Separator            11111111111111111111100111101110111
32     Space                     00000000000000000000000000000000000
33     !                         00100001000010000100000000000000100
34     "                         01010010100101000000000000000000000
35     #                         01010010101111101010111110101001010
36     $                         00100011111010001110001011111000100
37     %                         11000110010001000100010001001100011
38     &                         01100100101010001000101011001001101
39     '                         01100001000100000000000000000000000
40     (                         00010001000100001000010000010000010
41     )                         01000001000001000010000100010001000
42     *                         00000001001010101110101010010000000
43     +                         00000001000010011111001000010000000
44     ,                         00000000000000000000011000010001000
45     -                         00000000000000011111000000000000000
46     .                         00000000000000000000000000110001100
47     /                         00000000010001000100010001000000000
48     0                         01110100011001110101110011000101110
49     1                         00100011000010000100001000010001110
50     2                         01110100010000100010001000100011111
51     3                         11111000100010000010000011000101110
52     4                         00010001100101010010111110001000010
53     5                         11111100001111000001000011000101110
54     6                         00110010001000011110100011000101110
55     7                         11111000010001000100010000100001000
56     8                         01110100011000101110100011000101110
57     9                         01110100011000101111000010001001100
58     :                         00000011000110000000011000110000000
59     ;                         00000011000110000000011000010001000
60     <                         00010001000100010000010000010000010
61     =                         00000000001111100000111110000000000
62     >                         01000001000001000001000100010001000
63     ?                         01110100010000100010001000000000100
64     @                         01110100010000101101101011010101110
65     A                         01110100011000110001111111000110001
66     B                         11110100011000111110100011000111110
67     C                         01110100011000010000100001000101110
68     D                         11100100101000110001100011001011100
69     E                         11111100001000011110100001000011111
70     F                         11111100001000011110100001000010000
71     G                         01110100011000010111100011000101111
72     H                         10001100011000111111100011000110001
73     I                         01110001000010000100001000010001110
74     J                         00111000100001000010000101001001100
75     K                         10001100101010011000101001001010001
76     L                         10000100001000010000100001000011111
77     M                         10001110111010110101100011000110001
78     N                         10001100011100110101100111000110001
79     O                         01110100011000110001100011000101110
80     P                         11110100011000111110100001000010000
81     Q                         01110100011000110001101011001001101
82     R                         11110100011000111110101001001010001
83     S                         01111100001000001110000010000111110
84     T                         11111001000010000100001000010000100
85     U                         10001100011000110001100011000101110
86     V                         10001100011000110001100010101000100
87     W                         10001100011000110001101011010101010
88     X                         10001100010101000100010101000110001
89     Y                         10001100011000101010001000010000100
90     Z                         11111000010001000100010001000011111
91     [                         01110010000100001000010000100001110
92     \                         00000100000100000100000100000100000
93     ]                         01110000100001000010000100001001110
94     ^                         00100010101000100000000000000000000
95     _                         00000000000000000000000000000011111
96     `                         01000001000001000000000000000000000
97     a                         00000000000111000001011111000101111
98     b                         10000100001000011110100011000111110
99     c                         00000000000111110000100001000001111
100    d                         00001000010000101111100011000101111
101    e                         00000000000111010001111111000001111
102    f                         00010001010010001110001000010000100
103    g                         00000000000111110001011110000111110
104    h                         10000100001000011110100011000110001
105    i                         00000001000000000100001000010000100
106    j                         00010000000001000010000101001001100
107    k                         01000010000100101010011000101001001
108    l                         01100001000010000100001000010001110
109    m                         00000000001101110101101011010110001
110    n                         00000000001011011001100011000110001
111    o                         00000000000111010001100011000101110
112    p                         00000000001111010001111101000010000
113    q                         00000000000111110001011110000100001
114    r                         00000000001011011001100001000010000
115    s                         00000000000111110000011100000111110
116    t                         00100001001111100100001000010100010
117    u                         00000000001000110001100011000101110
118    v                         00000000001000110001100010101000100
119    w                         00000000001000110001101011010101010
120    x                         00000000001000101010001000101010001
121    y                         00000000001000101010001000010001000
122    z                         00000000001111100010001000100011111
123    {                         00011001000010001000001000010000011
124    |                         00100001000010000000001000010000100
125    }                         11000001000010000010001000010011000
126    ~                         00001011101000000000000000000000000
127    Delete                    11111110001010100010101011100011111
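The lossless round trip works because every character has its own distinct pattern, so the table can be read in either direction. Here is a minimal Python sketch of that idea; it uses only four entries copied from the table, and the names `GLYPHS`, `BY_PATTERN`, `to_uscii` and `from_uscii` are hypothetical. A real converter would need all 128 entries.

```python
# Excerpt of the table: character -> 35-bit pattern.
GLYPHS = {
    " ": "0" * 35,
    "A": "01110100011000110001111111000110001",
    "B": "11110100011000111110100011000111110",
    "C": "01110100011000010000100001000101110",
}

# Reverse lookup; it only works if no two characters share a pattern.
BY_PATTERN = {bits: ch for ch, bits in GLYPHS.items()}
if len(BY_PATTERN) != len(GLYPHS):
    raise ValueError("two characters share a bit pattern")


def to_uscii(text: str) -> list[str]:
    """Replace each character with its 35-bit pattern."""
    return [GLYPHS[ch] for ch in text]


def from_uscii(patterns: list[str]) -> str:
    """Map each 35-bit pattern back to its character."""
    return "".join(BY_PATTERN[p] for p in patterns)


if __name__ == "__main__":
    encoded = to_uscii("A CAB")
    print(from_uscii(encoded))
```

The sketch leaves out the framing that a transmitted message needs and only shows the per-character mapping.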

It isn't enough just to pick the character values, however. The Arecibo message sent only one big bitmap, whereas a USCII string consists of many characters. Two 35-bit characters in sequence have a combined length of 70 bits, and the semiprime hint is gone.
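To see why the hint disappears, it helps to list the rectangles a receiver could try for a given bit count. The helper below is a small illustrative sketch (the name `rectangles` is made up here); it returns every way to write a length as width times height with both sides greater than 1.

```python
import math


def rectangles(n: int) -> list[tuple[int, int]]:
    """All (w, h) with 1 < w <= h and w * h == n."""
    return [(w, n // w) for w in range(2, math.isqrt(n) + 1) if n % w == 0]


if __name__ == "__main__":
    print(rectangles(35))  # [(5, 7)]
    print(rectangles(70))  # [(2, 35), (5, 14), (7, 10)]
```

A 35-bit pattern leaves only one rectangle to try (in two orientations), while 70 bits offers three, none of which is the intended pair of 5x7 bitmaps.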

There are arguably a lot of ways to build a container format for USCII codes, but I picked a fairly general one, establishing what I call "meter" and "silence" enclosing the characters. For a WxH bitmap size choice, the layout is this:

leading silence: H repetitions of a sequence of WxH + W zero bits

leading meter: W repetitions of a sequence of WxH one bits, followed by W zero bits

message: each WxH character, individually followed by W zero bits

trailing meter: H repetitions of a sequence of WxH one bits, followed by W zero bits

trailing silence: W repetitions of a sequence of WxH + W zero bits
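The layout can be sketched in Python as follows. This is an illustration rather than the reference encoder: the function name `frame` is hypothetical, bits are modelled as a string of "0" and "1" characters, and the sketch assumes that "silence" means zero bits at both ends of the message.

```python
def frame(glyphs: list[str], width: int = 5, height: int = 7) -> str:
    """Wrap WxH bit patterns in silence and meter, returning one bit string."""
    cell = width * height
    for g in glyphs:
        if len(g) != cell or set(g) - {"0", "1"}:
            raise ValueError(f"expected {cell} bits of 0/1, got {g!r}")

    gap = "0" * width                    # W zero bits after each unit
    silence_unit = "0" * (cell + width)  # assumption: silence is all zeros
    meter_unit = "1" * cell + gap        # solid block, then the gap

    parts = [
        silence_unit * height,               # leading silence: H units
        meter_unit * width,                  # leading meter: W units
        "".join(g + gap for g in glyphs),    # message: one unit per character
        meter_unit * height,                 # trailing meter: H units
        silence_unit * width,                # trailing silence: W units
    ]
    return "".join(parts)


if __name__ == "__main__":
    letter_a = "01110100011000110001111111000110001"
    framed = frame([letter_a])
    print(len(framed), len(framed) % 8)
```

Every piece is a whole number of (WxH + W)-bit units, so a message of N characters comes to (2 x (W + H) + N) units in total.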

This is easier to see in the online encoder and decoder than to grasp abstractly, but that's the formula. It has the added bonus of producing a multiple of 8 bits for all string lengths with W=5 and H=7, or W=23 and H=23. That is because every repeated unit is WxH + W bits long: 40 bits in the first case and 552 in the second, both divisible by 8.

The C0 codes are admittedly rather tricky, especially when it comes to depicting things like "Data Link Escape" or "Device Control 1"! It would be possible to use a larger bitmap size and get clearer images. But I'd like to see how far the 35-bit standard can go in cueing people who aren't familiar with ASCII into what the bitmaps signify...

Another thing I've increasingly been considering is that any meaningful and expressive picture will probably find its way into a symbolic font. This means that if Arecibo ASCII is to extend into "USCII Unicode", any pictures chosen for control codes are probably on the table as literal pictures. I'm not sure how to handle that problem.

Again, ideas are welcome!