January 20, 2015 — Sean P

A random language generator is a tricky thing to build because of the sheer amount of irregularity a language can have. It's not just a matter of memorizing forms and tables; there are also the many, many exceptions to those forms and tables that a language will have. And not all languages operate similarly. It's easy to assume that all languages must act like English, for example in word order (subject first, then verb, then object), but in some languages it doesn't really matter what goes where. Or take transitivity: "John breaks the glass." "The glass breaks." In a nominative-accusative language with a nominative marker "na" and an accusative marker "ka", those sentences would be "John-na breaks the glass-ka." and "The glass-na breaks." But not all languages work like this. In an ergative language, it would probably look more like "John-na breaks the glass-ka." and "The glass-ka breaks.", because the sole argument of an intransitive verb is marked like the object of a transitive one. If you are making a random language generator, how do you account for this, and for the sheer number of other ways a language can express ideas?
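To make the two marking patterns concrete, here is a toy sketch. It is only an illustration of the example sentences: `mark` is a hypothetical helper, and "na" and "ka" are the made-up markers from the paragraph above, not forms from any real language.

```python
def mark(alignment, verb, subject, obj=None):
    """Attach the toy markers "na" and "ka" to the arguments of a clause."""
    if obj is not None:
        # Transitive clause: agent and patient are marked differently
        # under both alignments in this toy example.
        return f"{subject}-na {verb} {obj}-ka"
    # Intransitive clause: the sole argument is marked like the agent
    # (accusative alignment) or like the patient (ergative alignment).
    marker = "na" if alignment == "accusative" else "ka"
    return f"{subject}-{marker} {verb}"

print(mark("accusative", "breaks", "John", "the glass"))
print(mark("accusative", "breaks", "the glass"))
print(mark("ergative", "breaks", "the glass"))
```

The only place the two alignments differ is the intransitive clause, which is exactly the contrast a generator would have to decide on.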

Well, first steps first. Before syntax, lexicon, and morphology comes phonetics.

In Python I created a large dictionary of consonants whose keys are in hexadecimal and whose values are IPA characters. The keys have an encoding in them, so at sight you can tell what each sound does. The first hex digit (the 256s place) is "manner", the second (the 16s place) is "place", and the last (the 1s place) is phonation. These are the three basic dimensions of consonant sounds (there is more complexity than that, of course). The "p" sound in English is formed with both lips (no tongue or teeth, etc.), so it's considered bilabial; it is a stop consonant, which means airflow is stopped completely at some point; and it is unvoiced, which means your vocal cords do not vibrate. This is as opposed to the "zh" sound (/ʒ/) in "pleasure", which is a voiced postalveolar sibilant.

The value of each digit says what the value is for that dimension. For manner, 1 is nasal, 2 is stop, 3 is affricate (a stop and fricative combined, like the "t-sh" in "ch"), 4 is sibilant (a subcategory of fricative), 5 is fricative, 6 is approximant, etc. The place digit goes from the front of the mouth to the back: 1 is bilabial, 2 is labio-dental, 3 is dental, all the way back to glottal. Phonation is either 0 (unvoiced) or 1 (voiced). I'm planning on doing more with this one.
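A key in this scheme can be unpacked with a shift and a mask. The following is a minimal sketch rather than the actual dictionary code: `decode` and the three lookup tables are hypothetical names, the tables hold only the values spelled out in the text, and it assumes each dimension occupies one hex digit, as keys like 0x240 suggest.

```python
# Hypothetical lookup tables; only the values named in the text are filled in.
MANNER = {1: "nasal", 2: "stop", 3: "affricate", 4: "sibilant",
          5: "fricative", 6: "approximant"}
PLACE = {1: "bilabial", 2: "labio-dental", 3: "dental"}
PHONATION = {0: "unvoiced", 1: "voiced"}

def decode(key):
    """Split a three-digit hex key into (manner, place, phonation)."""
    manner = (key >> 8) & 0xF     # first hex digit
    place = (key >> 4) & 0xF      # second hex digit
    phonation = key & 0xF         # last hex digit
    # Fall back to the raw digit when the text does not name the value.
    return (MANNER.get(manner, manner),
            PLACE.get(place, place),
            PHONATION.get(phonation, phonation))

print(decode(0x210))  # ('stop', 'bilabial', 'unvoiced')
```

Under these assumptions 0x210 would be the key for "p", the unvoiced bilabial stop described above.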

After defining the consonants, the next step is randomly determining which consonants a language will have. I cannot simply pick at random from a list, for that would result in an unnatural phonetic inventory. An inventory of "t̪tɖqʃðɣɽ͡rç" just isn't going to be likely, because it contains very similar sounds that can be easily confused, similar sounds with a phonation distinction pronounced in different places, and very rare sounds, while lacking common ones. It is very unusual to find a language without a "p" sound, or without any nasals.

So what I did was create different categories of similar sounds that have a similar chance of occurring in a language. All the "t"-like sounds were put together, all the nasals were put together, etc. For example,

CoronalStopList = [(0x230, 50), (0x240, 500), (0x260, 50)]

For each tuple, the first value is the key of the sound, and the second is the weight. The sound most likely to be chosen will be 0x240 (the basic "t" sound as in "top").
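To see what the weights mean as probabilities, each weight can be divided by the total. This is a small sketch; `pick_probabilities` is a hypothetical helper, not part of the generator itself.

```python
from fractions import Fraction

CoronalStopList = [(0x230, 50), (0x240, 500), (0x260, 50)]

def pick_probabilities(choices):
    """Map each key to weight / total weight, as an exact fraction."""
    total = sum(weight for _, weight in choices)
    return {key: Fraction(weight, total) for key, weight in choices}

for key, p in pick_probabilities(CoronalStopList).items():
    print(hex(key), p)
# 0x230 1/12
# 0x240 5/6
# 0x260 1/12
```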

The sounds in a category are chosen like this:

    import bisect
    import random

    def willpick(x):
        # True with probability x percent. randint is inclusive at both
        # ends, so draw from 1..100 rather than 0..100.
        return random.randint(1, 100) <= x

    def chooser(sound_list, probabilities):
        added_here = []
        for i in probabilities:
            if willpick(i):
                # Redraw until we get a sound that has not been picked yet.
                while True:
                    choice = weighted_choice(sound_list)
                    if choice in added_here:
                        continue
                    added_here.append(choice)
                    break
        return added_here

    def weighted_choice(choices):
        # http://stackoverflow.com/questions/3679694/a-weighted-version-of-random-choice
        values, weights = zip(*choices)
        total = 0
        cum_weights = []
        for w in weights:
            total += w
            cum_weights.append(total)
        x = random.random() * total
        i = bisect.bisect(cum_weights, x)
        return values[i]

    CoronalStops = chooser(CoronalStopList, [99, 10, 10])

The chooser function looks at each integer in the list and treats it as the percent chance that another sound will be chosen at that level. So for coronal stops, there is a 99% chance that one will be picked, then a 10% chance of another, then a further 10% chance of another. Each roll is made independently, so there can be at most 3 of them. Which coronal stop gets chosen is decided by the weighted_choice function. There is a 5/6 chance (500 out of a total weight of 600) that "t" will be chosen. But if "t" has already been picked and another one is to be chosen, "t" is out of the running, so it has to decide between 0x230 and 0x260, both of which have a weight of 50 and therefore each have a 50% chance of being chosen at that point.
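Because each entry in the probability list is rolled independently in the code above, the chance of ending up with exactly 0, 1, 2, or 3 sounds from a category can be worked out by multiplying the individual chances. The sketch below does that by enumeration; `count_distribution` is a hypothetical helper, and it assumes willpick(x) succeeds exactly x percent of the time.

```python
from itertools import product

def count_distribution(percentages):
    """Return {k: probability that exactly k of the independent rolls succeed}."""
    dist = {k: 0.0 for k in range(len(percentages) + 1)}
    # Enumerate every hit/miss combination and add up its probability.
    for outcome in product([True, False], repeat=len(percentages)):
        p = 1.0
        for hit, pct in zip(outcome, percentages):
            p *= pct / 100 if hit else 1 - pct / 100
        dist[sum(outcome)] += p
    return dist

for k, p in count_distribution([99, 10, 10]).items():
    print(k, round(p, 4))
# 0 0.0081
# 1 0.8037
# 2 0.1783
# 3 0.0099
```

So with [99, 10, 10], roughly four out of five generated languages would get exactly one coronal stop, and fewer than 1% would get none.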

This method should create a sort of pseudo-realistic way of choosing consonants, albeit with some fidgeting. There are other things to be considered, like the fact that some consonants are paired. It's unlikely for a language to have /pdk/ as its only stops, because two of them are unvoiced and one is voiced.
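One way to act on the pairing observation would be a check run on the chosen stops afterwards. The sketch below is only an illustration of that idea: `is_voicing_balanced` is a hypothetical helper, and stops are written as (place, voiced) pairs rather than the hex keys. It accepts a set of stops if they all share one voicing, or if every place has both the voiced and the unvoiced member. That rule is deliberately strict, and real inventories do tolerate some gaps.

```python
def is_voicing_balanced(stops):
    """stops: iterable of (place, voiced) pairs, e.g. ("bilabial", False)."""
    by_place = {}
    for place, voiced in stops:
        by_place.setdefault(place, set()).add(voiced)
    voicings = set().union(*by_place.values()) if by_place else set()
    if len(voicings) <= 1:
        return True  # all voiced, all unvoiced, or no stops at all
    # Mixed voicing is accepted only when every place has both members.
    return all(v == {True, False} for v in by_place.values())

pdk = [("bilabial", False), ("alveolar", True), ("velar", False)]
ptk = [("bilabial", False), ("alveolar", False), ("velar", False)]
print(is_voicing_balanced(pdk))  # False
print(is_voicing_balanced(ptk))  # True
```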

When I run these functions thirty times, I get the following results:

mnptksɸβθ̱χhɾl

mnbd̪dɖɡʃɸθhr

mnpt̪tkɸβθʝhʀ̥

mnptksʃɸθʝr

mɱnptʈkqsʃɕθħhɾl

mnpbtdʈɖkɡʃθθ̱χhɹ

mɲ̊pbtdkɡʃrl

mŋpbtdkɡsθχhɹl

mnpbtdkɡsʃθθ̱hɾ

ɱnbdɡɕfθhɹ

mnŋptkʃθʝhɹ

mnɳŋpbtdkɡsɸβθχ

mnptʈqsθθ̱xhɾl

mnbdɡɢɕfvθʝɹ

mɲbdɟɕɸβxħrl

m̥mnpʈkqʂθxhɹ

mŋpʈkʃfvθχhrl

mnɲpbt̪d̪kɡfvθχɹ

ɱɲpbtdkɡɕθʝhɾ

m̥mɲptckʃfvθhɾl

mnŋpbtdkɡɕɸβfvθxhr̥

mnptqsʃɸβfvθçʝɹ

mnptckɕθhʀ

mnptcsʃɸβfvθʝxhɹ

mnbdɖɢsʂɸβθθ̱xhɹl

mn̥ptcʂfvçχl

mnpbtdkɡʡsfvθhɾ

ɱnɴbdɡsfvθχhʀl

mnbdɡsfθxɹ

mnpbtdkɡsʃfvθhr̥l

Although there are multiple missing sound groups and many factors I still need to take into consideration, it seems like a good start. I suspect that it over-represents some sorts of sounds, but it can be difficult to judge that, since I have little exposure to non-IE languages.
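A rough way to judge over-representation would be to tally how often each sound turns up across the generated inventories. The sketch below is an assumption about how that could be done, not part of the generator: `split_sounds` is a hypothetical helper that keeps combining diacritics attached to the preceding symbol, and it does not handle tie bars or other multi-symbol sounds. Only the first three inventories are included to keep it short.

```python
import unicodedata
from collections import Counter

def split_sounds(inventory):
    """Split an inventory string into sounds, keeping diacritics attached."""
    sounds = []
    for ch in inventory:
        if unicodedata.combining(ch) and sounds:
            sounds[-1] += ch   # combining mark belongs to the previous symbol
        else:
            sounds.append(ch)
    return sounds

# First three generated inventories from the results above.
inventories = ["mnptksɸβθ̱χhɾl", "mnbd̪dɖɡʃɸθhr", "mnpt̪tkɸβθʝhʀ̥"]

counts = Counter(s for inv in inventories for s in split_sounds(inv))
for sound, n in counts.most_common():
    print(sound, f"{n}/{len(inventories)}")
```

Run over all thirty inventories, the tallies could then be compared by hand against whatever frequency figures one trusts for real languages.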

While tinkering with this, I will skip vowels and go to the next task at hand: consonant clusters, specifically, syllable onsets.

January 02, 2012 — Sean P

January 04, 2012 — Sean P

Only 500 lines of bash. http://mmb.pcb.ub.es/~carlesfe/blog/creating-a-simple-blog-system-with-a-500-line-bash-script.html