And here come random data generators

I just checked in data.random, a collection of random data generators and their combinators. The names of the API functions are not yet fixed, but I think overall it's in good shape. (Since 0.9.4 is overdue, I might release it without making data.random official. I'm not sure yet.)

Here's the code: http://gauche.git.sourceforge.net/git/gitweb.cgi?p=gauche/Gauche;a=blob;f=lib/data/random.scm;hb=HEAD

It provides a bunch of primitive random generators, such as the following.

uniform distribution
  (integer size :optional (start 0)) returns a generator that produces random integers between start and start+size-1, uniformly.
  (integer-between lo hi) returns a generator that produces random integers between lo and hi (both inclusive).
  int8, uint8 etc. are preset generators that produce values in the ranges their names suggest.
  (char :optional cset) returns a generator of random characters from a character set. When cset is omitted, #[A-Za-z0-9] is used as the default character set.
  We also have boolean, real, and real-between. We want to have exact rational generators and complex generators, but I wonder how the range and distribution should be specified.

nonuniform distribution
  For discrete sampling, we have geometric and poisson distributions. For continuous sampling, we have normal and exponential distributions.
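To make the idea of a primitive generator concrete, here is a minimal sketch of how such generators could be built as plain thunks on top of SRFI-27. This is not the data.random source: the names my-integer-between and my-exponential are hypothetical, and parameterizing the exponential distribution by its mean is an assumption.

```scheme
;; Sketch only: NOT the data.random source. All my-* names are hypothetical.
(use srfi-27)            ; random-integer, random-real
(use gauche.generator)   ; generator->list

;; A generator is a thunk; each call yields the next random value.
;; Uniform integers between lo and hi, both inclusive.
(define (my-integer-between lo hi)
  (let1 size (+ (- hi lo) 1)
    (lambda () (+ lo (random-integer size)))))

;; Exponential distribution by inverse transform sampling:
;; if U is uniform on (0,1), then -mean*log(U) is exponentially distributed.
;; Taking the mean as the parameter is an assumption of this sketch.
(define (my-exponential mean)
  (lambda () (* -1 mean (log (random-real)))))

;; Collect ten dice rolls into a list.
(define rolls (generator->list (my-integer-between 1 6) 10))
```

Because each generator is just a thunk, it works with anything in gauche.generator that accepts a generator.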

Then, those generators can be combined to make more complex generators.

random choice
  (one-of generators) returns a generator that randomly picks one of generators to produce the next value.
  (weighted-sample weight&generators) allows you to specify a selection weight for each generator.

aggregate data
  (pair-of gen1 gen2), (tuple-of gen ...)
  list-of, vector-of, string-of - these combinators can be called in two different forms, e.g.
    (list-of sizer item-gen) : sizer can be an integer, or an integer generator, to give the length of the resulting list. item-gen is a generator to produce elements.
    (list-of item-gen) : If sizer is omitted, we use some default generator to determine the length of the resulting list. Currently I use (poisson 4) provisionally.
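The combinators above can be sketched in a few lines. This is an illustration under stated assumptions, not the data.random implementation: my-one-of, my-list-of and sizer->generator are hypothetical names, and the default length is a fixed 4 here rather than the (poisson 4) generator the module currently uses.

```scheme
;; Sketch only: NOT the data.random source. All names here are hypothetical.
(use srfi-27)   ; random-integer

;; Pick one of the given generators uniformly, then ask it for a value.
(define (my-one-of gens)
  (let ([v (list->vector gens)])
    (lambda ()
      ((vector-ref v (random-integer (vector-length v)))))))

;; A sizer is either a fixed integer or a generator of integers.
(define (sizer->generator sizer)
  (if (integer? sizer) (lambda () sizer) sizer))

;; Two call forms: (my-list-of item-gen) and (my-list-of sizer item-gen).
(define my-list-of
  (case-lambda
    ;; Assumption: fixed default length 4 stands in for (poisson 4).
    [(item-gen) (my-list-of 4 item-gen)]
    [(sizer item-gen)
     (let1 size-gen (sizer->generator sizer)
       (lambda ()
         (let loop ([n (size-gen)] [acc '()])
           (if (<= n 0)
             acc
             (loop (- n 1) (cons (item-gen) acc))))))]))

;; Usage: a generator of lists of three dice rolls.
(define die (lambda () (+ 1 (random-integer 6))))
(define three-rolls (my-list-of 3 die))
```

Calling (three-rolls) yields a fresh list of three values each time. Dispatching on the number of arguments is what lets the sizer come first and still be omitted.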

I also have permutation-of and combination-of, which take a list of items (not item generators).

What I like about the current shape is that those generators can be combined using the gauche.generator framework as well; e.g. you can have a series of sums of two dice rolls by:

  (gmap + (integer-between 1 6) (integer-between 1 6))

or apply a filter:

  (gfilter (cut < 0 <> 1) (exponential 1))

or take some values into a list:

  (generator->list (poisson 5) 10)

Here are some aspects of the API I'm still pondering:

We have procedures that create a generator (e.g. integer, real, char) and pre-created generators (e.g. fixnum, int8). Without static typing support, having these two layers could be confusing. Shall we use some naming convention to distinguish them?

There's an idea rolling around in my head to provide plural names as aliases, e.g. chars for char. It plays nicely with the combinators, e.g. (list-of fixnums) or (string-of 5 (chars)). But I also feel this is just a superficial convenience; we would double the number of exported names without adding anything functionally.

The way list-of etc. handle the omitted sizer argument also differs from Gauche's usual convention for optional arguments, where optional arguments follow the required ones.

If you have ideas for data generators to be thrown into this module, let me know. I'm now writing a generative test framework that uses this module as its source of data generators.

Tags: data.random, Generators