When writing tests, programmers often need to provide test files for their code to work with. These are typically committed to the version control system or exposed over the network to be downloaded at runtime. However, the reasons for using a particular test file may differ greatly. These are the main reasons to include test files in your automated testing process:

Configuration

Data transfer

Test data

However, we don't always need data that are logically structured or meaningful. Recently, I have been working on a project that involved the development of an enterprise-grade archiving system. We were faced with the problem of providing a flexible number of test files for our unit and integration tests. These files were to be submitted via a REST API, and the tests validated system-wide processes and their footprint. To accomplish this, we searched all the well-known libraries for a randomly generated stream but found nothing. This got us thinking, and we decided to implement the following stream to generate data for us, with a little twist.

The constructor of our RandomGeneratedInputStream takes either one or three arguments. The one-argument constructor creates fully random (pseudo-random, based on java.util.Random) data for test files. The three-argument constructor, however, lets the programmer select the strategy used to generate the stream content and thereby control its degree of randomness. Thanks to this little tweak, we are able to generate not only random test files but also files that, when compressed into a ZIP archive for example, shrink to a predictable size, which makes them cheap to transfer over the network, and that put an even load on the compression handler. See the following class for details on how the stream works and how to configure it:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

/**
 * Input stream that generates random data with three possible configurations:
 * <ul>
 * <li>Fully random stream created by calling the one-argument constructor.</li>
 * <li>Stream that starts with a fixed uniform block of bytes followed by fully random data.</li>
 * <li>Stream that creates fixed blocks and populates each with a newly generated random byte.</li>
 * </ul>
 *
 * The first option is quite universal and may be used for any type of test. The second option might be
 * useful when testing logic that depends on compressed data size and when archived data are transferred
 * via network. However, the second configuration is not efficient in terms of compression complexity and
 * speed. To remove this bias, the third configuration was introduced to provide evenly distributed data
 * that put a steady load on compression and decompression of generated files.
 *
 * @author Petr Fiala, Jakub Stas
 */
public class RandomGeneratedInputStream extends InputStream {

    private final Random random = new Random();

    /** Type of randomization strategy. */
    private final Type type;

    /** Target size of the stream. */
    private final long size;

    /** Size of the block populated by the same byte. */
    private final long blockSize;

    /** Size of expandable block being populated by the same byte. */
    private long currentBlockSize;

    /** Value of last generated byte. */
    private int lastUsedByte;

    /** Internal counter. */
    private long index;

    /**
     * @param size target size of the stream [byte]
     */
    public RandomGeneratedInputStream(long size) {
        // FIXED with a one-byte block: only the first byte comes from the constructor,
        // every following byte is drawn fresh, so the whole stream is random
        this(size, 1, Type.FIXED);
    }

    /**
     * @param size target size of the stream [byte]
     * @param blockSize size of the block populated by the same byte [byte]
     * @param type randomization strategy
     */
    public RandomGeneratedInputStream(long size, long blockSize, Type type) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("Block size must be at least one byte!");
        }
        this.size = size;
        this.type = type;
        this.blockSize = blockSize;
        this.currentBlockSize = blockSize;
        // nextInt(256) covers the full unsigned byte range 0..255 (nextInt(255) never yields 255)
        this.lastUsedByte = random.nextInt(256);
    }

    @Override
    public int read() throws IOException {
        if (index >= size) {
            return -1; // end of stream
        }
        switch (type) {
        case ITERATIVE:
            // start a new block with a new random byte every blockSize bytes
            if (index == currentBlockSize) {
                lastUsedByte = random.nextInt(256);
                currentBlockSize += blockSize;
            }
            break;
        case FIXED:
            // after the initial uniform block, every byte is random
            if (index >= blockSize) {
                lastUsedByte = random.nextInt(256);
            }
            break;
        default:
            break;
        }
        index++;
        return lastUsedByte;
    }

    /**
     * Type of randomization strategy used to populate the stream.
     */
    public enum Type {
        ITERATIVE, FIXED
    }
}
```
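To make the effect of the block size concrete without depending on the class above, here is a minimal self-contained sketch. The helper names `fixedPattern`, `iterativePattern` and `deflatedSize` are hypothetical: the first two only imitate the byte layout of the FIXED and ITERATIVE strategies in memory, and the third measures the output size of `java.util.zip.Deflater`, which is used here as a stand-in for a full ZIP archiver.

```java
import java.util.Random;
import java.util.zip.Deflater;

/**
 * Hypothetical helper (not part of the original class): builds in-memory buffers that
 * mimic the FIXED and ITERATIVE strategies and measures their DEFLATE-compressed size.
 */
public class StrategyCompressionDemo {

    private static final Random RANDOM = new Random();

    /** FIXED-like data: one uniform block of blockSize bytes, then random bytes. */
    static byte[] fixedPattern(int size, int blockSize) {
        byte[] data = new byte[size];
        byte uniform = (byte) RANDOM.nextInt(256);
        for (int i = 0; i < size; i++) {
            data[i] = i < blockSize ? uniform : (byte) RANDOM.nextInt(256);
        }
        return data;
    }

    /** ITERATIVE-like data: consecutive blocks, each filled with its own random byte. */
    static byte[] iterativePattern(int size, int blockSize) {
        byte[] data = new byte[size];
        byte current = (byte) RANDOM.nextInt(256);
        for (int i = 0; i < size; i++) {
            if (i > 0 && i % blockSize == 0) {
                current = (byte) RANDOM.nextInt(256); // a new block starts here
            }
            data[i] = current;
        }
        return data;
    }

    /** Returns the number of bytes DEFLATE produces for the given buffer. */
    static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        int size = 20_480;
        for (int blockSize : new int[] {size, size / 2, 1_024}) {
            System.out.printf("block %5d: FIXED %5d B, ITERATIVE %5d B%n",
                    blockSize,
                    deflatedSize(fixedPattern(size, blockSize)),
                    deflatedSize(iterativePattern(size, blockSize)));
        }
    }
}
```

In this sketch, shrinking the uniform block makes the FIXED-style buffer grow towards its full size, while the ITERATIVE-style buffer stays small, because each of its blocks is still uniform.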

To back this claim, I wrote two simple unit tests that use RandomGeneratedInputStream to create 20 and 25 test files of 20 480 bytes each. The code then archives these files into ZIPs that can be injected into the relevant entities and used for testing the business logic. The following tables contain the sizes and compression ratios from a single run of my tests. As expected, for the FIXED strategy both archive sizes and compression ratios scale almost linearly: every 1 024 bytes removed from the uniform block add roughly 1 000 bytes to the archive and lower the ratio by about 5 percentage points. Interestingly enough, the second table, containing the results for the ITERATIVE strategy, shows ratios that behave differently from the previous case. Both tests were executed under JDK 1.7.0_21, and compression was done using the default NIO.2 mechanism for creating ZIP archives (to be described in the NIO.2 series soon).
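The tests described above archive the generated files with the NIO.2 ZIP file system provider. The following sketch shows one way to do that with the JDK 7 API: it opens a new archive through `FileSystems.newFileSystem` with the `create` option and copies an `InputStream` into it. The class, method and entry names are hypothetical, and a `ByteArrayInputStream` stands in for RandomGeneratedInputStream so that the example compiles on its own; this is not the authors' actual test code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: stores one generated stream as a single entry of a new ZIP archive. */
public class ZipArchiveSketch {

    /** Copies content into a new archive at zipPath and returns the archive size in bytes. */
    static long archive(Path zipPath, String entryName, InputStream content) throws IOException {
        Map<String, String> env = new HashMap<>();
        env.put("create", "true"); // let the provider create the archive file
        URI uri = URI.create("jar:" + zipPath.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            Files.copy(content, zipFs.getPath("/", entryName));
        } // closing the ZIP file system writes the archive to disk
        return Files.size(zipPath);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20_480];
        new Random().nextBytes(data); // fully random content, like the one-argument constructor
        Path dir = Files.createTempDirectory("zip-sketch");
        long zipSize = archive(dir.resolve("random.zip"), "random.bin", new ByteArrayInputStream(data));
        System.out.println("archive size: " + zipSize + " B");
    }
}
```

In a real test, the `ByteArrayInputStream` would simply be replaced by a `new RandomGeneratedInputStream(size, blockSize, type)`, since `archive` accepts any `InputStream`.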

Results of testing: FIXED strategy

| Stream size (bytes) | Block size (bytes) | Archive size (bytes) | Compression ratio |
|---|---|---|---|
| 20 480 | 20 480 | 183 | 99.11 % |
| 20 480 | 19 456 | 1 297 | 93.67 % |
| 20 480 | 18 432 | 2 318 | 88.68 % |
| 20 480 | 17 408 | 3 345 | 83.67 % |
| 20 480 | 16 384 | 4 363 | 78.70 % |
| 20 480 | 15 360 | 5 385 | 73.71 % |
| 20 480 | 14 336 | 6 399 | 68.75 % |
| 20 480 | 13 312 | 7 423 | 63.75 % |
| 20 480 | 12 288 | 8 442 | 58.78 % |
| 20 480 | 11 264 | 9 461 | 53.80 % |
| 20 480 | 10 240 | 10 476 | 48.85 % |
| 20 480 | 9 216 | 11 494 | 43.88 % |
| 20 480 | 8 192 | 12 517 | 38.88 % |
| 20 480 | 7 168 | 13 535 | 33.91 % |
| 20 480 | 6 144 | 14 554 | 28.94 % |
| 20 480 | 5 120 | 15 570 | 23.97 % |
| 20 480 | 4 096 | 16 592 | 18.98 % |
| 20 480 | 3 072 | 17 613 | 14.00 % |
| 20 480 | 2 048 | 18 631 | 9.03 % |
| 20 480 | 1 024 | 19 647 | 4.07 % |
| 20 480 | 0 | 20 640 | -0.78 % |
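The compression ratio in the table matches the space saved relative to the stream size, that is 1 - archive size / stream size; for the first row, 1 - 183 / 20 480 ≈ 99.11 %. A negative ratio, as in the last row, simply means that the archive is larger than the incompressible input, presumably because of ZIP metadata. A small sketch of the computation follows (the class and method names are illustrative, not from the original tests):

```java
import java.util.Locale;

/** Illustrative sketch of the compression ratio reported in the table. */
public class CompressionRatio {

    /** Returns (1 - archiveSize / streamSize) * 100, i.e. the space saved in percent. */
    static double ratioPercent(long streamSize, long archiveSize) {
        return (1.0 - (double) archiveSize / streamSize) * 100.0;
    }

    public static void main(String[] args) {
        // {stream size, archive size} pairs taken from the FIXED table
        long[][] rows = {{20_480, 183}, {20_480, 10_476}, {20_480, 20_640}};
        for (long[] row : rows) {
            // Locale.ROOT keeps the decimal point regardless of the default locale
            System.out.println(String.format(Locale.ROOT, "%.2f %%", ratioPercent(row[0], row[1])));
        }
        // prints 99.11 %, 48.85 %, -0.78 % on separate lines
    }
}
```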