One of the ways IRI RowGen builds realistic test data is by forming and populating custom field values, such as phone numbers. In this article, we explain how to use the Compound Data Value (CDV) wizard in the RowGen GUI to build a set file containing realistic US phone numbers based on the North American Numbering Plan (NANP).

The CDV wizard is one of several “set file” creation tools available in RowGen. It effectively creates a custom data type, producing values by concatenating fields with user-specified string literals. The field values themselves can be literal, randomly generated in a selected data type (within a range), or randomly selected from a set file or DB column.
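To make the idea of a compound value concrete, here is a minimal Python sketch of the same concept, written outside RowGen. It is only an illustration: the helper names (literal, random_int, pick_from, build_value) and the part-number example are hypothetical and are not part of RowGen or the CDV wizard.

```python
import random

def literal(text):
    """Component that always yields the same string."""
    return lambda rng: text

def random_int(low, high, width):
    """Component that yields a zero-padded random integer in [low, high]."""
    return lambda rng: str(rng.randint(low, high)).zfill(width)

def pick_from(values):
    """Component that yields a random entry from a list (a stand-in for a set file)."""
    return lambda rng: rng.choice(values)

def build_value(components, rng):
    """Concatenate the output of each component, in order."""
    return "".join(component(rng) for component in components)

# Hypothetical compound value: a region code, a literal dash, and a 4-digit number.
rng = random.Random(42)
part_number = [pick_from(["US", "EU", "APAC"]), literal("-"), random_int(1, 9999, 4)]
sample = build_value(part_number, rng)
print(sample)
```

Each component maps to one of the three kinds of field value described above: a literal, a random value within a range, or a random pick from a list of existing values.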

The NANP is an integrated telephone numbering scheme that encompasses 25 countries and territories, primarily in North America, the Caribbean, and US territories. The NANP number format may be summarized in the notation NPA-NXX-xxxx. Thus, we are creating our test numbers from three named components, with literal dashes “-” in between: AREACODE, PREFIX, and STATIONCODE.

AREACODE is pulled from the areacodes_us.set file that ships with RowGen.

PREFIX is a NUMERIC type field with a SIZE of 3 and a PRECISION of 0, drawn from the range “[201,999]”.

STATIONCODE is a random DIGIT type field with a SIZE of 4.

For additional realism, we added named conditions to omit every PREFIX value that ends in “11”, as per the NANP.
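The three components and the “11” rule can be sketched in a few lines of Python. This is an illustration only, not how RowGen works internally: the area codes listed are placeholder values rather than the contents of areacodes_us.set, and the sketch picks from a pre-filtered list of prefixes instead of generating records and omitting them afterward as the RowGen job does.

```python
import random

# Placeholder area codes; the RowGen job reads these from areacodes_us.set.
AREA_CODES = ["212", "305", "415", "617"]

def is_n11(prefix):
    """True for prefixes that end in 11 (211, 311, ..., 911)."""
    return prefix % 100 == 11

# Every prefix in the range [201, 999] that does not end in 11.
ALLOWED_PREFIXES = [p for p in range(201, 1000) if not is_n11(p)]

def make_phone_number(rng):
    """Build AREACODE-PREFIX-STATIONCODE with literal dashes in between."""
    area = rng.choice(AREA_CODES)
    prefix = rng.choice(ALLOWED_PREFIXES)
    station = rng.randint(0, 9999)
    return f"{area}-{prefix:03d}-{station:04d}"

rng = random.Random(7)
numbers = [make_phone_number(rng) for _ in range(5)]

excluded = [p for p in range(201, 1000) if is_n11(p)]
print(excluded)  # [211, 311, 411, 511, 611, 711, 811, 911]
```

Of the 799 values in the range, only these eight are excluded, so the rule removes roughly 1% of candidate prefixes.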

The finished script, shown later in this article, was created by the CDV wizard and then modified to add the /CONDITION and /OMIT statements.

Steps to Create the Set File

Step 1.

Create a new project. From the top menu, select the RowGen wizard and select New Set File. Select Compound Data Values and click Next (screen below).

Step 2.

Next is the Compound Data Setup screen. Enter a file name in the File name field (in this case, create_us_phone_numbers.rcl), and click Next.

Step 3.

Next is the Compound Data Output screen. Specify the name of the set file to generate; in this case, “us_phone_numbers.set”.

Step 4.

Next, in the Compound Data Definition screen, click Add … to build fields into the Component list:

Step 5.

You have two options: Literal string or Generated value. For AREACODE, we chose Generated value with the DIGIT data type, and will select the data from a set file.

Step 6.

We have an existing set file (areacodes_us.set) to use, so click on the Finish button, and the first three digits of the phone number will be created.

Step 7.

Our next step is to create the “-” between the area code and the rest of the number. We do that by selecting the Literal string radio button and entering a “-” in the Value field.

Step 8.

Next, we’ll create the PREFIX section of the phone number by selecting the Generated value radio button and defining the field name, data type, and method, as shown below.

Step 9.

Click Next for the Random Generation Attributes screen and set the size to 3 for the middle three digits of the phone number.

Step 10.

Click Next for the Range Selection screen. Use a minimum value of 201 and a maximum of 999 to produce an “in-line” set in the RowGen script, SET={[201,999]}. Click Finish.

Step 11.

Click the Add button on the final Compound Data Definition screen, where we create the last 4 digits of the phone number. Click Finish.

Step 12.

After adding all the fields, the final screen should look like this:

Clicking Finish creates the job script. When run, it generates 4,000 records (/INCOLLECT=4000) in a 109 KB file. By changing the INCOLLECT value in the script to 10,000, 15,000, 40,000, and 50,000, we created files of 10,000 (136 KB), 15,000 (205 KB), 40,000 (410 KB), and 50,000 (683 KB) phone numbers.

The wizard generates this RowGen Control Language (.rcl) file automatically; the /CONDITION and /OMIT statements were added afterward:

/INFILE=random
/PROCESS=RANDOM
/INCOLLECT=4000
/FIELD=(AREACODE, TYPE=DIGIT, SIZE=3, SET= ANY "C:/IRI/RowGen31/sets/areacodes_us.set" DEFAULT = "", POSITION=1, SEPARATOR="\t")
/FIELD=(PREFIX, TYPE=NUMERIC, SIZE=3, FILL='0', PRECISION=0, SET={[201,999]}, POSITION=2, SEPARATOR="\t")
/FIELD=(STATIONCODE, TYPE=DIGIT, SIZE=4, POSITION=3, SEPARATOR="\t")
/CONDITION=(X211, TEST=(PREFIX EQ 211))
/CONDITION=(X311, TEST=(PREFIX EQ 311))
/CONDITION=(X411, TEST=(PREFIX EQ 411))
/CONDITION=(X511, TEST=(PREFIX EQ 511))
/CONDITION=(X611, TEST=(PREFIX EQ 611))
/CONDITION=(X711, TEST=(PREFIX EQ 711))
/CONDITION=(X811, TEST=(PREFIX EQ 811))
/CONDITION=(X911, TEST=(PREFIX EQ 911))
/CONDITION=(XPREFIX, TEST=(X211 OR X311 OR X411 OR X511 OR X611 OR X711 OR X811 OR X911))
/OMIT WHERE XPREFIX
/REPORT
/OUTFILE=us-phone-numbers.set
/PROCESS=RECORD
/FIELD=(PHONE_NUMBER=format_strings("%s-%s-%s", AREACODE, PREFIX, STATIONCODE), TYPE=ASCII, POSITION=1, SEPARATOR="\t")

Run the Data Generation

Execute the .rcl file (RowGen job script) from the GUI’s Run options or on the command line (rowgen /spec=scriptname.rcl).

When run, the output file contains values like these:

954-642-3959
605-956-7516
609-555-5274
234-804-6362
509-926-1103
919-659-6360
253-213-4263
319-982-0523
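As a quick sanity check on a generated set file, a short script can confirm that every value has the NPA-NXX-xxxx shape and that no prefix ends in “11”. The Python sketch below is one way to do that outside RowGen. The function name is_valid is hypothetical, and the pattern assumes the usual NANP convention that the area code and prefix each start with a digit from 2 to 9.

```python
import re

# Area code and prefix: first digit 2-9, then two digits; station code: four digits.
PHONE_PATTERN = re.compile(r"([2-9]\d{2})-([2-9]\d{2})-(\d{4})")

def is_valid(line):
    """Check the NPA-NXX-xxxx shape and reject prefixes that end in 11."""
    match = PHONE_PATTERN.fullmatch(line.strip())
    if match is None:
        return False
    prefix = match.group(2)
    return not prefix.endswith("11")

# The sample values shown above.
sample = [
    "954-642-3959", "605-956-7516", "609-555-5274", "234-804-6362",
    "509-926-1103", "919-659-6360", "253-213-4263", "319-982-0523",
]

invalid = [value for value in sample if not is_valid(value)]
print(invalid)  # []
```

To check a whole file, the same function can be applied to each line read from the generated set file.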

The following screen captures show how an omit condition can be applied to remove specific area codes from the output data.

Figure 1.

Figure 2.

Figure 3.

Here is the final representation of the .rcl code in the IRI Workbench GUI for RowGen, built on Eclipse™:

Please leave feedback on this article below, or contact rowgen@iri.com if you need help generating custom test data.