This post shows you how to pull selected columns from a CSV file containing IP geolocation data and save them into a second CSV file using our Data Pipeline Java library. As part of the transformation, you’ll also have the option to rearrange the order of the resulting columns.

IP Geolocation CSV file

The code reads an input CSV file created from the free IP geolocation database at http://dev.maxmind.com/geoip/legacy/geolite/.

1.0.0.0,1.0.0.255,16777216,16777471,AU,Australia
1.0.1.0,1.0.3.255,16777472,16778239,CN,China
1.0.4.0,1.0.7.255,16778240,16779263,AU,Australia
1.0.8.0,1.0.15.255,16779264,16781311,CN,China
1.0.16.0,1.0.31.255,16781312,16785407,JP,Japan
1.0.32.0,1.0.63.255,16785408,16793599,CN,China
1.0.64.0,1.0.127.255,16793600,16809983,JP,Japan
1.0.128.0,1.0.255.255,16809984,16842751,TH,Thailand

Column Extraction Code

The Java code below creates a new CSV file containing the last two columns from the input CSV file.

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.csv.CSVWriter;
import com.northconcepts.datapipeline.job.JobTemplate;
import com.northconcepts.datapipeline.transform.IncludeFields;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class SelectivelyTransferCsvFile2 {

    public static void main(String[] args) {
        // Read the input CSV file
        DataReader reader = new CSVReader(new File("example/data/input/GeoliteDataInput.csv"))
                .setFieldNamesInFirstRow(false);

        // Add a transforming reader to only include the necessary fields
        TransformingReader transformingReader = new TransformingReader(reader);
        transformingReader.add(new IncludeFields("E", "F"));

        // Create the output CSV file
        DataWriter writer = new CSVWriter(new File("example/data/output/GeoliteDataOutput.csv"))
                .setFieldNamesInFirstRow(false);

        // Transfer data from the input file to the output file
        JobTemplate.DEFAULT.transfer(transformingReader, writer);
    }

}

CSVReader

A new CSVReader is created to parse the GeoliteDataInput.csv input file. The setFieldNamesInFirstRow(false) call tells CSVReader to generate synthetic names for each column (“A”, “B”, “C”, and so on), similar to Microsoft Excel’s column naming. If true had been passed instead, the actual column names in the file’s first row would have been used.
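As an aside, the synthetic names follow the familiar spreadsheet convention. The dependency-free sketch below illustrates that naming pattern; it is not Data Pipeline’s actual implementation, and the behavior past column “Z” (where Excel moves on to “AA”, “AB”, …) is assumed here rather than confirmed by the library’s documentation.

```java
public class ColumnNames {

    // Excel-style column name for a zero-based index: 0 -> "A", 25 -> "Z", 26 -> "AA".
    public static String columnName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1; // work in 1-based "bijective base 26"
        while (n > 0) {
            n--;
            sb.insert(0, (char) ('A' + n % 26));
            n /= 26;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two columns kept in this example are the fifth and sixth ones
        System.out.println(columnName(4) + ", " + columnName(5)); // E, F
    }
}
```

This is why the transformation below can refer to the country code and country name columns as “E” and “F” even though the input file has no header row.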



DataReader reader = new CSVReader(new File("example/data/input/GeoliteDataInput.csv"))
    .setFieldNamesInFirstRow(false);

Select Fields Data Transformation

A TransformingReader is wrapped around the initial CSVReader to allow the data to be manipulated on-the-fly. The actual modifications are specified by adding Transformer subclasses (IncludeFields in this case) to the TransformingReader.

TransformingReader transformingReader = new TransformingReader(reader);
transformingReader.add(new IncludeFields("E", "F"));

The IncludeFields class specifies which columns from the input CSV file need to be included in the output CSV file. All other columns are discarded from this point forward. IncludeFields also arranges the columns in the specified order. Passing in “F”, then “E” would have swapped the country name and country code columns.
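To make the include-and-reorder behavior concrete, here is a dependency-free sketch of the same idea in plain Java. It is an illustration, not the library’s implementation: it picks fields by their single-letter synthetic name, in the order requested.

```java
import java.util.ArrayList;
import java.util.List;

public class IncludeFieldsSketch {

    // Keep only the named columns, in the order they are requested.
    // Single-letter synthetic names "A".."F" map to indexes 0..5.
    public static String include(String csvLine, String... names) {
        String[] fields = csvLine.split(",");
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            int index = name.charAt(0) - 'A'; // synthetic name -> column index
            kept.add(fields[index]);
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        String record = "1.0.0.0,1.0.0.255,16777216,16777471,AU,Australia";
        System.out.println(include(record, "E", "F")); // AU,Australia
        System.out.println(include(record, "F", "E")); // Australia,AU -- order swapped
    }
}
```

Note that this toy version splits on bare commas and so ignores quoting and embedded commas; CSVReader handles those details for you.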

ExcludeFields

An alternative to selecting the fields to retain with IncludeFields is to specify the fields to drop using ExcludeFields. Unlike IncludeFields, this transformer cannot rearrange columns. Deciding which to use usually comes down to which is easier (whitelisting or blacklisting) and whether or not the columns need to be rearranged.
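The difference is easy to see in a plain-Java sketch of exclusion (again an illustration, not the library’s code): the surviving columns always keep their original relative order, so there is no way to swap “E” and “F” this way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExcludeFieldsSketch {

    // Drop the named columns; the remaining columns keep their original order.
    public static String exclude(String csvLine, String... names) {
        Set<Integer> dropped = new HashSet<>();
        for (String name : names) {
            dropped.add(name.charAt(0) - 'A'); // synthetic name -> column index
        }
        String[] fields = csvLine.split(",");
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            if (!dropped.contains(i)) {
                kept.add(fields[i]);
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        String record = "1.0.0.0,1.0.0.255,16777216,16777471,AU,Australia";
        // Dropping the first four columns leaves the same output as IncludeFields("E", "F")
        System.out.println(exclude(record, "A", "B", "C", "D")); // AU,Australia
    }
}
```

For this particular example, excluding “A” through “D” produces the same output file as including “E” and “F”, because the kept columns were already in the desired order.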

CSVWriter

A CSVWriter is created to generate the new GeoliteDataOutput.csv output file. The setFieldNamesInFirstRow(false) method is called in this case to prevent CSVWriter from writing out the synthetic field names added during the read step.

DataWriter writer = new CSVWriter(new File("example/data/output/GeoliteDataOutput.csv"))
    .setFieldNamesInFirstRow(false);

Running the Transfer Job

The data is transferred from the reader to the writer using the default implementation of JobTemplate.

JobTemplate.DEFAULT.transfer(transformingReader, writer);

Output CSV file

AU,Australia
CN,China
AU,Australia
CN,China
JP,Japan
CN,China
JP,Japan
TH,Thailand

Download Data Pipeline

You can get started with Data Pipeline by downloading it and reviewing the getting started page.

You can also view additional examples by visiting https://northconcepts.com/data-pipeline/examples/.