
Data can be read from and written to the Hadoop Distributed File System (HDFS) in a number of ways. In this post we will see how to do it using the FileSystem API: first creating and writing a file to HDFS, then reading a file from HDFS and writing it back to the local file system.

To start with:

1) First, include the (sbt) dependencies (for an sbt project):

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-common" % "2.8.0",
  "org.apache.hadoop" % "hadoop-hdfs"   % "2.8.0"
)

2) The next step is to configure the filesystem:

/**
 * This method configures the file system.
 * @param coreSitePath path to core-site.xml in the Hadoop installation
 * @param hdfsSitePath path to hdfs-site.xml in the Hadoop installation
 * @return a Hadoop FileSystem instance, or null if configuration fails
 */
public FileSystem configureFilesystem(String coreSitePath, String hdfsSitePath) {
    FileSystem fileSystem = null;
    try {
        Configuration conf = new Configuration();
        Path hdfsCoreSitePath = new Path(coreSitePath);
        Path hdfsHDFSSitePath = new Path(hdfsSitePath);
        conf.addResource(hdfsCoreSitePath);
        conf.addResource(hdfsHDFSSitePath);
        fileSystem = FileSystem.get(conf);
        return fileSystem;
    } catch (Exception ex) {
        System.out.println("Error occurred while configuring FileSystem");
        ex.printStackTrace();
        return fileSystem;
    }
}

3) After configuring the filesystem, we are ready to read from or write to HDFS:

Let us start by writing something to HDFS from the local filesystem. To perform this operation we will use the

“void copyFromLocalFile( Path src, Path dst )”

method of the FileSystem API.

/**
 * @param fileSystem a Hadoop FileSystem instance
 * @param sourcePath the sample input file on the local system to be written to HDFS
 * @param destinationPath the path on HDFS where the sample input file will be written
 * @return Constants.SUCCESS if the copy succeeds, Constants.FAILURE otherwise
 */
public String writeToHDFS(FileSystem fileSystem, String sourcePath, String destinationPath) {
    try {
        Path inputPath = new Path(sourcePath);
        Path outputPath = new Path(destinationPath);
        fileSystem.copyFromLocalFile(inputPath, outputPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while writing file to HDFS");
        return Constants.FAILURE;
    }
}
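The methods here return `Constants.SUCCESS` or `Constants.FAILURE`, but that class is never shown in the post. A minimal sketch of what it might look like (the actual string values are assumptions) is:

```java
// Hypothetical Constants helper -- the post references Constants.SUCCESS and
// Constants.FAILURE without showing their definition, so the values below
// are assumed placeholders.
public class Constants {
    public static final String SUCCESS = "SUCCESS";
    public static final String FAILURE = "FAILURE";

    private Constants() {
        // utility class; not meant to be instantiated
    }
}
```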

Next, we can read from HDFS and store the file on our local file system. To perform this operation we can use the

“void copyToLocalFile( Path src, Path dst )”

method of the FileSystem API.

/**
 * @param fileSystem a Hadoop FileSystem instance
 * @param hdfsStorePath the path on HDFS where the sample input file is present
 * @param localSystemPath the location on the local system to which the file read from HDFS will be written
 * @return Constants.SUCCESS if the copy succeeds, Constants.FAILURE otherwise
 */
public String readFileFromHdfs(FileSystem fileSystem, String hdfsStorePath, String localSystemPath) {
    try {
        Path hdfsPath = new Path(hdfsStorePath);
        Path localPath = new Path(localSystemPath);
        fileSystem.copyToLocalFile(hdfsPath, localPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while reading file from HDFS");
        return Constants.FAILURE;
    }
}

4) The final step is to close the filesystem once we are done reading from or writing to HDFS:

/**
 * This closes the FileSystem instance.
 * @param fileSystem the Hadoop FileSystem instance to close
 */
public void closeFileSystem(FileSystem fileSystem) {
    try {
        fileSystem.close();
    } catch (Exception ex) {
        System.out.println("Unable to close Hadoop filesystem : " + ex);
    }
}
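Putting the four steps together, a small driver might look like the sketch below. It assumes the methods above live in a class called HdfsFileOperations (the post never names the class), and all file paths shown are placeholders to be replaced with your own Hadoop configuration and sample file locations:

```java
import org.apache.hadoop.fs.FileSystem;

public class HdfsExample {
    public static void main(String[] args) {
        // Assumes the methods shown above live in a class named
        // HdfsFileOperations -- the class name is a placeholder.
        HdfsFileOperations ops = new HdfsFileOperations();

        // Placeholder paths: point these at your own core-site.xml and hdfs-site.xml.
        FileSystem fileSystem = ops.configureFilesystem(
                "/usr/local/hadoop/etc/hadoop/core-site.xml",
                "/usr/local/hadoop/etc/hadoop/hdfs-site.xml");

        // Copy a local sample file into HDFS, then read it back out.
        ops.writeToHDFS(fileSystem, "/tmp/sample.txt", "/user/hduser/sample.txt");
        ops.readFileFromHdfs(fileSystem, "/user/hduser/sample.txt", "/tmp/sample-copy.txt");

        // Always release the FileSystem handle when finished.
        ops.closeFileSystem(fileSystem);
    }
}
```

Note that this sketch needs a running Hadoop installation (and the dependencies from step 1) to actually execute.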

References:

1) https://hadoop.apache.org/docs/r2.7.1/api/index.html?org/apache/hadoop/fs/FileSystem.html