Sqoop has become a popular tool among Big data developers used to fetch relational data from the RDBMS. Since the time when Hive, HBase, Cassandra, Pig, and MapReduce came into existence, developers felt the need of having a tool that can interact with RDBMS server to import and export the data.

Sqoop means “SQL to Hadoop and Hadoop to SQL”. The tool is designed to transfer data between relational database servers and Hadoop. This post is written to provide you an introduction to Sqoop and the way in which it imports and export data.

To import data from relational database servers and Hadoop Sqoop tool is used. To import data from relational databases like Oracle, MySQL, HDFS or Hadoop file system to relational databases developers can use Sqoop tool.

Sqoop- At a Glance

Usually, applications interact with databases by the RDBMS and so RDBMS generate a huge amount of data and is considered as a great data source. In Hadoop Ecosystem, Sqoop plays an important role. As told above that the tool is used to transfer data from Hadoop to RDBMS and vice versa. Even bulk data can also be transferred easily between Hadoop and external data sources like relational databases, data warehouses, etc.

The user can also import data from external data sources like HBase or Hive. Sqoop import and export operations that executed through commands and described in the following sections of this blog post.

Working of Sqoop Import and Export Command

The import tool is used by Sqoop to import data from RDBMS to HDFS and in Sqoop each row is treated as a record. Each task that is assigned to Sqoop is divided into subtasks that are handled by the individual Map Task internally. Map task is just a subtask that imports data to the Hadoop Ecosystem and here all Map tasks import all the data. Sqoop exports command also work in a similar manner. Sqoop export tool exports a set of files from HDFS to the RDBMS, the input files of Sqoop contains records that are also called the rows of a table.

Sqoop Cheat Sheet Command

In Sqoop, there is a list of commands available for each and every task or subtask. Here, in the cheat sheet, we are going to discuss the commonly used cheat sheet commands in Sqoop. The commands are used for the following purposes:

Commands to Transfer Entire Tables

To transfer entire table from source to destination, we can use the following command:

Sqoop import\--connect jdbc:mysql://mysql.example.com/sqoop\ username sqoop\password sqoop\table regions

Commands to specify a Target Directory

To specify custom output directories basically you can use two parameters:

Directories: -target-dir and –warehouse-dir. -target-dir option imports the real data to the specified directory. Above listed two commands can be used in the following ways: Sqoop import \ connect JDBC:Mysql://mysql.example.com/sqoop\usernae sqoop\password sqoop\ table regions\target-dir/etc/input/cities -warehouse-dir option creates a table name directory and data is imported in that directory, like shown below: Sqoop import\-connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\warehouse-dir/etl/input/

Commands to Import Data Subset

We can specify more than one condition but without using any aggregated query we can perform the operation:

Sqoop import\connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\ where “country=’USA’ ”

Command to Protect the Password:

Following command can read the password from the standard input, even we can also read a command from a file. The commands for these tasks are:

Sqoop import\connect JDBC: Mysql://mysql.example.com/sqoop\username sqoop\table regions\ -P To read a password from a file: Sqoop import\connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\ password-file my-sqoop-password

Command to use another file format except for CSV

If you want to import a file from any file other than a CSV file then the following command should be used by you:

Sqoop import\connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\as-sequence file

Commands to Import Compressed Data

To import data into the compressed format we can use following commands. Apart from this, the compression format of data can also be changed for this purpose another set of command is used that is also listed below:

Sqoop import\connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\compress To change compression format: Sqoop import –cmpress\--compress-codec org.apache.hadoop.io.compress.BZip2Codec

To Import All Tables

If you have to import more than one table to your database then the following command can be used:

Sqoop import-all-tables\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\password sqoop\table cities\ OR Sqoop import-all-tables\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\table cities\ --exclude-tables cities,countries

To Transfer Data from Hadoop

In order to transfer data from the database to Hadoop you may have to use more than one set of commands that are listed below:

Sqoop export\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --export-dir regions Sqoop export\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --update-key id Sqoop export\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --update-key id\--update-mode alloy insert

Command to Import data directly to Hive

If you want to import data directly to Hive tool then use following import command

Sqoop import\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --hive-import

Command to Import Data to HBase

To import data to HBase rather than Hive you can use the following set of commands:

Sqoop import\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --HBase-table regions\--column-family world

In this way, we can use the number of commands with Sqoop and the list is not limited instead it is quite long. Some more Sqoop commands are listed below:

Sqoop-Incremental Import Command

There is an option in Sqoop to use import command in an incremental manner the imported rows are newer than previously imported rows. The two types of incremental support that are available in Sqoop are:

Last modified Append

If you want to add the newly imported rows in continuation with the previous ones and the row-id is increased in an incremental manner. In this command, you can also specify the row id with –check-column. Only those values will be imported that have the values greater than the specified value.

An alternate way to import data that is specified by Sqoop is last modified mode. This may be used when the rows of source table can be updated. This command will set the value of last modified column to the current timestamp.

Increment import command is given below:

Sqoop import –connect jdbc:mysql://localhost/regions –username sqoop – table regions –target-dir/Latest Regions –incremental append –check-column region id –last-value 544444

Sqoop Command to Enlist Databases

You can check and enlist the databases that are present in Sqoop. For this Sqoop list-databases tool uses SHOW DATABASES query to parse and execute the command against database server. The command is listed below:

Sqoop list-databases\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop

Sqoop Command to Enlist Tables

You can use following query to check the tables that are stored in the database server. For this purpose, Sqoop uses a specific command that is SHOW TABLES. Following command is used to enlist the tables:

Sqoop list-tables\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop

Sqoop Command to Import a Single Table from Sqoop

If you want to import just a single table from the database then you can use the below-listed command for that:

Sqoop list-databases\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop/--password sqoop/table regions –target –dir/user/sqoop/regions

Here, in this command, you will have to specify: -table: Name of the source table -target-dir: location where you will copy the data of table

Sqoop Command to Encode NULL Values

If you want to encode the null value than you can use the following command:

Sqoop import\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop\ -password sqoop\--table cities\--null-string’\\N’ \ --null-non-string’\\N’

Sqoop Command to Import Mutable Data in Incremental Manner

Sqoop import\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\table regions\incremental append\--check-column id\--last-value20 Sqoop import\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\table regions\incremental lastmodified\--check-column last_update_date\--last-value “2018-06-11 00:01:01”

Sqoop is being used for data transfer between data source and destination and it offers many advantages to the user. A number of features that are present in Sqoop make it popular. Above listed commands are not limited instead there are a number of commands that can provide many operations that are necessary for data transfer. Due to above-listed commands following listed features have become important and useful among developers. Big data users use Sqoop due to these features that are beneficial for them:

Full Load : You can either load full table or any specific column into databases and for this, you will have to use only a command.

: You can either load full table or any specific column into databases and for this, you will have to use only a command. Incremental Load : If you don’t want to load entire table all at once than can do it in an incremental manner, again for this you will have to use just a single command.

: If you don’t want to load entire table all at once than can do it in an incremental manner, again for this you will have to use just a single command. Parallel import/export: YARN framework is used by Sqoop to import, export and fault tolerance are also available due to this.

Final Words:

We can say that if we want to execute the data transfer operations smoothly then we should use Sqoop. It is quite popular among Big Data developers just because it can provide a number of beneficial features and just through a single command, you can perform many tasks or sub-tasks. For Hive or HBase you can also do the same operations conveniently. For hands-on expertise on all Sqoop cheat sheet commands, you should join Hadoop certification program at JanBask Training right away.



