Apache Sqoop Driver
Apache Sqoop uses JDBC drivers to connect to databases and perform the required operations. It also provides specialized code paths for databases whose behavior is not fully SQL compliant. Sqoop is compatible with a large number of databases, but the JDBC driver for each database must be downloaded and installed separately.
Through the common JDBC API, Sqoop can connect to many different databases. The JDBC drivers for the different databases should be installed on the client system under the $SQOOP_HOME/lib path; Sqoop loads the JARs found on this path to run its MapReduce jobs. Sqoop inspects the protocol scheme of the connect string to select vendor-specific logic. If Sqoop recognizes the database, it works out of the box; otherwise, we may need to specify the driver class to load via the --driver parameter. Sqoop also provides the --direct parameter, which bypasses JDBC in favor of the database's native bulk tools.
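For example, when Sqoop does not recognize the vendor from the connect string, the driver class can be named explicitly with --driver. A hedged sketch, not a definitive invocation: the host, database, table, user, and driver class names below are placeholders.

```shell
# First, copy the vendor's JDBC driver JAR into Sqoop's classpath:
#   cp vendor-jdbc-driver.jar $SQOOP_HOME/lib/
# Then name the driver class explicitly, since the scheme is unrecognized.
sqoop import \
  --connect jdbc:vendordb://db.example.com:1234/sales \
  --driver com.example.jdbc.VendorDriver \
  --username sqoop_user -P \
  --table orders
```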
Apache Sqoop includes vendor-specific support for the databases below.
| Database | Version | Support (--direct)? | Connect String |
| --- | --- | --- | --- |
| HSQLDB | 1.8.0+ | No | jdbc:hsqldb:*// |
| MySQL | 5.0+ | Yes | jdbc:mysql:// |
| Oracle | 10.2.0+ | No | jdbc:oracle:*// |
| PostgreSQL | 8.3+ | Yes (import only) | jdbc:postgresql:// |
| CUBRID | 9.2+ | No | jdbc:cubrid:* |
Apache Sqoop Connectors
Apache Sqoop supports a connector-based architecture through which external systems can connect to Sqoop. Using connectors, Sqoop can reach data sources that have no JDBC support. Connectors and plugins can be added during the installation of Apache Sqoop; to install one, copy its JAR under the $SQOOP_HOME/lib path of Sqoop.
Let us see how connectors partition the data, extract it, load it, and clean up afterwards.
1. Partitioner
In this phase, the partitioner generates the range conditions that the extractor will use. The size of each partition is computed as:

(upper boundary - lower boundary) / (max partitions)

If the user does not specify a partition column, the table's primary key is used to partition the data.
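As a rough illustration of the arithmetic above (not Sqoop's actual code), splitting an integer key range [0, 1000) over a hypothetical column `id` into 4 partitions yields one range condition per partition:

```shell
# Sketch of range partitioning: slice [lo, hi) into equally sized ranges.
# Column name and bounds are illustrative.
lo=0; hi=1000; parts=4
step=$(( (hi - lo) / parts ))   # (upper boundary - lower boundary) / (max partitions)
for i in 0 1 2 3; do
  start=$(( lo + i * step ))
  end=$(( start + step ))
  echo "id >= $start AND id < $end"
done
```

Each emitted condition becomes the WHERE clause for one extractor task, so the four tasks together cover the full key range exactly once.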
The supported data types for the partition column are listed below.
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- REAL
- FLOAT
- DOUBLE
- NUMERIC
- DECIMAL
- BIT
- BOOLEAN
- DATE
- TIME
- TIMESTAMP
- CHAR
- VARCHAR
- LONGVARCHAR
2. Extractor
In this phase, the JDBC data source is queried with SQL generated from your configuration.

- If the user provides only a table name, the generated SQL is of the form "SELECT * FROM tab_name".
- If the user provides a table name and column names, the generated SQL is of the form "SELECT col_name FROM tab_name".
- If the user provides a SQL query, that query is executed as-is.
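The first two cases above can be sketched as simple string templates; the table and column names here are hypothetical, not from Sqoop itself:

```shell
# Sketch of extractor query generation. In the third case (a user-supplied
# SQL query), the query string is passed through unchanged.
table="employees"
columns="emp_id,emp_name"

# Case 1: only a table name is given.
q1="SELECT * FROM $table"
# Case 2: a table name and column names are given.
q2="SELECT $columns FROM $table"

echo "$q1"
echo "$q2"
```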
3. Loader
In this phase, data is written to the JDBC data source using SQL INSERT statements generated from your configuration.

- If the user has provided only a table name, the SQL is generated as "INSERT INTO tab_name (col_name1, col_name2, ...) VALUES (?, ?, ...)", covering all columns of the table.
- If the user has provided a table name and column names, only those columns appear in the generated "INSERT INTO tab_name (columns) VALUES (?, ?, ...)".
- If the user has provided a SQL query, that query is used directly.
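A minimal sketch of building such a parameterized INSERT, with one "?" bind placeholder per column (table and column names are illustrative, not Sqoop's code):

```shell
# Generate "INSERT INTO <table> (<columns>) VALUES (?, ..., ?)"
# with as many "?" placeholders as there are columns.
table="employees"
columns="emp_id,emp_name,salary"
ncols=$(echo "$columns" | awk -F',' '{print NF}')     # count the columns
placeholders=$(printf '?,%.0s' $(seq 1 "$ncols"))     # "?,?,?,"
placeholders=${placeholders%,}                        # drop trailing comma
sql="INSERT INTO $table ($columns) VALUES ($placeholders)"
echo "$sql"
```

Using bind placeholders rather than inlined values lets the statement be prepared once and executed for every record the loader receives.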
4. Destroyer
In this phase, the following two operations are performed:

- Copy the data from the staging table to the target table.
- Once the copy has completed, empty the staging table.
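Sqoop's export tool exposes this staging behavior through --staging-table, which names the intermediate table, and --clear-staging-table, which removes any leftover data from it before the job runs. A hedged example; the connect string, tables, user, and HDFS path are placeholders:

```shell
# Export via a staging table: rows land in the staging table first and are
# moved to the target table only when the whole job succeeds.
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username sqoop_user -P \
  --table orders \
  --staging-table orders_staging \
  --clear-staging-table \
  --export-dir /user/hive/warehouse/orders
```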