Apache Hive JOINs are used to combine columns from one (self-join) or more tables by using values common to each. Using join we can fetch corresponding records from two or more tables. It is almost similar to SQL joins.
Apache Hive provides four types of joins which are mentioned below.
The following graph is the representation of Apache Hive Joins using table A and table B.
Hive Join Syntax
table_reference [INNER] JOIN table_factor [join_condition]
| table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
| table_reference LEFT SEMI JOIN table_reference join_condition
| table_reference CROSS JOIN table_reference [join_condition] (as of Hive 0.10)
table_reference:
table_factor
| join_table
table_factor:
tbl_name [alias]
| table_subquery alias
| ( table_references )
join_condition:
ON expression
Hive Join Example
For this example, we have created two tables, the first table name is “cloudduggudb.employees_data” and the second table name is “cloudduggudb.department_data” and their create statements and insert statements are mentioned below.
Create Table Statement:
Create a statement for table “cloudduggudb.employees_data”.
Insert Table Statement:
Insert statement for table “cloudduggudb.employees_data”.
Create Table Statement:
Create a statement for table “cloudduggudb.department_data”.
Insert Table Statement:
Insert statement for table “cloudduggudb.department_data”.
Let us see each Join condition in detail.
1. Inner Join
In an inner job, there should a common column present in two tables on which we can join. In this type of join rows from the first tables are checked with the rows of the second table to fetch the joined based output.
Inner Join Statement:
SELECT e.emp_id, e.emp_name, e.salary, d.dep_name
FROM cloudduggudb.employees_data e JOIN cloudduggudb.department_data d
ON (e.dep_id = d.dep_id);
Command Output:
2. Left Outer Join
FROM cloudduggudb.employees_data e JOIN cloudduggudb.department_data d
ON (e.dep_id = d.dep_id);
2. Left Outer Join
In the Left outer join, the values will be returned from the left table that is not matching with the right table that will include NULL values as well.
Left Outer Join Statement:
SELECT e.emp_id, e.emp_name, e.salary, d.dep_name
FROM cloudduggudb.employees_data e LEFT OUTER JOIN cloudduggudb.department_data d
ON (e.dep_id = d.dep_id);
Command Output:
3. Right Outer Join
FROM cloudduggudb.employees_data e LEFT OUTER JOIN cloudduggudb.department_data d
ON (e.dep_id = d.dep_id);
3. Right Outer Join
In the Right outer join, the values will be returned from the right table that is not matching with the left table which will include NULL values as well.
Right Outer Join Statement:
SELECT e.emp_id, e.emp_name, e.salary, d.dep_name
FROM cloudduggudb.employees_data e RIGHT OUTER JOIN cloudduggudb.department_data d
ON (e.dep_id = d.dep_id);
Command Output:
4. Full Outer Join
FROM cloudduggudb.employees_data e RIGHT OUTER JOIN cloudduggudb.department_data d
ON (e.dep_id = d.dep_id);
4. Full Outer Join
In Full outer join, the matching rows will be returned and the rows which are not matching from both tables will also be returned as the null value.
Full Outer Join Statement:
SELECT e.emp_id, e.emp_name, e.salary, d.dep_name
FROM cloudduggudb.employees_data e FULL OUTER JOIN cloudduggudb.department_data d
ON (e.dep_id = d.dep_id);
Command Output:
FROM cloudduggudb.employees_data e FULL OUTER JOIN cloudduggudb.department_data d
ON (e.dep_id = d.dep_id);