A data type is an attribute that defines the kind of value a column in a Hive table can store. Apache Hive groups its data types into numeric, date/time, string, miscellaneous, and complex types.
Let us look at each data type in detail.
1. Numeric Types
Apache Hive provides the below set of Numeric data types.
DataType | Description
---|---
TINYINT | 1-byte signed integer, ranging from -128 to 127.
SMALLINT | 2-byte signed integer, ranging from -32,768 to 32,767.
INT/INTEGER | 4-byte signed integer, ranging from -2,147,483,648 to 2,147,483,647.
BIGINT | 8-byte signed integer, ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
FLOAT | 4-byte single-precision floating-point number.
DOUBLE | 8-byte double-precision floating-point number.
DOUBLE PRECISION | Alias for DOUBLE, only available starting with Hive 2.2.0.
DECIMAL | Introduced in Hive 0.11.0 with a precision of 38 digits.
NUMERIC | Same as DECIMAL, starting with Hive 3.0.0.
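The integer ranges in the table follow directly from the byte widths: an n-byte signed integer in two's complement spans -2^(8n-1) to 2^(8n-1)-1. The sketch below is plain Python, not Hive code, and the helper names `signed_range`, `fits`, and `smallest_type_for` are hypothetical names chosen for illustration.

```python
# Byte widths of Hive's signed integer types, taken from the table above.
INTEGER_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a two's-complement signed integer of num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(type_name, value):
    """Return True if value lies inside the range of the named integer type."""
    low, high = signed_range(INTEGER_WIDTHS[type_name])
    return low <= value <= high


def smallest_type_for(value):
    """Return the narrowest integer type that can hold value, or None if none can."""
    # Dicts keep insertion order, so types are tried from narrowest to widest.
    for type_name in INTEGER_WIDTHS:
        if fits(type_name, value):
            return type_name
    return None
```

For instance, `smallest_type_for(128)` falls just outside TINYINT and therefore selects SMALLINT, which is one way to sanity-check a column type against the values it is expected to hold.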
2. Date/Time Types
Apache Hive provides the below set of Date/Time data types.
2.1 Timestamps
Timestamps support traditional UNIX timestamps with optional nanosecond precision. In text files, timestamps use the format yyyy-mm-dd hh:mm:ss[.f...]. The type was introduced in Hive 0.8.0.
Supported conversions for timestamps:
Integer numeric types: interpreted as a UNIX timestamp in seconds.
Floating-point numeric types: interpreted as a UNIX timestamp in seconds with decimal precision.
Strings: must follow the JDBC-compliant java.sql.Timestamp format "YYYY-MM-DD HH:MM:SS.fffffffff" (9 decimal places of precision).
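The three conversion rules above can be sketched in plain Python (not Hive code). The function names are hypothetical, and using UTC as the reference time zone is an assumption made only to keep the example deterministic. Python's `datetime` stores microseconds at most, so the sketch returns the nanosecond part of a string separately.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the UNIX epoch is taken in UTC for this illustration.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix_seconds(seconds):
    """Interpret an integer or floating-point number as seconds since the UNIX epoch."""
    return EPOCH + timedelta(seconds=seconds)


def parse_timestamp_string(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.f...]' into (datetime, nanoseconds).

    The fractional part may have up to 9 digits and is returned as an
    integer count of nanoseconds, because datetime cannot hold it exactly.
    """
    main, _, fraction = text.partition(".")
    moment = datetime.strptime(main, "%Y-%m-%d %H:%M:%S")
    if len(fraction) > 9:
        raise ValueError("at most 9 fractional digits are supported")
    # Right-pad with zeros so that '.5' means 500,000,000 nanoseconds.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return moment, nanos
```

An integer such as 86400 lands exactly one day after the epoch, a floating-point value such as 1.5 carries its fraction as sub-second precision, and a string keeps all nine fractional digits.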
2.2 Dates
The DATE data type represents a calendar date in the form yyyy-mm-dd (year, month, day). It has no time-of-day component. Supported values range from 0000-01-01 to 9999-12-31.
2.3 Interval
The INTERVAL data type represents a span of time expressed in units such as SECOND, MINUTE, DAY, MONTH, or YEAR. It was introduced in Hive 1.2.0.
3. String Types
Apache Hive provides the below list of String data types.
3.1 Strings
String literals in Apache Hive are enclosed in either single quotes or double quotes.
3.2 Varchar
The VARCHAR data type is declared with a length between 1 and 65535, which sets the maximum number of characters the string may hold.
3.3 Char
The CHAR data type is similar to VARCHAR but has a fixed length: a value shorter than the declared length is padded with spaces. The maximum length supported by CHAR is 255.
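The difference between the two length-limited types can be modeled in a few lines of plain Python (not Hive code). The helpers `as_varchar` and `as_char` are hypothetical names. Truncating values that exceed the declared length is an assumption of this sketch, since the text above only describes padding.

```python
def as_varchar(value, max_length):
    """Model VARCHAR(max_length): keep at most max_length characters, no padding."""
    if not 1 <= max_length <= 65535:
        raise ValueError("VARCHAR length must be between 1 and 65535")
    # Assumption: over-long values are cut down to the declared length.
    return value[:max_length]


def as_char(value, length):
    """Model CHAR(length): always return exactly length characters."""
    if not 1 <= length <= 255:
        raise ValueError("CHAR length must be between 1 and 255")
    # Shorter values are right-padded with spaces up to the fixed length.
    return value[:length].ljust(length)
```

With these helpers, `as_char("ab", 5)` yields a five-character string ending in three spaces, while `as_varchar("ab", 5)` leaves the two-character value unchanged.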
4. Miscellaneous Types
Apache Hive provides the below list of Miscellaneous data types.
4.1 Boolean
The boolean data type is either True or False. It is similar to Java's Boolean.
4.2 Binary
The binary data type is an array of Bytes. It is similar to VARBINARY in many RDBMS.
5. Complex Types
5.1 Array
An array is an indexed collection of values that all share the same data type.
For example, given the array ARRAY('Cloudduggu', 'Hive'), the first element, 'Cloudduggu', is accessed with array[0].
5.2 Map
A map is a collection of key-value pairs whose values are accessed using array notation on the key (e.g. ['key']).
For example, the pairs 'Firstname' -> 'Sarvesh' and 'Lastname' -> 'Kumar' are written as map('Firstname', 'Sarvesh', 'Lastname', 'Kumar'), and the value 'Sarvesh' is accessed with map['Firstname'].
5.3 Struct
A struct is a record type that encapsulates a set of named fields which can be any primitive data type.
For example, if column z is defined as STRUCT {x INTEGER; y INTEGER}, the value of field x is accessed as z.x.
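As a loose analogy, the three complex types behave much like a Python list, dict, and named tuple. The sketch below is plain Python, not Hive code. It reuses the example values from the array and map sections, while the struct values 10 and 20 are made up for illustration.

```python
from collections import namedtuple

# ARRAY('Cloudduggu', 'Hive'): values of one type, indexed from 0.
skills = ["Cloudduggu", "Hive"]
first_skill = skills[0]

# map('Firstname', 'Sarvesh', 'Lastname', 'Kumar'): look up a value by its key.
person = {"Firstname": "Sarvesh", "Lastname": "Kumar"}
first_name = person["Firstname"]

# STRUCT {x INTEGER; y INTEGER} held in a column named z: access by field name.
Point = namedtuple("Point", ["x", "y"])
z = Point(x=10, y=20)  # hypothetical values
x_value = z.x
```

Note that the dict lookup is case-sensitive in Python, so the key is written exactly as it was stored ('Firstname', not 'firstname').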