Apache Hive provides several types of built-in function which are used to perform various operations.
The following are the list of built-in function provided by Apache Hive.
Let us see each built-in function in detail.
1. Mathematical Functions
Mathematical Functions are used to perform mathematical operations.
The following is the list of Mathematical Functions supported in Apache Hive.
ReturnType | Name | Description |
---|---|---|
DOUBLE | round(DOUBLE a) | It will return the rounded BIGINT value of a. |
DOUBLE | round(DOUBLE a, INT d) | It will return a rounded to d decimal places. |
DOUBLE | bround(DOUBLE a) | It will return the rounded BIGINT value of a using HALF_EVEN rounding mode (as of Hive 1.3.0, 2.0.0). |
DOUBLE | bround(DOUBLE a, INT d) | It will return a rounded to d decimal places using HALF_EVEN rounding mode (as of Hive 1.3.0, 2.0.0). |
BIGINT | floor(DOUBLE a) | It will return the maximum BIGINT value that is equal to or less than a. |
BIGINT | ceil(DOUBLE a), ceiling(DOUBLE a) | This function returns a min BIGINT value that is greater than or similar to a. |
DOUBLE | rand(), rand(INT seed) | It will return a random number (that changes from row to row) that is distributed uniformly from 0 to 1. |
DOUBLE | exp(DOUBLE a), exp(DECIMAL a) | It will return ea where e is the base of the natural logarithm. Decimal version added in Hive 0.13.0. |
DOUBLE | ln(DOUBLE a), ln(DECIMAL a) | It will return the natural logarithm of the argument a. Decimal version added in Hive 0.13.0. |
DOUBLE | log10(DOUBLE a), log10(DECIMAL a) | It will return the base-10 logarithm of the argument a. Decimal version added in Hive 0.13.0. |
DOUBLE | log2(DOUBLE a), log2(DECIMAL a) | It will return the base-2 logarithm of the argument a. Decimal version added in Hive 0.13.0. |
DOUBLE | log(DOUBLE base, DOUBLE a) | It will return the base-base logarithm of the argument a. Decimal versions added in Hive 0.13.0. |
log(DECIMAL base, DECIMAL a) | ||
DOUBLE | pow(DOUBLE a, DOUBLE p), power(DOUBLE a, DOUBLE p) | It will return ap. |
DOUBLE | sqrt(DOUBLE a), sqrt(DECIMAL a) | It will return the square root of a. Decimal version added in Hive 0.13.0. |
STRING | bin(BIGINT a) | It will return the number in binary format. |
STRING | hex(BIGINT h) hex(STRING h) hex(BINARY h) | If the argument is an INT or binary, hex returns the number as a STRING in hexadecimal format. |
BINARY | unhex(STRING a) | Inverse of hex. |
STRING | conv(BIGINT number1, INT from_base, INT to_base), conv(STRING number1, INT from_base, INT to_base) | It will converts a number from a given base to another. |
DOUBLE | abs(DOUBLE a) | It will return the absolute value. |
INT or DOUBLE | pmod(INT a, INT b), pmod(DOUBLE a, DOUBLE b) | It will return the positive value of a mod b. |
DOUBLE | sin(DOUBLE a), sin(DECIMAL a) | It will return the sine of a (a is in radians). Decimal version added in Hive 0.13.0. |
DOUBLE | asin(DOUBLE a), asin(DECIMAL a) | It will return the arc sin of a if -1<=a<=1 or NULL otherwise. Decimal version added in Hive 0.13.0. |
DOUBLE | cos(DOUBLE a), cos(DECIMAL a) | It will return the cosine of a (a is in radians). Decimal version added in Hive 0.13.0. |
DOUBLE | acos(DOUBLE a), acos(DECIMAL a) | It will return the arccosine of a if -1<=a<=1 or NULL otherwise. Decimal version added in Hive 0.13.0. |
DOUBLE | tan(DOUBLE a), tan(DECIMAL a) | It will return the tangent of a (a is in radians). Decimal version added in Hive 0.13.0. |
DOUBLE | atan(DOUBLE a), atan(DECIMAL a) | It will return the arctangent of a. Decimal version added in Hive 0.13.0. |
DOUBLE | degrees(DOUBLE a), degrees(DECIMAL a) | Converts the value of a from radians to degrees. Decimal version added in Hive 0.13.0. |
DOUBLE | radians(DOUBLE a), radians(DOUBLE a) | Converts the value of a from degrees to radians. Decimal version added in Hive 0.13.0. |
INT or DOUBLE | positive(INT a), positive(DOUBLE a) | It will return a. |
INT or DOUBLE | negative(INT a), negative(DOUBLE a) | It will return -a. |
DOUBLE or INT | sign(DOUBLE a), sign(DECIMAL a) | It will return the sign of a as '1.0' (if a is positive) or '-1.0' (if a is negative), '0.0' otherwise. The decimal version will return INT instead of DOUBLE. Decimal version added in Hive 0.13.0. |
DOUBLE | e() | It will return the value of e. |
DOUBLE | pi() | It will return the value of pi. |
BIGINT | factorial(INT a) | It will return the factorial of an (as of Hive 1.2.0). Valid a is [0..20]. |
DOUBLE | cbrt(DOUBLE a) | It will return the cube root of a double value (as of Hive 1.2.0). |
INT | shiftleft(TINYINT|SMALLINT|INT a, INT b) | Bitwise left shift (as of Hive 1.2.0). Shifts a b position to the left. |
BIGINT | shiftleft(BIGINT a, INT b) | It will return int for tinyint, smallint, and int a. Return bigint for bigint a. |
INT | shiftright(TINYINT|SMALLINT|INT a, INT b) | Bitwise right shift (as of Hive 1.2.0). Shifts a b position to the right. |
BIGINT | shiftright(BIGINT a, INT b) | It will return int for tinyint, smallint, and int a. Return bigint for bigint a. |
INT | shiftrightunsigned(TINYINT|SMALLINT|INT a, INT b), | Bitwise unsigned right shift (as of Hive 1.2.0). Shifts a b position to the right. |
BIGINT | shiftrightunsigned(BIGINT a, INT b) | It will return int for tinyint, smallint, and int a. Return bigint for bigint a. |
T | greatest(T v1, T v2, ...) | It will return the greatest value of the list of values (as of Hive 1.1.0). Fixed to return NULL when one or more arguments are NULL, and strict type restriction relaxed, consistent with ">" operator (as of Hive 2.0.0). |
T | least(T v1, T v2, ...) | It will return the least value of the list of values (as of Hive 1.1.0). Fixed to return NULL when one or more arguments are NULL, and strict type restriction relaxed, consistent with "<" operator (as of Hive 2.0.0). |
INT | width_bucket(NUMERIC expr, NUMERIC min_value, NUMERIC max_value, INT num_buckets) | It will return an integer between 0 and num_buckets+1 by mapping expr into the ith equally sized bucket. Buckets are made by dividing [min_value, max_value] into equally sized regions. If expr < min_value, return 1, if expr > max_value return num_buckets+1. |
2. Collection Functions
These built-in functions are used for collections. A collection represents the grouping of elements and returning a single or array of elements depends on the return type mentioned in the function name.
The following list of Collection Functions is supported in Apache Hive.
ReturnType | Name | Description |
---|---|---|
int | size(Map<K.V>) | It will return the number of elements in the map type. |
int | size(Array<T>) | It will return the number of elements in the array type. |
array <K> | map_keys(Map <K.V>) | It will return an unordered array containing the keys of the input map. |
array <V> | map_values(Map <K.V>) | It will return an unordered array containing the values of the input map. |
boolean | array_contains(Array <T>, value) | It will return TRUE if the array contains value. |
array <t> | sort_array(Array <T>) | Perform sorting of input array in ascending order according and return the result. |
3. Date Functions
Dates built-in functions are used to operate on date data types such as adding the number of days to the date or other similar operations.
The following are the lists of Date Functions supported in Apache Hive.
ReturnType | Name | Description | string | from_unixtime(bigint unixtime[, string format]) | This function is used to convert the UNIX epoch seconds into the string of the current system timestamp. | bigint | unix_timestamp() | This function fetches the Unix current timestamp in seconds. | bigint | unix_timestamp(string date) | It will convert time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds). | bigint | unix_timestamp(string date, string pattern) | This function converts the time string into a provided pattern. | int | year(string date) | It returns the year part of a date or a timestamp string: year("2020-01-01 00:00:00") = 2020, year("2020-01-01") = 2020. | int | quarter(date or timestamp or string) | It will return the quarter of the year for a date, timestamp, or string in the range 1 to 4. | int | month(string date) | This function returns the month of a date or a timestamp string. | int | day(string date) dayofmonth(date) | This function returns the day of a date or a timestamp string. | int | hour(string date) | This function returns the hour of a timestamp. | int | minute(string date) | It will return the minute of the timestamp. | int | second(string date) | It will return the second of the timestamp. | int | weekofyear(string date) | This function returns the week number from a timestamp string. |
---|---|---|
string | add_months(string startdate_column, int nummonths_column, output_date_format) | It will return the date that is num_months after startdate_column (as of Hive 1.1.0). startdate_column is a string, date or timestamp. nummonths_column is an integer. |
string | last_day(string date) | It will return the last day of the month to which the date belongs. |
string | next_day(string startdate_column, string dayofweek_column) | This function returns the first date that is later than the start date. |
string | trunc(string date, string format) | This function returns the date truncated to a unit that is mentioned in the format. |
double | months_between(date1, date2) | It will return number of months between dates date1 and date2. |
string | date_format(date/timestamp/string ts, string fmt) | This function converts the date or timestamp or string into a value of the string. |
4. Conditional Functions
These built-in functions work on test conditions. If the test condition is true then it will return true.
The following is the list of Conditional Functions supported in Apache Hive.
ReturnType | Name | Description |
---|---|---|
T | if(boolean testCondition, T valueTrue, T valueFalseOrNull) | It will return valueTrue when testCondition is true, returns valueFalseOrNull otherwise. |
boolean | isnull( a ) | It will return true if a is NULL and false otherwise. |
boolean | isnotnull ( a ) | It will return true if a is not NULL and false otherwise. |
T | nvl(T value, T default_value) | It will return default value if value is null else returns value (as of HIve 0.11). |
T | COALESCE(T v1, T v2, ...) | It will return the first v that is not NULL, or NULL if all v's are NULL. |
T | CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END | When a = b, returns c; when a = d, returns e; else returns f. |
T | CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END | When a = true, returns b; when c = true, returns d; else returns e. |
T | nullif( a, b ) | It will return NULL if a=b; otherwise returns a (as of Hive 2.3.0). |
void | assert_true(boolean condition) | It will throw an exception if 'condition' is not true, otherwise return null (as of Hive 0.8.0). For example, select assert_true (2<1). |
5. String Functions
String built-in functions are used to perform different operations like reversing sting, converting into upper and lower case, removing spaces, and so on.
Below are the lists of String Functions supported in Hive.
ReturnType | Name | Description | int | ascii(string strdata) | This function returns the numeric value of the first character of strdata. | string | base64(binary bin) | The function will convert the argument passed into a base 64 string. | int | character_length(string str_data) | This function returns the number of UTF-8 characters that contain str_data. | string | chr(bigint|double y) | This function returns the ASCII character that has a binary equivalent to Y. | string | concat(string|binary Y, string|binary Z...) | This function returns the string or bytes by performing concatenation of strings. | array | context_ngrams(array | This function returns the top set of tokenized sentences. | string | concat_ws(string SEP, string Y, string Z...) | This function is similar to Concat() but also provide custom separator SEP. | string | concat_ws(string SEP, array | It is similar to concat_ws() but it also takes an array of strings. | string | decode(binary bin, string charset) | This function uses the provides character set to decodes the first argument into the string. | binary | encode(string src, string charset) | This function uses the provides character set to encodes the first argument into the binary. | int | find_in_set(string strdata, string strList_data) | This function returns the first appearance of strdata from strList_data in which strList_data is the comma-delimited string. | string | format_number(number data, int data1) | This function format the number data to a format like '#,###,###.##', rounded to data1 decimal places, and returns the result as a string. | string | get_json_object(string json_string, string path) | This function will fetch the JSON object from a JSON string based on the JSON path specified and returns the JSON string of the extracted JSON object. | boolean | in_file(string str_data, string filen_ame) | This function returns the true value If the str_data will present in a complete line of file_name. | int | instr(string str_data, string substr_data) | This function will return the first appearance of substr_data in the string str_data. In case the argument is null then it will return a null value and it will return 0 in case the substr_data is not present in str_data. | int | length(string ZY) | This function will return the length of a provided string. | int | locate(string substr_data, string str_data[, int pos_data]) | This function returns the first occurrence position of substr_data in the string str_data that too after pos_data position. | string | lower(string O) lcase(string O) | This function will convert all characters of O in lower case. | string | ltrim(string H) | This function will return the output by trimming the space from the beginning of H. | string | printf(String format, Obj... args) | It will return the input formatted according to do printf-style format strings. | string | repeat(string str, int n) | Repeats str n times. | string | replace(string A, string OLD, string NEW) | It will return the string A with all non-overlapping occurrences of OLD replaced with NEW . | string | reverse(string A) | It will return the reversed string. | string | rpad(string strdata3, int lendata3, string paddata3) | It will return str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. In the case of an empty pad string, the return value is null. | string | rtrim(string R) | It will return the string resulting from trimming spaces from the end(right-hand side) of R. | string | space(int n) | It will return a string of n spaces. | array | split(string str, string pat) | Splits str around pat (pat is a regular expression). | string | trim(string L) | It will return the string resulting from trimming spaces from both ends of L. | binary | unbase64(string str) | Converts the argument from a base 64 string to BINARY. | string | upper(string P) ucase(string Q) | It will return the string resulting from converting all characters of P to upper case. | string | initcap(string P) | It will return the string, with the first letter of each word in uppercase, all other letters in lowercase. | int | levenshtein(string I, string J) | It will return the Levenshtein distance between two strings. | string | soundex(string H) | It will return the soundex code of the string. |
---|