Tuple, Bag, and Map Functions
Apache Pig provides several tuple, bag, and map functions, such as TOBAG, TOP, TOTUPLE, and TOMAP, for building and working with these complex data types.
The following table lists the tuple, bag, and map functions supported by Apache Pig.
Sr No | Function | Description
---|---|---
1 | TOBAG() | Converts one or more expressions into a bag; each expression becomes its own tuple inside the bag.
2 | TOP() | Returns the top N tuples of a bag, compared on a given column.
3 | TOTUPLE() | Converts one or more expressions into a single tuple.
4 | TOMAP() | Converts key-value pairs into a map.
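To make the two functions that are not walked through below a little more concrete, here is a plain-Python model of TOP and TOMAP. It is an illustration under stated assumptions, not Pig's implementation: a bag is modelled as a Python list of tuples (real Pig bags are unordered), a map as a dict with string keys (Pig map keys are chararrays), and the helper names `top` and `tomap` are hypothetical. The model assumes TOP keeps the tuples with the largest values in the chosen column; since Pig's TOP returns a bag, the order of its result is not guaranteed, whereas this model returns the tuples largest-first.

```python
# Plain-Python model of Pig's TOP and TOMAP (illustration only, not Pig code).
# Assumptions: a bag is a list of tuples, a map is a dict with str keys.
import heapq


def top(n, column, bag):
    """Model of TOP(topN, column, relation): the n tuples of `bag` with the
    largest values at position `column` (0 = first field)."""
    return heapq.nlargest(n, bag, key=lambda t: t[column])


def tomap(*args):
    """Model of TOMAP(key, value [, key, value ...]): pair up the arguments
    into a map. This model rejects an odd argument count and non-str keys."""
    if len(args) % 2 != 0:
        raise ValueError("TOMAP expects key-value pairs")
    keys, values = args[0::2], args[1::2]
    if not all(isinstance(k, str) for k in keys):
        raise TypeError("map keys must be strings (chararray)")
    return dict(zip(keys, values))


# (studentid, firstname, gpa) taken from the first three sample records.
students = [(1, "Chanel", 39), (2, "Ezekiel", 37), (3, "Willow", 40)]
print(top(2, 2, students))                    # [(3, 'Willow', 40), (1, 'Chanel', 39)]
print(tomap("firstname", "Chanel", "gpa", 39))  # {'firstname': 'Chanel', 'gpa': 39}
```

A heap-based `nlargest` mirrors the idea of keeping only the best N candidates rather than sorting the whole bag.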
Let us see a couple of examples.
TOBAG()
The TOBAG function converts one or more expressions into individual tuples, which are then placed in a bag.
Syntax:
grunt> TOBAG(expression [, expression ...])
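The shape of TOBAG's result can be sketched in plain Python. This is a model for illustration only, not Pig code: a bag is represented as a Python list of tuples, and the helpers `tobag` and `fmt` are hypothetical. `fmt` approximates how DUMP prints values, with tuples in parentheses and bags in braces, and the outer one-field tuple stands in for the record produced by `FOREACH ... GENERATE`.

```python
# Plain-Python model of Pig's TOBAG (illustration only, not Pig code).
# Assumption: a bag is modelled as a Python list of tuples.


def tobag(*exprs):
    """Model of TOBAG(expr [, expr ...]): wrap each expression in a
    one-field tuple and collect those tuples into a bag."""
    return [(e,) for e in exprs]


def fmt(value):
    """Render a value roughly as DUMP prints it: tuples as (...)
    and bags (lists in this model) as {...}."""
    if isinstance(value, tuple):
        return "(" + ",".join(fmt(v) for v in value) + ")"
    if isinstance(value, list):
        return "{" + ",".join(fmt(v) for v in value) + "}"
    return str(value)


# First record of studentdata.txt, fields in schema order.
record = (1, "Chanel", "Shawnee", "KS", 39)
bag = tobag(*record)
print(bag)          # [(1,), ('Chanel',), ('Shawnee',), ('KS',), (39,)]
print(fmt((bag,)))  # ({(1),(Chanel),(Shawnee),(KS),(39)})
```

Each field ends up in its own single-field tuple, so five input expressions yield a bag of five tuples.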
To perform this operation, we use the “studentdata.txt” dataset, which we copy from the local file system to the HDFS location “/pigexample/”.
Content of “studentdata.txt”:
1,Chanel,Shawnee,KS,39
2,Ezekiel,Easton,MD,37
3,Willow,New York,NY,40
4,Bernardo,Conroe,TX,38
5,Ammie,Columbus,OH,38
6,Francine,Las Cruces,NM,38
7,Ernie,Ridgefield Park,NJ,38
8,Albina,Dunellen,NJ,56
9,Alishia,New York,NY,34
10,Solange,Metairie,LA,54
We copy “studentdata.txt” from the local file system into the HDFS directory “/pigexample/” using the following command.
Command:
$ hadoop fs -copyFromLocal /home/cloudduggu/pig/tutorial/studentdata.txt /pigexample/
Now we create the relation "studentdata" and load the data from HDFS into Pig.
Command:
grunt> studentdata = LOAD '/pigexample/studentdata.txt' USING PigStorage(',')
as (studentid:int,firstname:chararray,lastname:chararray,city:chararray,gpa:int);
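The LOAD statement above splits each comma-separated line and casts every field to the type declared in the AS clause. The Python sketch below models that step for illustration only; `load_pigstorage` and `SCHEMA` are hypothetical names, and padding short lines with nulls and ignoring extra fields are assumptions of this model. The behaviour it aims to mirror from Pig is that a value which cannot be cast to the declared type becomes null (`None` here). Note that a field such as "New York" keeps its space, because only the comma delimiter separates fields.

```python
# Plain-Python model of:
#   LOAD '...' USING PigStorage(',') AS (studentid:int, firstname:chararray,
#        lastname:chararray, city:chararray, gpa:int);
# Illustration only: split each line on the delimiter, then cast each field.

SCHEMA = [
    ("studentid", int),
    ("firstname", str),  # chararray
    ("lastname", str),   # chararray
    ("city", str),       # chararray
    ("gpa", int),
]


def load_pigstorage(lines, delimiter=",", schema=SCHEMA):
    """Return one tuple per input line, typed according to `schema`."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Model choice: pad missing trailing fields with None (Pig's null);
        # zip() below ignores any extra fields beyond the schema.
        fields += [None] * (len(schema) - len(fields))
        row = []
        for (_, cast), raw in zip(schema, fields):
            try:
                row.append(None if raw is None else cast(raw))
            except ValueError:
                row.append(None)  # a failed cast becomes null
        rows.append(tuple(row))
    return rows


sample = ["1,Chanel,Shawnee,KS,39", "9,Alishia,New York,NY,34"]
for row in load_pigstorage(sample):
    print(row)
```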
Now we convert the fields of each record (studentid, firstname, lastname, city, gpa) into a bag of tuples and print the output using the DUMP operator.
Command:
grunt> tobagdata = FOREACH studentdata GENERATE TOBAG(studentid,firstname,lastname,city,gpa);
grunt> DUMP tobagdata;
Output:
TOTUPLE()
The TOTUPLE function converts one or more expressions into a single tuple.
Syntax:
grunt> TOTUPLE(expression [, expression ...])
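For contrast with TOBAG, here is a plain-Python model of TOTUPLE. It is an illustration only, not Pig code; the helpers `totuple` and `fmt` are hypothetical, and `fmt` only approximates how DUMP prints nested tuples. Where TOBAG turns five fields into a bag of five single-field tuples, TOTUPLE keeps them together as one five-field tuple.

```python
# Plain-Python model of Pig's TOTUPLE (illustration only, not Pig code).


def totuple(*exprs):
    """Model of TOTUPLE(expr [, expr ...]): collect all expressions
    into a single tuple, preserving their order."""
    return tuple(exprs)


def fmt(value):
    """Render nested tuples roughly as DUMP prints them: (a,b,...)."""
    if isinstance(value, tuple):
        return "(" + ",".join(fmt(v) for v in value) + ")"
    return str(value)


# First record of studentdata.txt, fields in schema order.
record = (1, "Chanel", "Shawnee", "KS", 39)
t = totuple(*record)
print(t)          # (1, 'Chanel', 'Shawnee', 'KS', 39)
print(fmt((t,)))  # ((1,Chanel,Shawnee,KS,39))
```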
We reuse the relation “studentdata” created in the TOBAG section, convert the fields of each record (studentid, firstname, lastname, city, gpa) into a tuple, and print the output using the DUMP operator.