Load & Store Functions
Apache Pig provides several load and store functions, such as PigStorage, TextLoader, and BinStorage, along with support for compressed data. These functions determine how data is read into Pig and how it is written back out. They are used with the LOAD and STORE operators.
The following is the list of Apache Pig supported LOAD and STORE functions.
| S.No. | Function | Description |
|---|---|---|
| 1 | PigStorage() | Loads and stores structured text files. |
| 2 | TextLoader() | Loads unstructured text data. |
| 3 | BinStorage() | Loads and stores data in a machine-readable format. |
| 4 | Handling Compression | Not a function as such: Pig's support for loading and storing compressed data. |
Let us see a couple of examples.
PigStorage()
The PigStorage() function loads and stores data as structured text files. It is Pig's default load/store function, and it works with both compressed and uncompressed files.
Syntax:
PigStorage( [field_delimiter] , ['options'] )
If field_delimiter is omitted, the tab character is used.
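As a rough illustration of the two arguments in this syntax, the sketch below passes a delimiter and an option string. It assumes the '/pigexample/employee.txt' file used in this section's examples. The aliases (tagged, plain) and the output path '/pigoutput/with_schema' are hypothetical names chosen here. The option strings '-tagFile' and '-schema' are documented PigStorage options in recent Pig releases, so check the documentation for your version.

```pig
-- Comma delimiter plus '-tagFile', which adds the source file name
-- as the first column of every tuple.
tagged = LOAD '/pigexample/employee.txt' USING PigStorage(',', '-tagFile');

-- Comma delimiter only, without a declared schema.
plain = LOAD '/pigexample/employee.txt' USING PigStorage(',');

-- '-schema' writes the relation's schema to a hidden file next to the
-- data, so a later LOAD with PigStorage can pick it up.
STORE plain INTO '/pigoutput/with_schema' USING PigStorage(',', '-schema');
```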
The examples below use a dataset, “employee.txt”, stored in the HDFS directory /pigexample/.
Content of “employee.txt”:
1001,James,Butt,New Orleans,Orleans
1002,Josephine,Darakjy,Brighton,Livingston
1003,Art,Venere,Bridgeport,Gloucester
1004,Lenna,Paprocki,Anchorage,Anchorage
1005,Donette,Foller,Hamilton,Butler
1006,Simona,Morasca,Ashland,Ashland
1007,Mitsue,Tollner,Chicago,Cook
1008,Leota,Dilliard,San Jose,Santa
1009,Sage,Wieser,Sioux Falls,Minnehaha
1010,Kris,Marrier,Baltimore,Baltimore
We will load “employee.txt” from the HDFS directory “/pigexample/” into Pig using the PigStorage() function. The fields in each record are separated by a comma (,), so we pass ',' as the field delimiter.
Command:
grunt> empdetail = LOAD '/pigexample/employee.txt' USING PigStorage(',') as (empid:int,firstname:chararray,lastname:chararray,city:chararray,county:chararray );
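To check that the load picked up the intended schema and data, the relation can be inspected before going further. This is a minimal sketch using the standard DESCRIBE and DUMP operators on the empdetail relation defined above.

```pig
-- Print the schema Pig attached to the relation (field names and types).
DESCRIBE empdetail;

-- Execute the load and print every tuple to the terminal.
DUMP empdetail;
```

DUMP triggers an actual job, so on large inputs it is usually preferable to inspect a LIMITed subset instead.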
We can also store data in HDFS using the PigStorage() function. In this example, we store the relation “empdetail” in the HDFS location ‘/pigoutput/outputdata’.
Command:
grunt> STORE empdetail INTO '/pigoutput/outputdata' USING PigStorage (',');
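Because PigStorage() also handles compressed text, the same relation can be written in gzip form. In this sketch, Pig selects the codec from the file extension, so a path ending in .gz is written and read as gzip (a .bz2 suffix works similarly for bzip2). The output path '/pigoutput/outputdata.gz' and the alias empzipped are hypothetical names chosen for illustration.

```pig
-- Write the relation as gzip-compressed text; the .gz suffix on the
-- (hypothetical) output path is what selects gzip compression.
STORE empdetail INTO '/pigoutput/outputdata.gz' USING PigStorage(',');

-- Read it back; the .gz suffix tells Pig to decompress on load.
empzipped = LOAD '/pigoutput/outputdata.gz' USING PigStorage(',')
    AS (empid:int, firstname:chararray, lastname:chararray,
        city:chararray, county:chararray);
```

Gzip files cannot be split across map tasks, so for large inputs bzip2 may parallelize better at the cost of slower compression.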
We can verify the stored data with the following commands.
Command:
$ hadoop fs -ls /pigoutput/outputdata/
$ hadoop fs -cat /pigoutput/outputdata/part-m-00000
TextLoader()
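The TextLoader() function loads unstructured text data in UTF-8 format. Each line of input becomes a tuple with a single field that holds the whole line. TextLoader() can only load data; it cannot be used to store it.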
Syntax:
TextLoader()
The example below uses a dataset, “department.txt”, stored in the HDFS directory /pigexample/.
Content of “department.txt”:
1001,Bette,Nicka,LA,70116
1002,Veronika,Inouye,MI,48116
1003,Willard,Kolmetz,NJ,8014
1004,Maryann,Royster,AK,99501
1005,Alisha,Slusarski,OH,45011
1006,Allene,Iturbide,OH,44805
1007,Chanel,Caudy,IL,60632
1008,Ezekiel,Chui,CA,95111
1009,Willow,Kusko,SD,57105
1010,Bernardo,Figeroa,MD,21224
We will load “department.txt” from the HDFS directory “/pigexample/” into Pig using the TextLoader() function, and then print the result on the terminal with the DUMP operator.
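The load-and-dump step just described could be sketched as follows. The alias deptlines and the field name line are hypothetical names chosen here. The AS clause is optional and only gives the single field a name and a chararray type.

```pig
-- TextLoader() does no field splitting: each input line, commas included,
-- becomes one tuple with a single field (named 'line' here).
deptlines = LOAD '/pigexample/department.txt' USING TextLoader() AS (line:chararray);

-- Confirm that the relation has exactly one field.
DESCRIBE deptlines;

-- Print the loaded lines on the terminal.
DUMP deptlines;
```

Unlike the PigStorage(',') example, the commas in each record are not treated as delimiters, so any field-level parsing has to happen in later statements.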