Load & Store Functions
Apache Pig provides several load and store functions, such as PigStorage, TextLoader, and BinStorage, along with support for compressed data. They determine how data is read into Pig and how it is written back out, and they are used with the LOAD and STORE operators.
The following table lists the load and store functions that Apache Pig supports.
| S.No. | Functions | Description |
|---|---|---|
| 1 | PigStorage() | Loads and stores structured text files. |
| 2 | TextLoader() | Loads unstructured text data. |
| 3 | BinStorage() | Loads and stores data in a machine-readable format. |
| 4 | Handling Compression | Support for loading and storing compressed data. |
Let us see a couple of examples.
PigStorage()
The PigStorage() function loads and stores data as structured text files, in compressed or uncompressed form. It is the default function: Pig uses it when a LOAD or STORE statement has no USING clause.
Syntax:
    
        PigStorage( [field_delimiter] , ['options'] )
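
Both arguments are optional. As a rough illustration, when no delimiter is given PigStorage() splits fields on the tab character, and leaving out the USING clause has the same effect. The sketch below assumes a hypothetical tab-separated file /pigexample/employee.tsv, which is not part of the example dataset.

        -- hypothetical tab-separated file; with no argument PigStorage() splits on tab
        grunt> emptab = LOAD '/pigexample/employee.tsv' USING PigStorage();

        -- equivalent: PigStorage() is the default when USING is omitted
        grunt> emptab = LOAD '/pigexample/employee.tsv';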
    
    
    
    
The examples below use the dataset “employee.txt”, which is stored in the HDFS directory /pigexample/.
Content of “employee.txt”:
    1001,James,Butt,New Orleans,Orleans
    1002,Josephine,Darakjy,Brighton,Livingston
    1003,Art,Venere,Bridgeport,Gloucester
    1004,Lenna,Paprocki,Anchorage,Anchorage
    1005,Donette,Foller,Hamilton,Butler
    1006,Simona,Morasca,Ashland,Ashland
    1007,Mitsue,Tollner,Chicago,Cook
    1008,Leota,Dilliard,San Jose,Santa
    1009,Sage,Wieser,Sioux Falls,Minnehaha
    1010,Kris,Marrier,Baltimore,Baltimore
 
        
We will load “employee.txt” from the HDFS directory “/pigexample/” into Pig using the PigStorage() function. The fields of each record are separated by a comma (,), so the comma is passed as the delimiter.
Command:
    
        grunt> empdetail = LOAD '/pigexample/employee.txt' USING PigStorage(',') as (empid:int,firstname:chararray,lastname:chararray,city:chararray,county:chararray );
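
To check that the load and its schema behave as intended, the relation can be inspected before going further. This is a minimal sketch, assuming the LOAD statement above has already been run in the same Grunt session.

        -- print the schema declared in the AS clause
        grunt> DESCRIBE empdetail;

        -- run the job and print the loaded tuples on the terminal
        grunt> DUMP empdetail;

A field that cannot be converted to its declared type, for example non-numeric text in empid, is loaded as null rather than stopping the load.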
    
    
    
    
The PigStorage() function can also store data in HDFS. In this example, we store the relation “empdetail” in the HDFS location “/pigoutput/outputdata”.
Command:
    
        grunt> STORE empdetail INTO '/pigoutput/outputdata' USING PigStorage (',');
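
The delimiter used for storing does not have to match the one used for loading. The sketch below writes the same relation with a pipe delimiter and then, in a second statement, with the '-schema' option, which asks PigStorage to save the schema in a hidden .pig_schema file alongside the data. Both output paths are hypothetical, and neither may already exist, because STORE fails when the target directory is present.

        -- hypothetical output path; fields are written separated by '|'
        grunt> STORE empdetail INTO '/pigoutput/outputdata_pipe' USING PigStorage('|');

        -- hypothetical output path; also writes the schema next to the data
        grunt> STORE empdetail INTO '/pigoutput/outputdata_schema' USING PigStorage(',', '-schema');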
    
    
    
    
We can verify the stored data using the commands below.
Command:
    
        $ hadoop fs -ls /pigoutput/outputdata/

        $ hadoop fs -cat /pigoutput/outputdata/part-m-00000
    
    
    
TextLoader()
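The TextLoader() function loads unstructured text data encoded in UTF-8. Each line of the input becomes a tuple with a single chararray field. TextLoader() can only load data; it cannot be used to store it.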
Syntax:
    
        TextLoader()
    
    
    
    
This example uses the dataset “department.txt”, which is stored in the HDFS directory /pigexample/.
Content of “department.txt”:
    1001,Bette,Nicka,LA,70116
    1002,Veronika,Inouye,MI,48116
    1003,Willard,Kolmetz,NJ,8014
    1004,Maryann,Royster,AK,99501
    1005,Alisha,Slusarski,OH,45011
    1006,Allene,Iturbide,OH,44805
    1007,Chanel,Caudy,IL,60632
    1008,Ezekiel,Chui,CA,95111
    1009,Willow,Kusko,SD,57105
    1010,Bernardo,Figeroa,MD,21224
 
        
We will load “department.txt” from the HDFS directory “/pigexample/” into Pig using the TextLoader() function, and then print the result on the terminal using the DUMP operator.
Command:
    
        grunt> deptdata = LOAD '/pigexample/department.txt' USING TextLoader();
    
    
    
        grunt> DUMP deptdata;
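
TextLoader() does not split a line into fields, so any further processing works on the whole line as one value. As a hedged sketch of that, the statements below give the single field a name and keep only the lines that contain a given state code. The names deptlines, line, and ohiodata are chosen here for illustration.

        -- name the single field so it can be referenced later
        grunt> deptlines = LOAD '/pigexample/department.txt' USING TextLoader() AS (line:chararray);

        -- MATCHES takes a Java regular expression that must match the whole line
        grunt> ohiodata = FILTER deptlines BY line MATCHES '.*,OH,.*';

        grunt> DUMP ohiodata;

With the sample data shown earlier, this should leave the two records whose fourth value is OH (1005 and 1006).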
    
    
    
