Apache Pig Grunt is an interactive shell that lets users enter Pig Latin statements interactively and also provides commands for working with HDFS and the local file system. You can type Pig Latin commands directly into the Grunt shell for execution. Apache Pig starts executing the Pig Latin statements when it receives a STORE or DUMP command. Before executing a command, the Grunt shell checks its syntax and semantics to avoid errors.
To start Pig Grunt, type:
$ pig -x local
It will start the Pig Grunt shell:
grunt>
Now, using the Grunt shell, you can interact with your local filesystem. But if you omit -x local and have a cluster configuration set in PIG_CLASSPATH, Grunt will instead connect to HDFS on your cluster.
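The execution mode can also be chosen explicitly when starting Pig. As a short sketch (the second form assumes a configured Hadoop cluster):

$ pig -x local
$ pig -x mapreduce

mapreduce is the default execution mode, so running plain $ pig is equivalent to $ pig -x mapreduce.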
HDFS commands in Pig Grunt
We can use the Pig Grunt shell to run HDFS commands as well. Starting from Pig version 0.5, all Hadoop fs shell commands are available. They are invoked using the keyword fs followed by the command.
Let us see a few HDFS commands from the Pig Grunt shell.
fs -ls /
This command will list all files and directories present under the HDFS root ("/").
Syntax:
grunt> fs subcommand subcommand_parameters;
Command:
grunt> fs -ls /
Output:
fs -cat
This command will print the content of a file present in HDFS.
Syntax:
grunt> fs subcommand subcommand_parameters
Command:
grunt> fs -cat /hive/warehouse/kv2.txt
Output:
fs -mkdir
This command will create a directory in HDFS.
Syntax:
grunt> fs subcommand subcommand_parameters
Command:
grunt> fs -mkdir /pigdata
Output:
fs -copyFromLocal
This command will copy a file from the local system to HDFS.
Syntax:
grunt> fs subcommand subcommand_parameters
Command:
grunt> fs -copyFromLocal /home/cloudduggu/pig/tutorial/emp.txt /pigdata/
Output:
Shell commands in Pig Grunt
We can use the Pig Grunt shell to run basic shell commands as well; any shell command can be invoked with sh.
Let us see a few shell commands from the Pig Grunt shell. Note that we cannot execute commands that are part of the shell environment, such as cd.
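For example, sh passes the command and its arguments to the system shell, so options work just as they would in a terminal (the path below is only illustrative):

grunt> sh ls -l /home/cloudduggu/pig/tutorial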
sh ls
This command will list all directories/files.
Syntax:
grunt> sh subcommand subcommand_parameters
Command:
grunt> sh ls
Output:
sh cat
This command will print the content of a file.
Syntax:
grunt> sh subcommand subcommand_parameters
Command:
grunt> sh cat /home/cloudduggu/pig/tutorial/emp.txt
Output:
Utility commands in Pig Grunt
Pig Grunt also supports utility commands such as help, clear, and history. In addition, Grunt provides commands for controlling Pig and MapReduce, such as exec, run, and kill.
Help Command
The help command prints a list of Pig commands.
Syntax:
grunt> help
Command:
grunt> help
Output:
Clear Command
The clear command is used to clear the screen of the Grunt shell.
Syntax:
grunt> clear
Command:
grunt> clear
History Command
The history command is used to display the statements executed so far in the current Grunt session.
Syntax:
grunt> history
Command:
grunt> history
Output:
Set Command
The set command is used to assign values to keys, which are case sensitive. If set is used without any arguments, all system properties and configuration settings are printed.
Syntax:
grunt> set [key 'value']
Command:
grunt> set debug 'on'
grunt> set job.name 'my job'
grunt> set default_parallel 100
| Key | Description |
| --- | --- |
| default_parallel | Sets the number of reducers for all MapReduce jobs generated by Pig. |
| debug | Turns debug-level logging on or off. |
| job.name | Sets a user-specified name for the job. |
| job.priority | Sets the priority of a Pig job: very_low, low, normal, high, or very_high. |
| stream.skippath | For streaming, sets a path from which data should not be shipped; pass the desired path as a string to this key. |
EXEC Command
The exec command is used to execute a Pig script from the Grunt shell.
Please make sure the history server is running; you can verify this in the jps command output. The "JobHistoryServer" service should be running; otherwise, you can start it using the command below.
$ /home/cloudduggu/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
Let us assume that we have a file named "emp.txt" present in the HDFS /pigdata/ directory. Now we want to use this file and project its contents using a Pig script.
Content of “emp.txt”:
201,Wick,Google
203,John,Facebook
204,Partick,Instagram
205,Hema,Google
206,Holi,Facebook
207,Michael,Instagram
208,Michael,Instagram
209,Chung,Instagram
210,Anna,Instagram
Now we will create an "emp_script.pig" script file with the statements below to process the data, and put this file in the same HDFS location, /pigdata/.
Content of “emp_script.pig”:
employee = LOAD 'hdfs://localhost:9000/pigdata/emp.txt' USING PigStorage(',')
    as (empid:int, empname:chararray, company:chararray);
dump employee;
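The script above only projects the data. As an illustrative sketch going one step further (assuming the third column of emp.txt is loaded as company:chararray), the employees could be grouped and counted per company:

grouped = GROUP employee BY company;
counts = FOREACH grouped GENERATE group, COUNT(employee);
dump counts;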
Now we will start the Pig Grunt shell and run the script.
Syntax:
grunt> exec [-param param_name = param_value] [-param_file file_name] [script]
Command:
$pig
grunt> exec hdfs:///pigdata/emp_script.pig
Output:
Kill Command
The kill command will attempt to kill any MapReduce jobs associated with the Pig job.
Syntax:
grunt> kill JobId
Command:
grunt> kill job_500
Run Command
The run command is used to run a Pig script in interactive mode: the script's statements are executed as if they had been typed at the Grunt shell.
The difference between exec and run is that with run, each statement from the script appears in the shell history and any aliases it defines remain accessible to later interactive statements, whereas exec runs the script in batch mode without affecting the shell's context.
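As a sketch of this interactive behaviour (the local script path is only illustrative):

grunt> run /home/cloudduggu/pig/tutorial/emp_script.pig
grunt> senior = FILTER employee BY empid > 205;
grunt> dump senior;

Because run executes the script in the shell's own context, the alias employee defined inside the script can be reused afterwards; after exec it would not be available.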
We will use the same example that we used for the exec command and run the command below.