Apache Pig Grunt is an interactive shell that lets users enter Pig Latin statements interactively and also provides commands for working with HDFS and the local file system. You can type Pig Latin commands directly into the Grunt shell for execution. Apache Pig starts executing the Pig Latin statements when it receives a STORE or DUMP command. Before executing a command, the Grunt shell checks its syntax and semantics to avoid errors.
To start Pig Grunt, type:
$ pig -x local
It will start the Pig Grunt shell:
grunt>
Now, using the Grunt shell, you can interact with your local filesystem. But if you omit -x local and have a cluster configuration set in PIG_CLASSPATH, Grunt will instead connect to HDFS on your cluster.
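The execution mode can also be chosen explicitly when starting Pig. As a short sketch (the second form assumes a configured Hadoop cluster):

$ pig -x local
$ pig -x mapreduce

mapreduce is the default execution mode, so running plain $ pig is equivalent to $ pig -x mapreduce.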
HDFS commands in Pig Grunt
We can use the Pig Grunt shell to run HDFS commands as well. Starting from Pig version 0.5, all Hadoop fs shell commands are available. They are invoked using the keyword fs followed by the command.
Let us see a few HDFS commands from the Pig Grunt shell.
fs -ls /
This command will list all files and directories present under the HDFS root ("/").
Syntax:
grunt> fs subcommand subcommand_parameters;
Command:
grunt> fs -ls /
Output:
fs -cat
This command will print the content of a file present in HDFS.
Syntax:
grunt> fs subcommand subcommand_parameters
Command:
grunt> fs -cat /hive/warehouse/kv2.txt
Output:
fs -mkdir
This command will create a directory in HDFS.
Syntax:
grunt> fs subcommand subcommand_parameters
Command:
grunt> fs -mkdir /pigdata
Output:
fs -copyFromLocal
This command will copy a file from the local system to HDFS.
Syntax:
grunt> fs subcommand subcommand_parameters
Command:
grunt> fs -copyFromLocal /home/cloudduggu/pig/tutorial/emp.txt /pigdata/
Output:
Shell commands in Pig Grunt
We can use the Pig Grunt shell to run basic shell commands as well; any shell command can be invoked with sh.
Let us see a few shell commands from the Pig Grunt shell. Note that we cannot execute commands that are part of the shell environment, such as cd.
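For example, sh passes the command and its arguments to the system shell, so options work just as they would in a terminal (the path below is only illustrative):

grunt> sh ls -l /home/cloudduggu/pig/tutorial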
sh ls
This command will list all directories/files.
Syntax:
grunt> sh subcommand subcommand_parameters
Command:
grunt> sh ls
Output:
sh cat
This command will print the content of a file.
Syntax:
grunt> sh subcommand subcommand_parameters
Command:
grunt> sh cat /home/cloudduggu/pig/tutorial/emp.txt
Output:
Utility commands in Pig Grunt
Pig Grunt also supports utility commands such as help, clear, and history. In addition, Grunt provides commands for controlling Pig and MapReduce, such as exec, run, and kill.
Help Command
The help command prints a list of Pig commands.
Syntax:
grunt> help
Command:
grunt> help
Output:
Clear Command
The clear command is used to clear the screen of the Grunt shell.
Syntax:
grunt> clear
Command:
grunt> clear
History Command
The history command is used to display the statements executed so far in the current Grunt session.
Syntax:
grunt> history
Command:
grunt> history
Output:
Set Command
The set command is used to assign values to keys, which are case sensitive. If set is used without any arguments, all system properties and configuration settings are printed.
Syntax:
grunt> set [key 'value']
Command:
grunt> set debug 'on'
grunt> set job.name 'my job'
grunt> set default_parallel 100
| Key | Description |
| --- | --- |
| default_parallel | Sets the number of reducers for all MapReduce jobs generated by Pig. |
| debug | Turns debug-level logging on or off. |
| job.name | Sets a user-specified name for the job. |
| job.priority | Sets the priority of a Pig job: very_low, low, normal, high, or very_high. |
| stream.skippath | For streaming, sets a path from which data should not be shipped; pass the desired path as a string to this key. |
EXEC Command
The exec command is used to execute a Pig script from the Grunt shell.
Please make sure the history server is running; you can verify this in the jps command output. The "JobHistoryServer" service should be running; otherwise, you can start it using the command below.
$ /home/cloudduggu/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
Let us assume that we have a file named "emp.txt" present in the HDFS /pigdata/ directory. Now we want to use this file and project its contents using a Pig script.
Content of “emp.txt”:
201,Wick,Google
203,John,Facebook
204,Partick,Instagram
205,Hema,Google
206,Holi,Facebook
207,Michael,Instagram
208,Michael,Instagram
209,Chung,Instagram
210,Anna,Instagram
Now we will create an "emp_script.pig" script file with the statements below to process the data, and put this file in the same HDFS location, /pigdata/.
Content of “emp_script.pig”:
employee = LOAD 'hdfs://localhost:9000/pigdata/emp.txt' USING PigStorage(',')
    as (empid:int, empname:chararray, company:chararray);
dump employee;
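The script above only projects the data. As an illustrative sketch going one step further (assuming the third column of emp.txt is loaded as company:chararray), the employees could be grouped and counted per company:

grouped = GROUP employee BY company;
counts = FOREACH grouped GENERATE group, COUNT(employee);
dump counts;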
Now we will start the Pig Grunt shell and run the script.
Syntax:
grunt> exec [-param param_name = param_value] [-param_file file_name] [script]
Command:
$pig
grunt> exec hdfs:///pigdata/emp_script.pig
Output:
Kill Command
The kill command will attempt to kill any MapReduce jobs associated with the Pig job.
Syntax:
grunt> kill JobId
Command:
grunt> kill job_500
Run Command
The run command is used to run a Pig script in interactive mode: the script's statements are executed as if they had been typed at the Grunt shell.
The difference between exec and run is that with run, each statement from the script appears in the shell history and any aliases it defines remain accessible to later interactive statements, whereas exec runs the script in batch mode without affecting the shell's context.
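As a sketch of this interactive behaviour (the local script path is only illustrative):

grunt> run /home/cloudduggu/pig/tutorial/emp_script.pig
grunt> senior = FILTER employee BY empid > 205;
grunt> dump senior;

Because run executes the script in the shell's own context, the alias employee defined inside the script can be reused afterwards; after exec it would not be available.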
We will use the same example that we used for the exec command and run the command below.