Apache Pig is a platform for processing large data flows in parallel on Hadoop. Pig has two components: Pig Latin, the language, and the runtime environment in which Pig Latin programs are executed. Pig Latin provides operators such as sort, join, and filter, and it also lets users develop custom functions for reading, processing, and writing data.
A user writes a Pig Latin program to perform a particular task and executes it using the Grunt shell, a script file, or an embedded execution mechanism. Internally, the Pig framework converts the Pig Latin program into MapReduce jobs, which perform the data processing.
The following is the architecture of Apache Pig.
Apache Pig Components
Apache Pig processes Pig Latin programs through multiple layers.
The following are the components of Apache Pig.
Parser
When a user submits a program, it is first received by the parser, which checks the syntax and performs type checking. The output of the parser is a DAG (directed acyclic graph) that represents the Pig Latin statements and their logical operators.
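To make the idea of a DAG of logical operators more concrete, here is a minimal Python sketch. It is not Pig's parser: the statement shape it accepts (alias = OPERATOR input ...), the single-input restriction, and the names STATEMENT and build_dag are all assumptions made for illustration.

```python
import re

# Toy statement shape (an assumption for illustration): alias = OPERATOR input ...
STATEMENT = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s+('[^']*'|\w+)")


def build_dag(script):
    """Return {alias: (operator, parent_alias_or_None)} for a toy script."""
    dag = {}
    for raw in script.split(";"):
        statement = raw.strip()
        if not statement:
            continue
        match = STATEMENT.match(statement)
        if match is None:
            # Stand-in for the parser's syntax check.
            raise SyntaxError(f"cannot parse statement: {statement!r}")
        alias, operator, source = match.groups()
        operator = operator.upper()
        if operator == "LOAD":
            # LOAD reads from storage, so it has no parent node in the graph.
            dag[alias] = (operator, None)
        else:
            if source not in dag:
                raise NameError(f"undefined alias: {source}")
            dag[alias] = (operator, source)
    return dag


script = """
records = LOAD 'input.txt' AS (name, age);
adults = FILTER records BY age >= 18;
ranked = ORDER adults BY age DESC;
"""

dag = build_dag(script)
for alias, (operator, parent) in dag.items():
    print(alias, operator, parent)
```

Each alias becomes a node labelled with its operator, and the parent entry is the edge along which data flows into it. A real parser also handles multi-input operators such as JOIN, which this sketch leaves out.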
Optimizer
In the optimization phase, the DAG is pushed to a logical optimizer that carries out logical optimizations such as projection and pushdown.
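As a rough illustration of pushdown, the Python sketch below moves a filter ahead of a sort in a toy linear plan, so that fewer rows need to be sorted while the result stays the same. This is not Pig's optimizer: the plan representation and the functions push_filters_up and run are hypothetical, chosen only to show the idea.

```python
def push_filters_up(plan):
    """Move each FILTER step ahead of an ORDER step that directly precedes it."""
    steps = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(steps)):
            if steps[i][0] == "FILTER" and steps[i - 1][0] == "ORDER":
                steps[i - 1], steps[i] = steps[i], steps[i - 1]
                changed = True
    return steps


def run(plan, rows):
    """Execute a toy plan over in-memory rows."""
    for operator, argument in plan:
        if operator == "FILTER":
            rows = [row for row in rows if argument(row)]
        elif operator == "ORDER":
            rows = sorted(rows, key=argument)
    return rows


rows = [
    {"name": "a", "age": 30},
    {"name": "b", "age": 12},
    {"name": "c", "age": 21},
]

# The plan as written: sort everything, then discard rows.
plan = [
    ("ORDER", lambda row: row["age"]),
    ("FILTER", lambda row: row["age"] >= 18),
]

optimized = push_filters_up(plan)
print([operator for operator, _ in optimized])
print(run(plan, rows) == run(optimized, rows))
```

The rewrite is safe here because filtering and sorting commute: dropping rows before or after the sort leaves the same rows in the same order.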
Compiler
In the compiler phase, the optimized logical plan is compiled into a series of MapReduce jobs, which are passed to the execution engine.
Execution Engine
This is the final step, in which the MapReduce jobs are submitted to Hadoop for execution. Once execution completes, the results are returned to the user.
Apache Pig Modes
Apache Pig supports the following three interaction modes.
1. Interactive Mode
In this mode, the user runs Pig commands in the Grunt shell. Processing and compilation start when the user submits a STORE command (a DUMP command triggers execution as well).
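The deferred behaviour described above can be sketched in Python. The ToySession class is hypothetical and only imitates the pattern: statements are recorded as they are entered, and nothing runs until a STORE or DUMP statement arrives. The file names and aliases in the sample statements are also made up.

```python
class ToySession:
    """Toy stand-in for an interactive session; not the Grunt shell itself."""

    TRIGGERS = ("STORE", "DUMP")

    def __init__(self):
        self.plan = []   # statements recorded so far
        self.runs = 0    # how many times execution was triggered

    def submit(self, statement):
        self.plan.append(statement)
        keyword = statement.split()[0].upper()
        if keyword in self.TRIGGERS:
            # Only now would the recorded plan be compiled and executed.
            self.runs += 1
            return f"executed {len(self.plan)} statements"
        return "recorded"


session = ToySession()
print(session.submit("records = LOAD 'input.txt' AS (name, age);"))
print(session.submit("adults = FILTER records BY age >= 18;"))
print(session.submit("STORE adults INTO 'output';"))
```

Deferring work this way gives the system a view of the whole data flow before it has to run anything.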
2. Batch Mode
In this mode, the user runs a script file containing a set of Pig commands that perform some action.
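A batch run might look like the following Python sketch, which writes a small Pig Latin script to a file and builds the command line for it. The input file input.txt, its schema, the script name adults.pig, and the helper functions are assumptions for illustration. The script is only executed if a pig executable and the input file are actually present.

```python
import os
import shutil
import subprocess
import tempfile

PIG_SCRIPT = """\
-- Hypothetical input file and schema, used only for illustration.
records = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, age:int);
adults = FILTER records BY age >= 18;
ranked = ORDER adults BY age DESC;
STORE ranked INTO 'output';
"""


def write_script(directory):
    """Write the sample script to <directory>/adults.pig and return its path."""
    path = os.path.join(directory, "adults.pig")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(PIG_SCRIPT)
    return path


def batch_command(script_path, exec_type="local"):
    """Build the command line; '-x local' uses the local file system,
    while '-x mapreduce' targets a Hadoop cluster."""
    return ["pig", "-x", exec_type, script_path]


workdir = tempfile.mkdtemp()
script_path = write_script(workdir)
command = batch_command(script_path)
print(" ".join(command))

# Only attempt the run when Pig and the sample input are really available.
if shutil.which("pig") and os.path.exists("input.txt"):
    subprocess.run(command, check=False)
```

Keeping the statements in a file makes the same data flow repeatable, for example from a scheduler.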
3. Embedded Mode
In this mode, Pig commands are run from a Java program by invoking methods of the Pig Java library.