Apache Oozie provides facilities for managing both system JARs and user-defined JARs.
Oozie manages the following types of JARs:
- System JARs
- Hadoop JARs
- Action JARs
- User JARs
Let us see each type of JAR in detail.
1. System JARs
These JARs are produced during the Oozie build, packaged into the Oozie web application archive (oozie.war), and used to run Oozie services.
2. Hadoop JARs
Oozie uses these JARs to communicate with Hadoop services. They are produced by Hadoop, and Oozie adds them to the web application archive at packaging time.
3. Action JARs
These JARs are used to execute built-in Oozie actions.
4. User JARs
End users create these JARs to carry their application logic. For example, a MapReduce action needs the mapper and reducer classes, and a Pig or Hive action needs the code for any UDFs it calls. The user bundles this code into a JAR and deploys it under the “/lib” directory of the workflow application path.
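As a rough illustration of the user JAR layout described above, the sketch below computes where each JAR would sit under a workflow application path. The application path, the JAR names, and the helper `user_jar_paths` are all hypothetical and are not part of Oozie.

```python
import posixpath


def user_jar_paths(app_path, jar_names):
    """Return HDFS-style paths for user JARs under the application's lib directory."""
    lib_dir = posixpath.join(app_path, "lib")
    return [posixpath.join(lib_dir, name) for name in jar_names]


# Hypothetical application path and JAR names, for illustration only.
paths = user_jar_paths(
    "/user/alice/apps/wordcount",
    ["wordcount-mr.jar", "my-udfs.jar"],
)
for path in paths:
    print(path)
```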
Apache Oozie JARs Design Challenges
Designing a flexible, built-in framework for JAR management in a system as complex as Oozie is difficult.
The following are some of the reasons.
1. Multiple Action Types
Apache Oozie manages many built-in and user-defined action types. Each action type needs its own set of JARs, and under certain conditions these sets conflict with each other. For example, Pig and Hive each ship their own JARs, and in some cases both include the same library. Oozie should therefore put on the classpath only the JARs that a given action requires and exclude the rest.
2. Multiple Versions
An action type should not be tied to a single version of its tool. Oozie should provide a framework that supports multiple versions of each tool for the same action type.
3. Different Hadoop Versions
Hadoop plays an important role in JAR management because most actions interact directly with the Hadoop system. The problem with Hadoop versions is that a JAR built against Hadoop 1.x may not run on a Hadoop 2.x cluster. The Oozie framework should account for this kind of variability.
4. Unified JAR Upgrade
Suppose Oozie supports Pig 0.11, an important bug fix is released for Pig 0.11, and Oozie needs to replace a JAR or add a new one. Replacing the JAR in place can break running jobs because of the way the Hadoop distributed cache works. Oozie should make JAR upgrades easy and error-free.
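The challenges above can be made concrete with a toy model. The sketch below is not Oozie's implementation; it only assumes a library keyed by action type and version, where a lookup returns the JARs for one action only and a bug fix is published as a new entry instead of overwriting an existing one. The class `ShareLib`, the version labels, and the JAR file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ShareLib:
    """Toy model of a shared library: action type -> version -> JAR names."""

    jars: dict = field(default_factory=dict)

    def publish(self, action_type, version, jar_names):
        """Add a new version without touching versions already published."""
        versions = self.jars.setdefault(action_type, {})
        if version in versions:
            # Refuse in-place replacement so running jobs keep their JARs.
            raise ValueError(f"{action_type} {version} is already published")
        versions[version] = tuple(jar_names)

    def resolve(self, action_type, version):
        """Return only the JARs for the requested action type and version."""
        return list(self.jars[action_type][version])


lib = ShareLib()
# Hypothetical JAR names; both tools bundle a library with the same name.
lib.publish("pig", "0.11", ["pig-0.11.jar", "shared-parser-a.jar"])
lib.publish("hive", "0.10", ["hive-exec-0.10.jar", "shared-parser-b.jar"])
# A bug-fix build sits beside the old one instead of replacing it.
lib.publish("pig", "0.11-fix1", ["pig-0.11-fix1.jar", "shared-parser-a.jar"])

print(lib.resolve("pig", "0.11"))
print(lib.resolve("pig", "0.11-fix1"))
print(lib.resolve("hive", "0.10"))
```

Keeping every published version immutable means that a job started against one version never sees its JARs change underneath it, at the cost of needing some cleanup policy for old versions.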
Apache Oozie JAR Precedence in Classpath
There are three ways to include JARs in an Oozie workflow.
They are listed here in order of classpath precedence, from highest to lowest.
Application lib Directory
A JAR in the workflow application “/lib” directory has the highest priority in the classpath.
User-level shared Library
The user-level shared library has the second-highest priority in the classpath. It is defined through the oozie.libpath property.
System-level shared Library
The action JARs included in the system-level shared library have the lowest priority.
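The precedence order can be sketched as a simple merge in which the first source to provide a JAR name wins. This is a simplification for illustration, not how Oozie or the JVM builds a classpath, and the function `build_classpath` and all paths are hypothetical. The property names in `job_properties` are the Oozie job properties commonly used to point at the application path, the user-level library, and the system-level library; the values are placeholders.

```python
import posixpath


def build_classpath(app_lib, user_libpath, system_sharelib):
    """Merge JAR paths so that earlier sources win on file-name clashes."""
    chosen = {}
    for source in (app_lib, user_libpath, system_sharelib):
        for path in source:
            name = posixpath.basename(path)
            # setdefault keeps the first (highest-priority) copy of each name.
            chosen.setdefault(name, path)
    return list(chosen.values())


# Hypothetical JAR locations for one workflow.
app_lib = ["/user/alice/app/lib/udf.jar", "/user/alice/app/lib/commons.jar"]
user_libpath = ["/user/alice/share/commons.jar", "/user/alice/share/helper.jar"]
system_sharelib = ["/share/lib/pig/pig.jar", "/share/lib/pig/helper.jar"]

classpath = build_classpath(app_lib, user_libpath, system_sharelib)
for entry in classpath:
    print(entry)

# Placeholder values showing where each source is typically configured.
job_properties = {
    "oozie.wf.application.path": "/user/alice/app",  # its lib directory holds user JARs
    "oozie.libpath": "/user/alice/share",            # user-level shared library
    "oozie.use.system.libpath": "true",              # enable the system-level library
}
```

In this example the application copy of commons.jar shadows the user-level copy, and the user-level helper.jar shadows the system-level one.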