The Hadoop ecosystem has multiple layers of security, and in large enterprise Hadoop systems securing data storage and processing has become an important requirement. Because Oozie schedules and manages Hadoop jobs, it must provide the same level of security as the rest of the stack.
Oozie sits between users and Hadoop, so it supports two forms of security:
- Oozie Service to Hadoop Services
- Oozie Client to Oozie Service
The following figure shows the two forms of security.
Let us see each form of security in detail.
1. Oozie Service to Hadoop Services
In this form of security, Oozie acts as a client of the Hadoop services: it submits the user’s job to services such as the resource manager, which authenticate the request. Hadoop supports Kerberos-based authentication, so Oozie presents the appropriate Kerberos credentials to those services.
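As a rough illustration of how Oozie can be told to present Kerberos credentials to Hadoop, the fragment below sketches the relevant properties in oozie-site.xml. The property names are the ones used by Oozie's HadoopAccessorService. The realm, principal, and keytab path are hypothetical placeholders, not values from this article, and must be adapted to the actual cluster.

```xml
<!-- oozie-site.xml (sketch): Oozie service authenticating to Hadoop with Kerberos -->
<configuration>
  <!-- Turn on Kerberos when Oozie talks to Hadoop services -->
  <property>
    <name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
    <value>true</value>
  </property>
  <!-- Hypothetical realm; replace with the cluster's Kerberos realm -->
  <property>
    <name>local.realm</name>
    <value>EXAMPLE.COM</value>
  </property>
  <!-- Hypothetical keytab location for the Oozie service user -->
  <property>
    <name>oozie.service.HadoopAccessorService.keytab.file</name>
    <value>/etc/security/keytabs/oozie.service.keytab</value>
  </property>
  <!-- Hypothetical principal of the Oozie service user -->
  <property>
    <name>oozie.service.HadoopAccessorService.kerberos.principal</name>
    <value>oozie/localhost@EXAMPLE.COM</value>
  </property>
</configuration>
```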
1.1 Hadoop Services Configuration
Oozie authenticates to Hadoop services with its own credentials and uses the Hadoop proxy user feature to act on behalf of its end users. The Oozie service user (oozie) is declared as a proxy user in Hadoop’s core-site.xml file.
The following two properties are required for the Oozie service user. Example values for [OOZIE_SERVICE_OWNER], [OOZIE_SERVICE_OWNER_GROUP], and [OOZIE_SERVICE_HOSTNAME] are oozie, users, and localhost, respectively.
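<property>
  <name>hadoop.proxyuser.[OOZIE_SERVICE_OWNER].hosts</name>
  <value>[OOZIE_SERVICE_HOSTNAME]</value>
</property>
<property>
  <name>hadoop.proxyuser.[OOZIE_SERVICE_OWNER].groups</name>
  <value>[OOZIE_SERVICE_OWNER_GROUP]</value>
</property>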
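With the example values mentioned above (oozie, users, and localhost) substituted for the placeholders, the proxy user entries in core-site.xml would look roughly like the sketch below. This only illustrates the substitution; a real cluster would normally list the actual Oozie server hostname and the groups whose members Oozie may impersonate.

```xml
<!-- core-site.xml (sketch): allow the oozie user to act as a proxy user -->
<configuration>
  <!-- Hosts from which the oozie user may impersonate other users -->
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>localhost</value>
  </property>
  <!-- Groups whose members the oozie user may impersonate -->
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>users</value>
  </property>
</configuration>
```

Hadoop services typically need to be restarted, or have their proxy user settings refreshed, before a change to core-site.xml takes effect.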
2. Oozie Client to Oozie Service
The Oozie server can be configured to authenticate every Oozie client request, so that client credentials are checked when a job is submitted. Once a job has been submitted, Oozie does not check the user’s credentials again for recurring or delayed runs of that job. Oozie supports Kerberos-based authentication (HTTP SPNEGO) between the client and the server; it must be enabled in the server configuration.
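A minimal sketch of enabling Kerberos authentication for client requests in oozie-site.xml is shown below. The property names come from Oozie's authentication settings. The principal and keytab path are hypothetical placeholders chosen for illustration.

```xml
<!-- oozie-site.xml (sketch): authenticate Oozie client requests with Kerberos -->
<configuration>
  <!-- Authentication type for client requests: simple or kerberos -->
  <property>
    <name>oozie.authentication.type</name>
    <value>kerberos</value>
  </property>
  <!-- Hypothetical HTTP principal used by the Oozie server for SPNEGO -->
  <property>
    <name>oozie.authentication.kerberos.principal</name>
    <value>HTTP/localhost@EXAMPLE.COM</value>
  </property>
  <!-- Hypothetical keytab containing the HTTP principal's credentials -->
  <property>
    <name>oozie.authentication.kerberos.keytab</name>
    <value>/etc/security/keytabs/spnego.service.keytab</value>
  </property>
</configuration>
```

With a configuration along these lines, a user would normally obtain a Kerberos ticket (for example with kinit) before running Oozie client commands against the server.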