What are the Apache Oozie Control Flow Nodes?
Control flow nodes define the start and the end of a workflow (the start, end, and kill control nodes) and control the workflow's execution path (the decision, fork, and join nodes).
The following are the Apache Oozie control flow nodes:
- Start control node
- End control node
- Kill control node
- Decision control node
- Fork and Join control node
Let us see each control flow node in detail.
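Before looking at each node in turn, the following is a minimal sketch of how these nodes typically fit together in one workflow definition. The node names (firstjob, fail, done) and the action body are hypothetical placeholders chosen for illustration, not part of any standard.
<workflow-app name="control-nodes-sketch" xmlns="uri:oozie:workflow:0.1">
    <!-- Entry point: the job immediately transitions to the named node -->
    <start to="firstjob"/>
    <!-- A hypothetical action node; the control nodes route around it -->
    <action name="firstjob">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
        </map-reduce>
        <ok to="done"/>
        <error to="fail"/>
    </action>
    <!-- Abnormal end: logs a message and kills any running actions -->
    <kill name="fail">
        <message>Job failed at ${wf:lastErrorNode()}</message>
    </kill>
    <!-- Normal, successful end of the workflow -->
    <end name="done"/>
</workflow-app>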
1. Start Control Node
A workflow job starts with the start control node. It is the entry point of a workflow job. Each workflow definition must have a start node, and when the job is started, it automatically transitions to the node specified in the start node.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <start to="[NODE-NAME]"/>
    ...
</workflow-app>
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <start to="firstHadoopJob"/>
    ...
</workflow-app>
2. End Control Node
The end control node is used to indicate that the workflow job has completed successfully. When a workflow job reaches the end node, it finishes with a SUCCEEDED status.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <end name="[NODE-NAME]"/>
    ...
</workflow-app>
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <end name="end"/>
    ...
</workflow-app>
3. Kill Control Node
The kill control node is used to kill a workflow job. If one or more actions started by the workflow job are still executing when the kill node is reached, those actions will be killed, and the job finishes with a KILLED status.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <kill name="[NODE-NAME]">
        <message>[MESSAGE-TO-LOG]</message>
    </kill>
    ...
</workflow-app>
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <kill name="killBecauseNoInput">
        <message>Input unavailable</message>
    </kill>
    ...
</workflow-app>
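A kill node has no outgoing transition of its own; a workflow normally reaches it through the error transition of an action node. A minimal sketch, assuming a hypothetical action named checkinput that transitions to the kill node above on failure (the job0.xml configuration file is likewise a placeholder):
<action name="checkinput">
    <map-reduce>
        <job-tracker>foo:8021</job-tracker>
        <name-node>bar:8020</name-node>
        <job-xml>job0.xml</job-xml>
    </map-reduce>
    <!-- On success, continue with the normal flow -->
    <ok to="firstHadoopJob"/>
    <!-- On failure, transition to the kill node, ending the job as KILLED -->
    <error to="killBecauseNoInput"/>
</action>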
4. Decision Control Node
A decision node allows a workflow to select the execution path to follow. It consists of a list of predicate-transition pairs plus a default transition, much like a switch-case statement. Predicates are evaluated in order of appearance until one of them evaluates to true, and the corresponding transition is taken. If no predicate evaluates to true, the default transition is taken.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <decision name="[NODE-NAME]">
        <switch>
            <case to="[NODE-NAME]">[PREDICATE]</case>
            ...
            <case to="[NODE-NAME]">[PREDICATE]</case>
            <default to="[NODE-NAME]"/>
        </switch>
    </decision>
    ...
</workflow-app>
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <decision name="mydecision">
        <switch>
            <case to="reconsolidatejob">
                ${fs:fileSize(secondjobOutputDir) gt 10 * GB}
            </case>
            <case to="rexpandjob">
                ${fs:fileSize(secondjobOutputDir) lt 100 * MB}
            </case>
            <case to="recomputejob">
                ${ hadoop:counters('secondjob')[RECORDS][REDUCE_OUT] lt 1000000 }
            </case>
            <default to="end"/>
        </switch>
    </decision>
    ...
</workflow-app>
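In the example above, fs:fileSize() is one of Oozie's workflow EL functions, and GB and MB are Oozie's built-in size constants, so the first case is taken when secondjob's output grows beyond 10 gigabytes. Predicates can also test plain job properties; a minimal sketch, assuming a hypothetical skipCleanup property passed in at submission time:
<decision name="cleanupdecision">
    <switch>
        <!-- Bypass cleanup when the hypothetical skipCleanup property is "true" -->
        <case to="end">${skipCleanup eq "true"}</case>
        <default to="cleanupjob"/>
    </switch>
</decision>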
5. Fork and Join Control Node
The fork node splits one path of execution into multiple concurrent paths of execution, and the join node waits until every concurrent execution path started by the corresponding fork node reaches it. Fork and join nodes must be used in pairs.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="[FORK-NODE-NAME]">
        <path start="[NODE-NAME]"/>
        ...
        <path start="[NODE-NAME]"/>
    </fork>
    ...
    <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]"/>
    ...
</workflow-app>
Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="forking">
        <path start="firstparalleljob"/>
        <path start="secondparalleljob"/>
    </fork>
    <action name="firstparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job1.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <action name="secondparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job2.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <join name="joining" to="nextaction"/>
    ...
</workflow-app>
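Because every path started by a fork must reach the matching join, Oozie validates the fork/join structure of a workflow at submission time and rejects definitions it cannot prove correct. If a legitimate workflow is rejected by this check, the documented oozie.wf.validate.ForkJoin property can be set to false in the job configuration to relax the validation, though that shifts the burden of correctness to the workflow author.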