Apache Storm Core Concepts

Apache Storm reads data from external sources in real time, processes it, and sends the processed data to a receiver. Several Storm components work together to perform this operation.

Let us see the core components of Apache Storm.


1. Topology

An Apache Storm topology is a graph of computation in which each node represents an individual computation and the edges carry data between nodes. We feed data into a topology and get results out of it.

The following are the main objects that carry out the actual work in a running topology.

  • Worker processes: A worker process executes a subset of a topology.
  • Executors (threads): An executor is a thread spawned by a worker process.
  • Tasks: A task performs the actual data processing. Each executor runs one or more tasks.
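
The nesting of workers, executors, and tasks can be pictured with a small plain-Python sketch. It does not use the Storm API, and the round-robin placement and the name assign_tasks are assumptions made for illustration; Storm's real scheduler decides placement on its own.

```python
def assign_tasks(num_workers, num_executors, num_tasks):
    """Hypothetical layout: executors spread over workers, tasks over executors.

    Returns {worker_id: {executor_id: [task_id, ...]}}.
    """
    layout = {w: {} for w in range(num_workers)}
    # Place each executor (thread) in a worker process, round-robin.
    for e in range(num_executors):
        layout[e % num_workers][e] = []
    # Place each task in an executor, round-robin.
    for t in range(num_tasks):
        e = t % num_executors
        layout[e % num_workers][e].append(t)
    return layout


layout = assign_tasks(num_workers=2, num_executors=4, num_tasks=8)
for worker, executors in layout.items():
    print("worker", worker, executors)
```

With 2 workers, 4 executors, and 8 tasks, each worker ends up hosting 2 executors and each executor runs 2 tasks.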

[Figure: Apache Storm topology]


2. Tuple

In Apache Storm, the tuple is the main data structure. It is a named list of values, and each value can be of any type. Field types are dynamic, so they do not need to be declared. In a Storm cluster, any node can create a tuple and send it to one or more other nodes in the graph. This process is called emitting a tuple.
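
As a rough illustration of "a named list of values", the sketch below models a tuple in plain Python. The class name StormTuple and its method are hypothetical stand-ins, not Storm's own classes; the point is only that values are ordered, addressed by field name, and may be of any type.

```python
class StormTuple:
    """Hypothetical stand-in for a tuple: ordered values addressed by field name."""

    def __init__(self, fields, values):
        if len(fields) != len(values):
            raise ValueError("fields and values must have the same length")
        self.fields = list(fields)
        self.values = list(values)

    def get_value_by_field(self, name):
        # Look up the position of the field, then return the value at it.
        return self.values[self.fields.index(name)]


# Values of different types live side by side; no types are declared.
t = StormTuple(["word", "count", "tags"], ["storm", 3, ["stream", "realtime"]])
print(t.get_value_by_field("count"))
```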


3. Stream

A stream is an unbounded sequence of tuples. A topology can contain one or more streams, and a node can accept more than one stream as input. Once a node receives input, it performs a computation or transformation on the input tuples and creates a new output stream, which in turn serves as an input stream for other nodes.
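
One way to picture an unbounded stream is a Python generator that never ends, with a node modeled as a function that turns one stream into another. This is only an analogy in plain Python; the names sensor_stream and double_readings are invented for the example.

```python
import itertools


def sensor_stream():
    """Unbounded input stream: yields (sensor_id, reading) tuples forever."""
    for i in itertools.count():
        yield ("sensor-%d" % (i % 2), i)


def double_readings(stream):
    """A node: consumes an input stream and produces a new output stream."""
    for sensor_id, reading in stream:
        yield (sensor_id, reading * 2)


# The stream never ends, so we only look at its first three tuples.
first_three = list(itertools.islice(double_readings(sensor_stream()), 3))
print(first_three)
```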


4. Spout

A Spout is a source of streams. It typically performs no processing of its own. It just reads data from a data source and emits tuples to the next node in the topology.


5. Bolt

A Bolt accepts tuples from its input streams and performs a computation or transformation on them, such as filtering, aggregation, or a join. It then emits new tuples to its output stream.
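
The division of labor between a spout and bolts can be sketched as a small word-count pipeline. The code below is plain Python rather than Storm code, and every function name in it is an assumption chosen for the example: the spout only reads and emits, one bolt transforms, and another bolt aggregates.

```python
from collections import Counter


def sentence_spout(lines):
    """Spout: reads from a data source and emits one tuple per line, unprocessed."""
    for line in lines:
        yield (line,)


def split_bolt(stream):
    """Bolt: transforms each sentence tuple into one tuple per lowercase word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def count_bolt(stream):
    """Bolt: aggregates word tuples into per-word counts."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts


source = ["Storm processes streams", "streams of tuples"]
counts = count_bolt(split_bolt(sentence_spout(source)))
print(counts["streams"])  # the word appears once in each line
```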


Apache Storm Stream Grouping

Stream grouping in Apache Storm defines how tuples are sent between instances of spouts and bolts. It determines how tuples flow through the topology.

Storm provides several built-in stream groupings. Four commonly used ones are described below.


1. Shuffle Grouping

In this type of grouping, tuples are distributed randomly across the instances of the bolt so that each instance receives a roughly equal number of tuples for processing.
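
A minimal sketch of this idea, in plain Python and not taken from Storm's source: the task order is shuffled once per round, and every task receives exactly one tuple in each round. The function name and the seed parameter are assumptions made so the example is repeatable.

```python
import random
from collections import Counter


def shuffle_grouping(tuples, num_tasks, seed=None):
    """Assign tuples to tasks in a random order, one tuple per task per round."""
    rng = random.Random(seed)
    order = list(range(num_tasks))
    rng.shuffle(order)
    assignments = []
    for i, tup in enumerate(tuples):
        if i and i % num_tasks == 0:
            rng.shuffle(order)  # new random order at the start of each round
        assignments.append((order[i % num_tasks], tup))
    return assignments


assignments = shuffle_grouping(range(12), num_tasks=3, seed=7)
load = Counter(task for task, _ in assignments)
print(sorted(load.items()))
```

Which task receives which tuple is random, but with 12 tuples and 3 tasks every task ends up with 4 tuples.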


2. Fields Grouping

In this grouping, tuples that have the same value for a particular field are sent to the same task of the bolt.
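
A common way to get this behavior is to hash the grouping field and take the result modulo the number of tasks. The sketch below does that in plain Python with a CRC32 checksum. The hash choice and the function name are assumptions for illustration and are not claimed to match Storm's internal implementation.

```python
import zlib


def fields_grouping(tup, field_index, num_tasks):
    """Pick a target task from the value of one field: equal values, same task."""
    key = str(tup[field_index]).encode("utf-8")
    return zlib.crc32(key) % num_tasks


tuples = [("alice", 1), ("bob", 2), ("alice", 3)]
targets = [fields_grouping(t, 0, 4) for t in tuples]
print(targets)  # the first and third entries are always equal
```

Because both "alice" tuples hash to the same task, that task can keep per-user state such as a running count.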


3. Global Grouping

In this grouping, the entire stream is sent to a single task of the bolt (the task with the lowest ID), so tuples generated by all source instances are forwarded to one target instance.
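
As a plain-Python sketch of this routing rule (the function name and the inbox dictionary are invented for the example), every tuple lands in the inbox of one task, here chosen as the task with the lowest ID, while the other tasks receive nothing.

```python
def global_grouping(tuples, task_ids):
    """Route every tuple to a single task: the one with the lowest ID."""
    target = min(task_ids)
    tuples = list(tuples)
    return {task: (list(tuples) if task == target else []) for task in task_ids}


inboxes = global_grouping([("a",), ("b",), ("c",)], task_ids=[5, 3, 4])
print(inboxes)
```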


4. All Grouping

This grouping sends one copy of every tuple to all tasks of the receiving bolt. It can be used, for example, in a join operation in which every receiving instance needs to see every tuple.
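
The replication can be sketched in a few lines of plain Python. The function name and the sample tuples are assumptions for illustration; each task simply gets its own copy of the full list of tuples.

```python
def all_grouping(tuples, task_ids):
    """Deliver a copy of every tuple to every task of the receiving bolt."""
    tuples = list(tuples)
    return {task: list(tuples) for task in task_ids}


inboxes = all_grouping([("reload-config",), ("flush",)], task_ids=[0, 1, 2])
for task, received in inboxes.items():
    print(task, received)
```

Note that the number of delivered tuples grows with the number of receiving tasks, so this grouping is best kept for low-volume streams.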