Apache Flume is a big data streaming tool used to collect, aggregate, and move streaming data from many types of source systems to a destination system. Its architecture is based on streaming data flows, and it provides mechanisms to recover from different types of failures.
Apache Flume Channels are the containers in which events are staged on an agent. A source adds events to a channel, and a sink removes them.
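For context, a minimal agent definition wires a source and a sink to a channel; the agent, source, and sink names below are illustrative.
agentone.sources = sourceone
agentone.sinks = sinkone
agentone.channels = channelone
agentone.sources.sourceone.channels = channelone
agentone.sinks.sinkone.channel = channelone
Note that a source can feed several channels (the property is plural), while a sink drains exactly one channel (singular).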
Apache Flume provides different types of channels as mentioned below.
- Memory Channel
- JDBC Channel
- Kafka Channel
- File Channel
- Spillable Memory Channel
- Pseudo Transaction Channel
- Custom Channel
Let us see each Channel in detail.
1. Memory Channel
Apache Flume Memory Channel stores events in an in-memory queue, which can be configured with a maximum size. The Memory Channel is helpful for flows that require high throughput and are prepared to lose the staged data in case of agent failures.
Example for Memory Channel.
agentone.channels = channelone
agentone.channels.channelone.type = memory
agentone.channels.channelone.capacity = 10000
agentone.channels.channelone.transactionCapacity = 10000
agentone.channels.channelone.byteCapacityBufferPercentage = 20
agentone.channels.channelone.byteCapacity = 800000
2. JDBC Channel
In the JDBC Channel, the events are stored in persistent storage backed by a database. Currently, the JDBC Channel supports an embedded Derby database. The JDBC Channel is useful for flows where recoverability is important.
Example for Apache Flume JDBC Channel.
agentone.channels = channelone
agentone.channels.channelone.type = jdbc
3. Kafka Channel
In the Apache Flume Kafka Channel, the events are stored in a Kafka cluster, which must be installed separately. Kafka provides replication and high availability, so in case of an agent or broker failure, the events are immediately available to other sinks.
The Kafka Channel can be used in several ways.
- With a Flume source and sink, it provides a reliable and highly available channel for events.
- With a Flume source and interceptor but no sink, it allows writing Flume events into a Kafka topic for use by other applications.
- With a Flume sink but no source, it provides a low-latency, fault-tolerant way to send events from Kafka to Flume sinks such as HDFS, HBase, or Solr.
Example for Apache Flume Kafka Channel.
agentone.channels = channel1
agentone.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agentone.channels.channel1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092,kafka-3:9092
agentone.channels.channel1.kafka.topic = channel1
agentone.channels.channel1.kafka.consumer.group.id = flume-consumer
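When the channel's Kafka topic is also read or written directly by other applications (the second and third use cases above), the parseAsFlumeEvent property controls whether payloads on the topic are wrapped as serialized Flume events; it defaults to true. A sketch, with the other channel settings as in the example above:
agentone.channels.channel1.parseAsFlumeEvent = false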
The Apache Flume Kafka Channel also supports secure communication with Kafka. Let us see the security-related configuration in detail.
3.1 Security and Kafka Channel
Apache Flume's communication with Kafka supports both secure authentication and data encryption.
This is achieved by setting the kafka.producer|consumer.security.protocol property to one of the following values.
- SASL_PLAINTEXT: Kerberos or plaintext authentication with no data encryption.
- SASL_SSL: Kerberos or plaintext authentication with data encryption.
- SSL: TLS-based encryption with optional authentication.
3.2 TLS and Kafka Channel
The example below shows a configuration with server-side authentication and data encryption.
agentone.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agentone.channels.channel1.kafka.bootstrap.servers = kafka-1:9093,kafka-2:9093,kafka-3:9093
agentone.channels.channel1.kafka.topic = channel1
agentone.channels.channel1.kafka.consumer.group.id = flume-consumer
agentone.channels.channel1.kafka.producer.security.protocol = SSL
agentone.channels.channel1.kafka.producer.ssl.truststore.location = /path/to/truststore.jks
agentone.channels.channel1.kafka.producer.ssl.truststore.password = <password to access the truststore>
agentone.channels.channel1.kafka.consumer.security.protocol = SSL
agentone.channels.channel1.kafka.consumer.ssl.truststore.location = /path/to/truststore.jks
agentone.channels.channel1.kafka.consumer.ssl.truststore.password = <password to access the truststore>
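If the Kafka brokers also require client authentication (two-way TLS), a keystore holding the channel's client certificate and key must be configured in addition to the truststore. A sketch using the standard Kafka client keystore properties; the paths and passwords are placeholders:
agentone.channels.channel1.kafka.producer.ssl.keystore.location = /path/to/client.keystore.jks
agentone.channels.channel1.kafka.producer.ssl.keystore.password = <password to access the keystore>
agentone.channels.channel1.kafka.consumer.ssl.keystore.location = /path/to/client.keystore.jks
agentone.channels.channel1.kafka.consumer.ssl.keystore.password = <password to access the keystore>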
3.3 Kerberos and Kafka Channel
To use the Kafka Channel with a Kafka cluster secured with Kerberos, set the producer/consumer.security.protocol properties shown above for the producer and/or consumer. The Kerberos configuration and the JAAS configuration file are passed to Flume's JVM, for example via JAVA_OPTS in conf/flume-env.sh:
JAVA_OPTS="$JAVA_OPTS -Djava.security.krb5.conf=/path/to/krb5.conf"
JAVA_OPTS="$JAVA_OPTS -Djava.security.auth.login.config=/path/to/flume_jaas.conf"
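The JAAS file referenced above needs a KafkaClient login section. The sketch below assumes keytab-based login; the keytab path and principal are placeholders to be replaced with your own:
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/path/to/keytabs/flume.keytab"
  principal="flume/flumehost1.example.com@YOURKERBEROSREALM";
};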
Example for secure configuration using SASL_PLAINTEXT:
agentone.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agentone.channels.channel1.kafka.bootstrap.servers = kafka-1:9093,kafka-2:9093,kafka-3:9093
agentone.channels.channel1.kafka.topic = channel1
agentone.channels.channel1.kafka.consumer.group.id = flume-consumer
agentone.channels.channel1.kafka.producer.security.protocol = SASL_PLAINTEXT
agentone.channels.channel1.kafka.producer.sasl.mechanism = GSSAPI
agentone.channels.channel1.kafka.producer.sasl.kerberos.service.name = kafka
agentone.channels.channel1.kafka.consumer.security.protocol = SASL_PLAINTEXT
agentone.channels.channel1.kafka.consumer.sasl.mechanism = GSSAPI
agentone.channels.channel1.kafka.consumer.sasl.kerberos.service.name = kafka
Example for secure configuration using SASL_SSL:
agentone.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agentone.channels.channel1.kafka.bootstrap.servers = kafka-1:9093,kafka-2:9093,kafka-3:9093
agentone.channels.channel1.kafka.topic = channel1
agentone.channels.channel1.kafka.consumer.group.id = flume-consumer
agentone.channels.channel1.kafka.producer.security.protocol = SASL_SSL
agentone.channels.channel1.kafka.producer.sasl.mechanism = GSSAPI
agentone.channels.channel1.kafka.producer.sasl.kerberos.service.name = kafka
agentone.channels.channel1.kafka.producer.ssl.truststore.location = /path/to/truststore.jks
agentone.channels.channel1.kafka.producer.ssl.truststore.password = <password to access the truststore>
agentone.channels.channel1.kafka.consumer.security.protocol = SASL_SSL
agentone.channels.channel1.kafka.consumer.sasl.mechanism = GSSAPI
agentone.channels.channel1.kafka.consumer.sasl.kerberos.service.name = kafka
agentone.channels.channel1.kafka.consumer.ssl.truststore.location = /path/to/truststore.jks
agentone.channels.channel1.kafka.consumer.ssl.truststore.password = <password to access the truststore>
4. File Channel
The Apache Flume File Channel is a durable channel: it stores events on disk, so if the JVM process is killed, the system is rebooted, or a crash occurs, the events are not lost and can still be transferred to the next agent in the pipeline.
Example for Apache Flume File Channel.
agentone.channels = c1
agentone.channels.c1.type = file
agentone.channels.c1.checkpointDir = /mnt/flume/checkpoint
agentone.channels.c1.dataDirs = /mnt/flume/data
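In addition to the checkpoint and data directories, the File Channel's maximum queue size and per-transaction batch size can be tuned via the capacity and transactionCapacity properties. The values below are, to the best of my knowledge, the defaults; verify against the user guide for your Flume version:
agentone.channels.c1.capacity = 1000000
agentone.channels.c1.transactionCapacity = 10000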
5. Spillable Memory Channel
The Apache Flume Spillable Memory Channel stores events both in memory and on disk. An in-memory queue acts as primary storage, and the disk is used when the queue overflows.
The Spillable Memory Channel is currently experimental and is not recommended for production use.
Example for Apache Flume Spillable Memory Channel.
agentone.channels = c1
agentone.channels.c1.type = SPILLABLEMEMORY
agentone.channels.c1.memoryCapacity = 10000
agentone.channels.c1.overflowCapacity = 1000000
agentone.channels.c1.byteCapacity = 800000
agentone.channels.c1.checkpointDir = /mnt/flume/checkpoint
agentone.channels.c1.dataDirs = /mnt/flume/data
6. Pseudo Transaction Channel
The Pseudo Transaction Channel is used only for unit testing and is NOT meant for production use.
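For completeness, this channel is selected by its class name rather than an alias. A minimal sketch, with illustrative agent and channel names:
agentone.channels = c1
agentone.channels.c1.type = org.apache.flume.channel.PseudoTxnMemoryChannel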
7. Custom Channel
A custom channel is a user's own implementation of the Channel interface. The custom channel's class and its dependencies must be on the agent's classpath when starting the Flume agent. The type of the custom channel is its fully qualified class name (FQCN).
Example for Custom Channel.
agentone.channels = c1
agentone.channels.c1.type = org.example.MyChannel