Apache Flume sources consume events that are delivered to them by an external system such as a web server. The external system sends events to Flume in a format that the target Flume source recognizes.
The following is the list of Apache Flume sources.
- Avro Source
- Thrift Source
- Exec Source
- JMS Source
- Spooling Directory Source
- Kafka Source
- NetCat TCP Source
- NetCat UDP Source
- Sequence Generator Source
- Syslog Source
- HTTP Source
- Custom Source
- Scribe Source
Let us look at each Apache Flume source in detail.
1. Avro Source
Apache Flume Avro Source receives event data from external Avro client streams. An Avro source paired with the Avro sink of another Flume agent creates a tiered collection topology.
Let us see the configuration example of Avro Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = avro
agentone.sources.source.channels = channelone
agentone.sources.source.bind = 0.0.0.0
agentone.sources.source.port = 4141
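As a sketch of optional tuning, the Avro source also exposes a threads property (to cap worker threads) and a compression-type property (set to deflate to match a compressing Avro sink). The values below are illustrative, not required.
# Optional tuning (illustrative values)
agentone.sources.source.threads = 10
agentone.sources.source.compression-type = deflate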
2. Thrift Source
Apache Flume Thrift Source receives events from external Thrift client streams. When a Thrift source is paired with the built-in Thrift sink on another Flume agent, it creates a tiered collection topology. The Thrift source can also be started in secure mode with Kerberos authentication.
Let us see the configuration example of Thrift Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = thrift
agentone.sources.source.channels = channelone
agentone.sources.source.bind = 0.0.0.0
agentone.sources.source.port = 4141
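If Kerberos authentication is enabled, the Thrift source additionally needs the agent principal and keytab. The following is a minimal sketch; the principal and keytab path are hypothetical and should be replaced with your own.
# Kerberos settings (illustrative principal and keytab path)
agentone.sources.source.kerberos = true
agentone.sources.source.agent-principal = flume/_HOST@EXAMPLE.COM
agentone.sources.source.agent-keytab = /etc/security/keytabs/flume.keytab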
3. Exec Source
Apache Flume Exec Source runs a given Unix command on start-up and expects it to continuously produce data on standard out. If the process terminates for any reason, the source also terminates and stops producing data.
Let us see the configuration example of Exec Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = exec
agentone.sources.source.command = tail -F /var/log/secure
agentone.sources.source.channels = channelone
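When the command needs shell features such as pipes or wildcards, the Exec source can run it through a shell, and it can optionally restart the command if it exits. The sketch below shows these optional properties with example values.
# Optional: run the command via a shell and restart it if it exits
agentone.sources.source.shell = /bin/bash -c
agentone.sources.source.restart = true
agentone.sources.source.restartThrottle = 10000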
4. JMS Source
Apache Flume JMS Source reads messages from a JMS destination such as a queue or topic. As a JMS application, it should work with any JMS provider but has only been tested with ActiveMQ.
Note that the vendor-provided JMS jars should be included in the Flume classpath using the plugins.d directory (preferred), the --classpath option on the command line, or the FLUME_CLASSPATH variable in flume-env.sh.
Let us see the configuration example of JMS Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = jms
agentone.sources.source.channels = channelone
agentone.sources.source.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
agentone.sources.source.connectionFactory = GenericConnectionFactory
agentone.sources.source.providerURL = tcp://mqserver:61616
agentone.sources.source.destinationName = BUSINESS_DATA
agentone.sources.source.destinationType = QUEUE
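For secured brokers or selective consumption, the JMS source also supports credentials, a JMS message selector, and a batch size. The values below are illustrative only; the user name, password file path, and selector are hypothetical.
# Optional: credentials, message selector, and batch size (illustrative values)
agentone.sources.source.userName = flumeuser
agentone.sources.source.passwordFile = /etc/flume/jms-password.txt
agentone.sources.source.messageSelector = region = 'EU'
agentone.sources.source.batchSize = 100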
5. Spooling Directory Source
Apache Flume Spooling Directory Source ingests data that is placed into a "spooling" directory on disk. It keeps monitoring the directory for new files and processes them as they appear.
The Spooling Directory Source is reliable: no data is lost even if Flume is restarted or its process is killed.
Apache Flume will raise an error under the following conditions.
- If a file is written to after being placed into the spooling directory.
- If a file name is reused at a later time.
To avoid these issues, we can add a unique identifier such as a timestamp to the log file name when moving it into the spool directory.
Let us see the configuration example of Spooling Directory Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = spooldir
agentone.sources.source.channels = channelone
agentone.sources.source.spoolDir = /var/log/apache/flumeSpool
agentone.sources.source.fileHeader = true
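The Spooling Directory Source also lets you control what happens to completed files and which files are picked up. The sketch below uses example values for a few of these optional properties.
# Optional: completed-file handling and file filtering (example values)
agentone.sources.source.fileSuffix = .COMPLETED
agentone.sources.source.deletePolicy = never
agentone.sources.source.ignorePattern = ^.*\.tmp$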
6. Kafka Source
Apache Flume Kafka Source reads messages from Kafka topics. If multiple Kafka sources are configured with the same consumer group, each will read a unique set of partitions for the topics.
The following is an example of a comma-separated topic list.
agentone.sources.source.type = org.apache.flume.source.kafka.KafkaSource
agentone.sources.source.channels = channelone
agentone.sources.source.batchSize = 5000
agentone.sources.source.batchDurationMillis = 2000
agentone.sources.source.kafka.bootstrap.servers = localhost:9092
agentone.sources.source.kafka.topics = test1, test2
agentone.sources.source.kafka.consumer.group.id = custom.g.id
Example for topic subscription by regex.
agentone.sources.source.type = org.apache.flume.source.kafka.KafkaSource
agentone.sources.source.channels = channelone
agentone.sources.source.kafka.bootstrap.servers = localhost:9092
agentone.sources.source.kafka.topics.regex = ^topic[0-9]$
# the default kafka.consumer.group.id=flume is used
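Other Kafka consumer settings can be passed through by prefixing them with kafka.consumer. For example, the following illustrative line asks a new consumer group to start reading from the earliest available offset.
# Pass-through Kafka consumer property (illustrative)
agentone.sources.source.kafka.consumer.auto.offset.reset = earliest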
7. NetCat TCP Source
Apache Flume NetCat TCP Source listens on a given port and turns the received data into events. The data arrives as newline-separated text, and each line is forwarded through the channel as a single event.
Let us see the configuration example of NetCat TCP Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = netcat
agentone.sources.source.bind = 0.0.0.0
agentone.sources.source.port = 6666
agentone.sources.source.channels = channelone
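The NetCat TCP source can also cap the line length it accepts and acknowledge each event back to the sender. A short sketch with example values follows.
# Optional: line-length limit and per-event acknowledgement (example values)
agentone.sources.source.max-line-length = 512
agentone.sources.source.ack-every-event = true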
8. NetCat UDP Source
Apache Flume NetCat UDP Source works like the NetCat TCP source: it receives data on a given port and forwards it through a channel.
Let us see an example of configuration detail for NetCat UDP Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = netcatudp
agentone.sources.source.bind = 0.0.0.0
agentone.sources.source.port = 6666
agentone.sources.source.channels = channelone
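If you want to record where each datagram came from, the NetCat UDP source can store the sender address in an event header. The header name below is only an example.
# Optional: store the sender address in an event header (example header name)
agentone.sources.source.remoteAddressHeader = remote_address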
9. Sequence Generator Source
Apache Flume Sequence Generator Source is used for testing: it continuously generates events based on a counter. The counter starts at 0 and increments by 1, and the source stops only when the counter reaches the configured total number of events.
Let us see an example of configuration detail for Sequence Generator Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = seq
agentone.sources.source.channels = channelone
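To make the generator stop after a fixed number of events, as mentioned above, the total can be set explicitly. The count and batch size below are just example values.
# Optional: stop after a fixed number of events (example values)
agentone.sources.source.totalEvents = 10000
agentone.sources.source.batchSize = 1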
10. Syslog Source
Apache Flume Syslog sources read Syslog data and generate Flume events.
- The Syslog TCP source creates a new event for each string of characters separated by a newline ('\n').
- The Multiport Syslog TCP Source can listen on multiple ports at one time.
- The Syslog UDP source treats the complete message from the source as an event.
1. Syslog TCP Source
This is the original, tried-and-true Syslog TCP source.
Let us see an example of configuration detail for Syslog TCP Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = syslogtcp
agentone.sources.source.port = 5140
agentone.sources.source.host = localhost
agentone.sources.source.channels = channelone
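By default the source strips the Syslog priority, timestamp, and hostname from the event body and moves them into headers. If you prefer to keep those fields in the body, the keepFields property can be set, as in this sketch.
# Optional: keep Syslog fields (priority, timestamp, hostname) in the event body
agentone.sources.source.keepFields = all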
2. Multiport Syslog TCP Source
Multiport Syslog TCP Source is a newer, faster, multi-port capable version of the Syslog TCP source. It can listen on many ports at one time and uses the Apache Mina library to do so.
It also allows the character set to be configured on a per-port basis.
Let us see an example of configuration detail for Multiport Syslog TCP Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = multiport_syslogtcp
agentone.sources.source.channels = channelone
agentone.sources.source.host = 0.0.0.0
agentone.sources.source.ports = 10001 10002 10003
agentone.sources.source.portHeader = port
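The per-port character sets mentioned above are configured with a default plus per-port overrides. The ports and charsets below are illustrative.
# Optional: default charset plus a per-port override (illustrative values)
agentone.sources.source.charset.default = UTF-8
agentone.sources.source.charset.port.10001 = ISO-8859-1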
3. Syslog UDP Source
Apache Flume Syslog UDP Source receives Syslog data over UDP and treats an entire received message as a single Flume event.
Let us see an example of configuration detail for Syslog UDP Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = syslogudp
agentone.sources.source.port = 5140
agentone.sources.source.host = localhost
agentone.sources.source.channels = channelone
11. HTTP Source
Apache Flume HTTP Source accepts Flume events via HTTP POST and GET requests. If the handler throws an exception, the source returns an HTTP 400 status; if the channel is full, it returns an HTTP 503 status.
Let us see an example of configuration detail for HTTP Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = http
agentone.sources.source.port = 5140
agentone.sources.source.channels = channelone
agentone.sources.source.handler = org.example.rest.RestHandler
agentone.sources.source.handler.nickname = random props
agentone.sources.source.HttpConfiguration.sendServerVersion = false
agentone.sources.source.ServerConnector.idleTimeout = 300
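If no custom handler is configured, the default handler is org.apache.flume.source.http.JSONHandler, which expects a JSON array of events, each with headers and a body. A POST body for it might look like the following illustrative example (the header and body values are made up).
[{"headers": {"host": "web01"}, "body": "sample event body"}]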
12. Custom Source
Apache Flume Custom Source is your own implementation of the Source interface. The custom source class and its dependencies must be included in the Flume agent's classpath.
Let us see an example of configuration detail for Custom Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = org.example.MySource
agentone.sources.source.channels = channelone
13. Scribe Source
Scribe is another type of ingest system. To integrate with an existing Scribe ingest system, Flume provides the Scribe Source, which is based on Thrift and uses a compatible transfer protocol.
Let us see an example of configuration detail for Scribe Source.
agentone.sources = source
agentone.channels = channelone
agentone.sources.source.type = org.apache.flume.source.scribe.ScribeSource
agentone.sources.source.port = 1463
agentone.sources.source.workerThreads = 5
agentone.sources.source.channels = channelone