In this tutorial, we will look at the limitations of Apache Hadoop. Although Hadoop is a very powerful tool, it has limitations too: it supports batch processing only, it is not efficient for interactive processing, it cannot handle live data, and so on.
1. Batch Processing
Apache Hadoop is a batch-processing engine: it processes data in batch mode, where the data is already stored on the system before processing begins. It does not handle real-time streaming, because Hadoop is not efficient at processing real-time data.
2. Processing Overhead
When we deal with terabytes or petabytes of data, reading such huge volumes from disk and writing the results back to disk after processing becomes a significant overhead, because Hadoop cannot process data in memory.
3. Small File Overhead
Apache Hadoop is designed to store a small number of large files, but it struggles when it has to store a large number of small files (files well below 100 MB). HDFS stores data in blocks of 128 MB or 256 MB by default, and every file, however small, adds metadata that the NameNode must keep in memory, so millions of small files create a heavy overhead for the NameNode, as the rough estimate below illustrates.
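To get a feel for this overhead, here is a back-of-the-envelope sketch in Java. It assumes the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per file and per block object; the exact figure varies by Hadoop version, so treat the numbers as illustrative, not exact.

```java
/**
 * Rough estimate of NameNode heap usage for many small files vs. fewer
 * large files. Assumes ~150 bytes of heap per file object and per block
 * object (a rule of thumb, not an exact measurement).
 */
public class SmallFileOverheadEstimate {

    // Assumed heap cost per namespace object (file inode or block).
    private static final long BYTES_PER_OBJECT = 150;

    static long estimateHeapBytes(long fileCount, long blocksPerFile) {
        long fileObjects = fileCount;                   // one inode per file
        long blockObjects = fileCount * blocksPerFile;  // one object per block
        return (fileObjects + blockObjects) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // ~10 TB stored as 10 million 1 MB files, 1 block each.
        long smallFiles = estimateHeapBytes(10_000_000L, 1);

        // The same ~10 TB stored as ~78,000 files of 128 MB, 1 block each.
        long largeFiles = estimateHeapBytes(78_000L, 1);

        System.out.printf("Small files : ~%d MB of NameNode heap%n", smallFiles / (1024 * 1024));
        System.out.printf("Large files : ~%d MB of NameNode heap%n", largeFiles / (1024 * 1024));
    }
}
```

Under these assumptions, the same amount of data costs the NameNode roughly a hundred times more heap when split into 1 MB files than when stored in full 128 MB blocks.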
4. Security Concern
Apache Hadoop uses Kerberos for authentication, but the lack of encryption at the storage and network layers is a security concern.
5. Caching Concern
Apache Hadoop is a batch-processing engine, and MapReduce cannot cache intermediate data in memory for subsequent stages of processing; because of this, Hadoop's performance suffers.
6. Not Easy to Use
MapReduce programming has no interactive mode, so a developer has to write code for every operation, even simple ones; the sketch below shows how much code a basic word count already requires.
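The following is a minimal sketch of the classic Hadoop word-count job using the standard `org.apache.hadoop.mapreduce` API. Even for this trivial operation, you must write a mapper, a reducer, and a driver and then package and submit the job; there is no interactive shell to run it ad hoc.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in each input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configures the job and submits it to the cluster.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```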
7. Multiple Copies of Already Big Data
The Hadoop HDFS file system was developed without storage efficiency in mind, so multiple copies of the data are kept. HDFS stores at least three copies of a data set by default, and in some cases as many as six copies are needed for data locality, which multiplies the storage required for data that is already big. A small sketch of inspecting and adjusting the replication factor through the HDFS Java API follows.
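The sketch below uses the HDFS `FileSystem` API to read and lower the replication factor of a single file. The path `/data/example/part-00000` is hypothetical; substitute any file in your cluster. Note that lowering replication saves space but trades away fault tolerance and data locality.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws IOException {
        // Hypothetical path; point it at any file in your cluster.
        Path file = new Path("/data/example/part-00000");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Read the current replication factor (3 by default).
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current replication: " + status.getReplication());

        // Lower the replication factor for this file to 2 to save space,
        // at the cost of fault tolerance and data locality.
        fs.setReplication(file, (short) 2);

        fs.close();
    }
}
```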
8. Inefficient Execution
Hadoop HDFS makes no assumptions about query optimization, so it is difficult to choose an efficient cost-based plan for execution. For this reason, an Apache Hadoop cluster tends to be large compared to a database handling a similar workload.
9. Challenging Framework
The MapReduce framework is particularly challenging to use, even for simple transformation logic. Some open-source tools (such as Apache Pig and Hive) make it simpler, but each restricts you to its own language.
10. Required Skills
The intriguing data mining library that is part of the Hadoop project, Mahout, is inconsistently implemented and, in any event, requires both knowledge of the algorithms themselves and skills in distributed MapReduce development.