In this tutorial, we will look at the limitations of Apache Hadoop. Although Hadoop is a very powerful tool, it has limitations too: it supports batch processing only, it is not efficient for interactive processing, it cannot handle live data, and so on.
1. Batch Processing
Apache Hadoop is a batch-processing engine: it processes data in batch mode, where the data is already stored on the system before processing begins. It does not handle real-time streaming, because Hadoop is not efficient at processing real-time data.
2. Processing Overhead
When we deal with terabytes or petabytes of data, reading such huge volumes from disk and writing the results back to disk after processing becomes a significant overhead, because Hadoop cannot process data in memory.
3. Small File Overhead
Apache Hadoop is designed to store a small number of large files, but it struggles when it has to store a large number of small files (files well below 100 MB). HDFS stores data in blocks of 128 MB or 256 MB by default, and every file, however small, adds metadata that the NameNode must keep in memory, so millions of small files create a heavy overhead for the NameNode, as the rough estimate below illustrates.
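To get a feel for this overhead, here is a back-of-the-envelope sketch in Java. It assumes the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per file and per block object; the exact figure varies by Hadoop version, so treat the numbers as illustrative, not exact.

```java
/**
 * Rough estimate of NameNode heap usage for many small files vs. fewer
 * large files. Assumes ~150 bytes of heap per file object and per block
 * object (a rule of thumb, not an exact measurement).
 */
public class SmallFileOverheadEstimate {

    // Assumed heap cost per namespace object (file inode or block).
    private static final long BYTES_PER_OBJECT = 150;

    static long estimateHeapBytes(long fileCount, long blocksPerFile) {
        long fileObjects = fileCount;                   // one inode per file
        long blockObjects = fileCount * blocksPerFile;  // one object per block
        return (fileObjects + blockObjects) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // ~10 TB stored as 10 million 1 MB files, 1 block each.
        long smallFiles = estimateHeapBytes(10_000_000L, 1);

        // The same ~10 TB stored as ~78,000 files of 128 MB, 1 block each.
        long largeFiles = estimateHeapBytes(78_000L, 1);

        System.out.printf("Small files : ~%d MB of NameNode heap%n", smallFiles / (1024 * 1024));
        System.out.printf("Large files : ~%d MB of NameNode heap%n", largeFiles / (1024 * 1024));
    }
}
```

Under these assumptions, the same amount of data costs the NameNode roughly a hundred times more heap when split into 1 MB files than when stored in full 128 MB blocks.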
4. Security Concern
Apache Hadoop uses Kerberos for authentication, but the lack of encryption at the storage and network layers is a security concern.
5. Caching Concern
Apache Hadoop is a batch-processing engine, and MapReduce cannot cache intermediate data in memory for subsequent stages of processing; because of this, Hadoop's performance suffers.
6. Not Easy to Use
MapReduce programming has no interactive mode, so a developer has to write code for every operation, even simple ones; the sketch below shows how much code a basic word count already requires.
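The following is a minimal sketch of the classic Hadoop word-count job using the standard `org.apache.hadoop.mapreduce` API. Even for this trivial operation, you must write a mapper, a reducer, and a driver and then package and submit the job; there is no interactive shell to run it ad hoc.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in each input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configures the job and submits it to the cluster.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```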
7. Multiple Copies of Already Big Data
The Hadoop HDFS file system was developed without storage efficiency in mind, so multiple copies of the data are kept. HDFS stores at least three copies of a data set by default, and in some cases as many as six copies are needed for data locality, which multiplies the storage required for data that is already big. A small sketch of inspecting and adjusting the replication factor through the HDFS Java API follows.
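The sketch below uses the HDFS `FileSystem` API to read and lower the replication factor of a single file. The path `/data/example/part-00000` is hypothetical; substitute any file in your cluster. Note that lowering replication saves space but trades away fault tolerance and data locality.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws IOException {
        // Hypothetical path; point it at any file in your cluster.
        Path file = new Path("/data/example/part-00000");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Read the current replication factor (3 by default).
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current replication: " + status.getReplication());

        // Lower the replication factor for this file to 2 to save space,
        // at the cost of fault tolerance and data locality.
        fs.setReplication(file, (short) 2);

        fs.close();
    }
}
```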
8. Inefficient Execution
Hadoop HDFS makes no assumptions about query optimization, so it is difficult to choose an efficient cost-based plan for execution. For this reason, an Apache Hadoop cluster tends to be large compared to a database handling a similar workload.
9. Challenging Framework
The MapReduce framework is particularly challenging to use, even for simple transformation logic. Some open-source tools (such as Apache Pig and Hive) make it simpler, but each restricts you to its own language.
10. Required Skills
The intriguing data mining library that is part of the Hadoop project, Mahout, is inconsistently implemented and, in any event, requires both knowledge of the algorithms themselves and skills in distributed MapReduce development.