The objective of this tutorial is to describe, step by step, how to install Hadoop on a cluster of nodes. The example uses one master node and four slave nodes.
Platform
- Operating System (OS). You can use Ubuntu 18.04.4 LTS or a later version; other Linux distributions such as Red Hat or CentOS will also work.
- Hadoop. We have used Apache Hadoop 3.1.2; you can use the Cloudera distribution or another distribution as well.
Download Software
- VMware Player for Windows: https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/7_0
- Ubuntu: http://releases.ubuntu.com/18.04.4/ubuntu-18.04.4-desktop-amd64
- Eclipse for Windows: https://www.eclipse.org/downloads/
- PuTTY for Windows: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- WinSCP for Windows: http://winscp.net/eng/download.php
- Hadoop: https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
We will install Hadoop 3.1.2 on the cluster of nodes described below: one master and four slaves.
Installation of Hadoop on Master Node
Let’s install Hadoop on the master node (HadoopMasternode: 185.150.1.20).
Step 1. Edit the hosts file on the master node and add the hostname entries for the master and all slave nodes.
cloudduggu@ubuntu:~$ sudo nano /etc/hosts
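As an illustration, the entries might look like the following. Only the master's name and address come from this tutorial; the slave hostnames and IP addresses are placeholders, so substitute the values for your own nodes:

```
185.150.1.20    HadoopMasternode
185.150.1.21    HadoopSlave1    # placeholder name/IP
185.150.1.22    HadoopSlave2    # placeholder name/IP
185.150.1.23    HadoopSlave3    # placeholder name/IP
185.150.1.24    HadoopSlave4    # placeholder name/IP
```

The same entries will later be needed on every slave node so that all machines can resolve each other by hostname.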
Step 2. Verify whether Java is installed on the master node by running the java -version command. If it is installed, the Java version details will be displayed.
Otherwise, you can install OpenJDK 8 using the below command.
cloudduggu@ubuntu:~$ sudo apt-get install openjdk-8-jdk
Step 3. Once Java is installed, update the package source list using the below command.
cloudduggu@ubuntu:~$ sudo apt-get update
Step 4. Now install SSH on the master node using the below command.
cloudduggu@ubuntu:~$ sudo apt-get install openssh-server openssh-client
Step 5. Once SSH is installed, generate a key pair for passwordless SSH from the master to the slaves.
cloudduggu@ubuntu:~$ ssh-keygen -t rsa -P ""
Step 6. Now copy the content of .ssh/id_rsa.pub from the master node to all slaves in .ssh/authorized_keys.
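One convenient way to copy the public key is ssh-copy-id, run once per slave. The slave hostnames below are the placeholder names from the example hosts entries; use your own:

```
cloudduggu@ubuntu:~$ ssh-copy-id cloudduggu@HadoopSlave1
cloudduggu@ubuntu:~$ ssh-copy-id cloudduggu@HadoopSlave2
cloudduggu@ubuntu:~$ ssh-copy-id cloudduggu@HadoopSlave3
cloudduggu@ubuntu:~$ ssh-copy-id cloudduggu@HadoopSlave4
```

ssh-copy-id appends the master's .ssh/id_rsa.pub to .ssh/authorized_keys on each slave, which is exactly what this step asks for; it prompts for the slave user's password one last time.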
Step 7. Once the key is copied, verify passwordless login from the master node to each slave node.
The master can now connect to a slave machine simply by running ssh followed by the slave node's name.
Step 8. Now we are ready to install Hadoop on the master node. Download the software from the below link.
https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
In our case, it is present at the below location.
/home/cloudduggu/hadoop-3.1.2.tar.gz
Step 9. Now let us untar the file.
cloudduggu@ubuntu:~$ tar xzf hadoop-3.1.2.tar.gz
Hadoop Configuration Files Setup
Step 10. Open the .bashrc file in the user's home directory and add the below parameters so the environment records the location of Hadoop.
$ nano .bashrc
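The lines to add would look like the following. This assumes the extracted directory has been renamed (or symlinked) to /home/cloudduggu/hadoop, matching the paths used later in this tutorial; adjust the path if yours differs:

```
export HADOOP_HOME=/home/cloudduggu/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Run `source .bashrc` (or open a new shell) afterwards so the variables take effect.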
Step 11. Now we will set JAVA_HOME in the hadoop-env.sh file.
hadoop-env.sh file location: /home/cloudduggu/hadoop/etc/hadoop/
Java installation location: /usr/lib/jvm/java-8-openjdk-i386/
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-i386/
Step 12. Now open the core-site.xml file, located under "/hadoop/etc/hadoop", and add the below parameter.
cloudduggu@ubuntu:~/hadoop/etc/hadoop$ nano core-site.xml
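A minimal core-site.xml for this cluster might look like the following. Pointing fs.defaultFS at the master's hostname on port 9000 is the conventional choice, not a value taken from the original screenshots:

```xml
<configuration>
  <!-- Default filesystem: the NameNode running on the master node -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://HadoopMasternode:9000</value>
  </property>
</configuration>
```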
Step 13. Open the hdfs-site.xml file, located under "/hadoop/etc/hadoop", and add the below parameters.
cloudduggu@ubuntu:~/hadoop/etc/hadoop$ nano hdfs-site.xml
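A typical hdfs-site.xml for a cluster with four DataNodes might look like the following. The storage directories shown are assumed paths under the Hadoop home used in this tutorial; create them (or choose your own) before starting HDFS:

```xml
<configuration>
  <!-- Number of block replicas; 3 is the HDFS default and fits four DataNodes -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Where the NameNode keeps its metadata (master node) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/cloudduggu/hadoop/data/namenode</value>
  </property>
  <!-- Where each DataNode stores its blocks (slave nodes) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/cloudduggu/hadoop/data/datanode</value>
  </property>
</configuration>
```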
Step 14. Open the mapred-site.xml file, located under "/hadoop/etc/hadoop", and add the below parameters.
cloudduggu@ubuntu:~/hadoop/etc/hadoop$ nano mapred-site.xml
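A sketch of mapred-site.xml for running MapReduce on YARN is shown below. The HADOOP_MAPRED_HOME path assumes the Hadoop home used in this tutorial; Hadoop 3.x jobs need it exported into the task environment:

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN rather than locally -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/home/cloudduggu/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/home/cloudduggu/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/home/cloudduggu/hadoop</value>
  </property>
</configuration>
```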
Step 15. Open the yarn-site.xml file, located under "/hadoop/etc/hadoop", and add the below parameters.
cloudduggu@ubuntu:~/hadoop/etc/hadoop$ nano yarn-site.xml
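A minimal yarn-site.xml for this topology might look like the following, with the ResourceManager on the master node and the shuffle service enabled on every NodeManager:

```xml
<configuration>
  <!-- ResourceManager runs on the master node -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>HadoopMasternode</value>
  </property>
  <!-- Auxiliary shuffle service required by MapReduce -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```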
Step 16. Now configure the master and slave node lists under "/hadoop/etc/hadoop".
Configuring Master Node
Configuring Slave Node
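In Hadoop 3.x the list of slave (worker) nodes lives in the workers file under /hadoop/etc/hadoop (earlier releases called this file slaves). Using the placeholder hostnames from the example hosts entries, it would contain one hostname per line:

```
HadoopSlave1
HadoopSlave2
HadoopSlave3
HadoopSlave4
```

The start-dfs.sh and start-yarn.sh scripts read this file to decide where to launch DataNode and NodeManager daemons.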
We have now set up Hadoop successfully on the master node. Let’s configure the slave nodes.
Step 17. Update the hosts file on each slave node with the same entries that were added on the master node.
Step 18. Verify whether Java is installed on all slave nodes by running the java -version command. If it is installed, the Java version details will be displayed.
Otherwise, you can install OpenJDK 8 using the below command.
$ sudo apt-get install openjdk-8-jdk
Step 19. Now copy the configured Hadoop files from the master node to all slave nodes using the below commands.
Step 20. Now untar that file on all slave nodes.
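Steps 19 and 20 can be sketched as the loop below, run from the master node. The archive name hadoop-configured.tar.gz and the slave hostnames are illustrative choices, not values from the original; the idea is to pack the already-configured Hadoop directory, ship it, and unpack it on each slave:

```
# Pack the configured Hadoop directory on the master.
cloudduggu@ubuntu:~$ tar czf hadoop-configured.tar.gz hadoop

# Copy and unpack it on every slave (passwordless SSH set up earlier).
cloudduggu@ubuntu:~$ for node in HadoopSlave1 HadoopSlave2 HadoopSlave3 HadoopSlave4; do
    scp hadoop-configured.tar.gz cloudduggu@$node:/home/cloudduggu/
    ssh cloudduggu@$node "tar xzf /home/cloudduggu/hadoop-configured.tar.gz"
done
```

Copying the configured directory, rather than the original download, ensures every slave receives the edited configuration files from Steps 11 through 16.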
Step 21. Installation is completed on all slave nodes; now format HDFS on the master node using the below command. (Perform this activity only once, because reformatting erases all data in HDFS.)
$ bin/hdfs namenode -format
Step 22. Start HDFS and YARN from the master node using the below commands.
To start HDFS services run the below command.
$ sbin/start-dfs.sh
To start YARN services run the below command.
$ sbin/start-yarn.sh
Step 23. Verify that the services are running on the master and slave nodes.
On the master node run the below command.
On the slave nodes, run the below command.
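The standard check is the jps command that ships with the JDK, which lists the running Java daemons. On a healthy cluster of this shape you would typically see daemons like the following (process IDs will vary):

```
# On the master node:
cloudduggu@ubuntu:~$ jps
<pid> NameNode
<pid> SecondaryNameNode
<pid> ResourceManager
<pid> Jps

# On each slave node:
cloudduggu@ubuntu:~$ jps
<pid> DataNode
<pid> NodeManager
<pid> Jps
```

If a daemon is missing, check its log file under the Hadoop logs directory on that node.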
With that, we have completed the Hadoop installation on a multi-node cluster.