This tutorial describes, step by step, how to install Spark 2.4.5 (package spark-2.4.5-bin-hadoop2.7) on Ubuntu 18.04.4 LTS (Bionic Beaver). Once the installation is complete, you can start experimenting with Spark from the Spark shell.
Platform
- Operating System (OS). Ubuntu 18.04.4 LTS or later. Other Linux distributions such as Red Hat or CentOS also work, although the apt-get commands in this tutorial are specific to Ubuntu and Debian-based systems.
- Spark. This tutorial uses Spark 2.4.5 (package spark-2.4.5-bin-hadoop2.7).
Download Software
- VMware Player for Windows: https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/7_0
- Ubuntu: http://releases.ubuntu.com/18.04.4/ubuntu-18.04.4-desktop-amd64
- Spark: https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
- Eclipse for Windows: https://www.eclipse.org/downloads/
For VMware and Ubuntu installation please refer to “Hadoop Installation on Single Node” in the Hadoop section.
Steps to Install Spark 2.4.5 on Ubuntu 18.04.4 LTS
Step 1. Download Spark 2.4.5 from the link below.
On Windows: https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
On Linux: $ wget https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
Step 2. Install Java 8 using the command below.
$ sudo apt-get install openjdk-8-jdk
Press Y to continue the installation.
Once the Java installation is complete, verify it by running the command below.
$ java -version
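The version check can also be scripted. The following is a minimal sketch, assuming a Bourne-compatible shell; the helper name is_java8 is hypothetical and not part of Java or Spark. It relies on the fact that Java 8 reports its version as 1.8.x and that the java command prints its version banner to standard error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a `java -version` banner line reports Java 8.
# Java 8 identifies itself as version "1.8.x".
is_java8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# java writes its banner to stderr, so redirect it before capturing.
if command -v java >/dev/null 2>&1; then
  banner="$(java -version 2>&1 | head -n 1)"
  if is_java8 "$banner"; then
    echo "Java 8 detected: $banner"
  else
    echo "Java found, but not version 8: $banner"
  fi
else
  echo "java is not on PATH"
fi
```

If another Java version is reported, several JDKs are probably installed side by side; on Ubuntu, sudo update-alternatives --config java lets you pick the default.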
Step 3. Now get the Apache Spark archive. If you did not already download it in Step 1, fetch it from the terminal using the command below.
$ wget https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
In our case, the downloaded file is at the location below.
/home/cloudduggu/spark-2.4.5-bin-hadoop2.7.tgz
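An interrupted download leaves a truncated archive that only fails later, during extraction. The sketch below checks the file first by asking tar to list its contents. The helper name is_valid_tgz is hypothetical, and the path assumes the archive was downloaded into your home directory, as in our case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the file exists and can be read
# end to end as a gzip-compressed tar archive.
is_valid_tgz() {
  [ -f "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Assumed location; adjust to wherever your download landed.
archive="$HOME/spark-2.4.5-bin-hadoop2.7.tgz"
if is_valid_tgz "$archive"; then
  echo "archive looks complete: $archive"
else
  echo "archive missing or corrupt: $archive"
fi
```

This only confirms that the archive is structurally intact. It does not replace comparing the file against the checksum published alongside the release.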
Step 4. Extract the tar file using the command below, then rename the extracted folder to spark so the path is shorter and easier to type.
$ tar xzf spark-2.4.5-bin-hadoop2.7.tgz
$ mv spark-2.4.5-bin-hadoop2.7 spark
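The extract and rename commands can be combined into one step. This is a sketch rather than a required change: it assumes a tar that supports --strip-components (GNU tar, the default on Ubuntu, does), and the helper name extract_spark is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a Spark tarball straight into a target directory.
# --strip-components=1 drops the top-level spark-2.4.5-bin-hadoop2.7/ folder,
# so no separate mv is needed afterwards.
extract_spark() {
  if [ -e "$2" ]; then
    echo "refusing to overwrite existing $2" >&2
    return 1
  fi
  mkdir -p "$2" && tar -xzf "$1" -C "$2" --strip-components=1
}

# Assumed locations; adjust to match your own download and install paths.
tarball="$HOME/spark-2.4.5-bin-hadoop2.7.tgz"
if [ -f "$tarball" ]; then
  extract_spark "$tarball" "$HOME/spark"
else
  echo "tarball not found: $tarball"
fi
```

The existence check is a deliberate choice: re-running the script never silently unpacks a second copy over an installation you may already have configured.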
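Step 5. Open the .bashrc file in your home directory with the nano editor and export JAVA_HOME and SPARK_HOME. The locations below are from our setup; verify yours before copying them. On a 64-bit Ubuntu installation the Java directory is typically java-8-openjdk-amd64 rather than java-8-openjdk-i386.
$ nano ~/.bashrc
export SPARK_HOME=/home/cloudduggu/spark/
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-i386/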
Save the changes by pressing CTRL + O, then exit nano by pressing CTRL + X. Run source ~/.bashrc (or open a new terminal) so that the new variables take effect.
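Because the Java directory name differs between machines, it can help to derive it instead of typing it by hand. The sketch below resolves the javac symlink to its real location and strips the trailing bin/javac. The helper name jdk_home_from_javac is hypothetical, and the approach assumes the JDK (not only the JRE) is installed and that readlink -f is available, as it is on Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the real path of the javac binary,
# print the JDK root by removing the trailing /bin/javac.
jdk_home_from_javac() {
  dirname "$(dirname "$1")"
}

if command -v javac >/dev/null 2>&1; then
  # /usr/bin/javac is a chain of symlinks on Ubuntu; readlink -f follows it.
  real_javac="$(readlink -f "$(command -v javac)")"
  echo "export JAVA_HOME=$(jdk_home_from_javac "$real_javac")"
else
  echo "javac is not on PATH; install openjdk-8-jdk first"
fi
```

The script only prints a suggested export line; compare it with the value in your .bashrc and paste it in if they differ.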
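Step 6. The Spark installation is now complete. Start the Spark shell using the command below. The command uses the full path to our Spark home, so it can be run from any directory; adjust the path if you installed Spark elsewhere.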
$ /home/cloudduggu/spark/bin/spark-shell
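For a non-interactive check that the installation works, the distribution ships a run-example script in its bin directory. The sketch below uses it to run the bundled SparkPi example and keeps only the result line. The helper name spark_smoke_test is hypothetical, and the fallback to $HOME/spark is an assumption based on the folder name chosen in Step 4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the SparkPi example bundled with the distribution
# found under the given directory and print only its result line.
spark_smoke_test() {
  if [ -x "$1/bin/run-example" ]; then
    # Spark logs go to stderr; the example prints its result to stdout.
    "$1/bin/run-example" SparkPi 10 2>/dev/null | grep "Pi is roughly"
  else
    echo "Spark not found under $1"
    return 1
  fi
}

# Use SPARK_HOME if it is set, otherwise fall back to the assumed location.
spark_smoke_test "${SPARK_HOME:-$HOME/spark}" || echo "smoke test did not pass"
```

A line beginning with "Pi is roughly" indicates that Java, Spark, and the local execution mode are all working together.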