In this tutorial we’ll learn how to install Apache Spark on Ubuntu 16.04, along with its prerequisites. Apache Spark is a fast, flexible, open-source distributed engine for large-scale data processing, maintained by the Apache Software Foundation. It can read data from sources such as HDFS, HBase, Cassandra, and Hive, and it can run in its standalone cluster mode, on Apache Mesos, on Hadoop YARN, or on cloud platforms such as Amazon EC2.
I recommend using a minimal Ubuntu server setup as the basis for this tutorial. That can be a virtual or root server image with a minimal Ubuntu 16.04 install from a web hosting company, or you can use our minimal server tutorial to install a server from scratch.
Install Apache Spark on Ubuntu 16.04
Step 1. First, ensure your system and apt package lists are fully up-to-date by running the following:
apt-get update -y
apt-get upgrade -y
Step 2. Installing Java.
As Spark runs on the Java Virtual Machine, we need to install a JDK on our machine first:
apt-get -y install openjdk-8-jdk-headless
Step 3. Installing Apache Spark on Ubuntu 16.04.
First, download the latest Apache Spark release from the official downloads page; this tutorial uses version 2.3.0, pre-built for Hadoop 2.7.
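The release can be fetched from the command line; the snippet below is a sketch that assumes version 2.3.0 with the Hadoop 2.7 build and the Apache archive mirror (check the downloads page for the current release and mirror):

```shell
# Version and Hadoop build are assumptions; adjust to the release you want
SPARK_VERSION=2.3.0
HADOOP_VERSION=2.7
# Download the pre-built tarball from the Apache archive
wget "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"
```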
Extract the Apache Spark Tarball:
tar xvzf spark-2.3.0-bin-hadoop2.7.tgz
Run this command to make a symbolic link:
ln -s spark-2.3.0-bin-hadoop2.7 spark
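The symlink gives you a stable path to the current Spark install, so a later upgrade only needs the link repointed. For example (the 2.4.0 version below is hypothetical):

```shell
# Repoint the symlink to a newer release after extracting it
# -f replaces the existing link, -n avoids descending into the old target
ln -sfn spark-2.4.0-bin-hadoop2.7 spark
```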
Next, add Spark to your PATH:
Add these lines to the end of the ~/.bashrc file so that the PATH contains the Spark executables (adjust SPARK_HOME to the directory where you created the spark symlink):
export SPARK_HOME=/LinuxHint/spark
export PATH=$SPARK_HOME/bin:$PATH
To activate these changes, run the following command:
source ~/.bashrc
Then, to verify the installation, close the currently open Terminal, open a new one, and run the following command:
spark-shell
We can see in the console output that Spark has also opened a web console on port 4040 (http://localhost:4040, or your server’s IP). Let’s give it a visit:
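Once the shell is working, a quick computation confirms that Spark actually executes jobs. The one-liner below is a sketch that assumes spark-shell is on your PATH; it pipes a Scala expression into the shell to sum the integers 1 through 100:

```shell
# Run a one-line Spark job non-interactively: sum 1..100 (the result is 5050,
# printed as a Double since RDD.sum() returns Double)
echo 'println(sc.parallelize(1 to 100).sum())' | spark-shell
```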
Congratulations! You have successfully installed and configured Apache Spark on your Ubuntu 16.04 server. Thanks for using this tutorial to install Apache Spark on your Ubuntu system.