Set Up Spark Standalone Master and Slave

July 1, 2018

Hello!

This article explains how to set up a Spark standalone master and slave from the command line.

Download the Apache Spark binaries from a mirror site. The official documentation and downloads are available at https://spark.apache.org/downloads.html

I have downloaded Spark and decompressed the file in the Downloads folder. Now open a command prompt / terminal and let’s dive in.

Set up the environment variables by opening your .bash_profile or .bashrc (wherever you keep your environment variables).

Add the lines given below to the .bash_profile, save and exit with Ctrl+X, and then source the .bash_profile.

The steps are as follows:

nano .bash_profile

//edit the paths

save and exit with Ctrl+X (press Y, then Enter, to confirm)

source .bash_profile

The paths are given here:

export SPARK_HOME=~/Downloads/spark-2.3.1-bin-hadoop2.7

export PATH=$SPARK_HOME/bin:$PATH

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Ho$

export PATH=$JAVA_HOME/bin:$PATH

export PYSPARK_DRIVER_PYTHON=jupyter

export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
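After sourcing the profile, it can help to confirm that SPARK_HOME really points at a Spark installation before going further. The function below is a minimal sketch of such a check; the name check_spark_env is hypothetical (it is not part of Spark), and the only assumption about the Spark layout is that the distribution ships an executable bin/spark-submit.

```shell
# check_spark_env (hypothetical helper, not part of Spark):
# reports whether a directory looks like a usable Spark installation.
# $1: directory to check; defaults to $SPARK_HOME when omitted.
check_spark_env() {
    spark_dir="${1:-${SPARK_HOME:-}}"
    if [ -z "$spark_dir" ]; then
        echo "SPARK_HOME is not set"
        return 1
    fi
    # The binary distribution ships bin/spark-submit as an executable script.
    if [ ! -x "$spark_dir/bin/spark-submit" ]; then
        echo "spark-submit not found under $spark_dir/bin"
        return 1
    fi
    echo "Spark found at $spark_dir"
}
```

Calling check_spark_env with no arguments in a new terminal checks the value exported in the profile. If it reports that spark-submit is missing, the path in SPARK_HOME most likely does not match the folder where the tgz file was decompressed.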

Navigate to the Spark directory where you decompressed the downloaded tgz file, and start the master:

Rajivs-Air:spark-2.3.1-bin-hadoop2.7 rjrajivjha$ ./sbin/start-master.sh

You can now open localhost:8080 in a browser to check the Spark master. There you will also find the Spark URL. Use that same URL to start the slave.
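The Spark URL has the form spark://HOST:PORT, and 7077 is the default master port. As a rough sketch, the URL can be assembled in the shell instead of being copied by hand. The function name spark_master_url is hypothetical, and falling back to the output of hostname is an assumption: the master may advertise a different host name, so the value shown in the web UI remains the one to trust.

```shell
# spark_master_url (hypothetical helper): build a spark://HOST:PORT URL.
# $1: master host, defaults to this machine's hostname (an assumption;
#     the web UI on port 8080 shows the name the master actually uses).
# $2: master port, defaults to 7077, the standalone master's default.
spark_master_url() {
    master_host="${1:-$(hostname)}"
    master_port="${2:-7077}"
    echo "spark://${master_host}:${master_port}"
}
```

For example, spark_master_url Rajivs-Air prints spark://Rajivs-Air:7077, which is the URL used in the commands that follow.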

Rajivs-Air:spark-2.3.1-bin-hadoop2.7 rjrajivjha$ ./sbin/start-slave.sh spark://Rajivs-Air:7077

Alternatively, to limit the resources the slave offers, pass the number of cores and the amount of memory:

Rajivs-Air:spark-2.3.1-bin-hadoop2.7 rjrajivjha$ ./sbin/start-slave.sh spark://Rajivs-Air:7077 --cores 2 --memory 4g

To stop the slave:

Rajivs-Air:spark-2.3.1-bin-hadoop2.7 rjrajivjha$ ./sbin/stop-slave.sh

To stop the master and all slaves:

Rajivs-Air:spark-2.3.1-bin-hadoop2.7 rjrajivjha$ ./sbin/stop-all.sh
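The start and stop commands above can be collected into two small shell functions. This is only a sketch under a few assumptions: the names run_cmd, start_cluster and stop_cluster are hypothetical, SPARK_HOME is expected to be set as in the profile, and a DRY_RUN variable (also hypothetical) makes the functions print the commands instead of running them. The sketch stops the slave and the master one at a time with stop-slave.sh and stop-master.sh rather than calling stop-all.sh.

```shell
# run_cmd (hypothetical helper): print the command when DRY_RUN=1,
# otherwise execute it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_cluster: start the master, then one slave with fixed resources.
# $1: Spark URL of the master, $2: cores for the slave, $3: memory for the slave
start_cluster() {
    run_cmd "$SPARK_HOME/sbin/start-master.sh"
    run_cmd "$SPARK_HOME/sbin/start-slave.sh" "$1" --cores "$2" --memory "$3"
}

# stop_cluster: stop the local slave first, then the master.
stop_cluster() {
    run_cmd "$SPARK_HOME/sbin/stop-slave.sh"
    run_cmd "$SPARK_HOME/sbin/stop-master.sh"
}
```

Running DRY_RUN=1 start_cluster spark://Rajivs-Air:7077 2 4g first is a cheap way to see exactly which commands would be executed before starting anything.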

Stay tuned for more blogs from my Spark and ML series.

Happy Learning!

Rajiv Jha :)

My name is Rajiv Jha. I am a senior engineering student at Guru Gobind Singh Indraprastha University.