Hello everyone! In this post I will show you how to install Apache Spark on a Windows machine. By the end of this tutorial you'll be able to use Spark with Scala on Windows. Happy learning 🙂
STEPS:
Install Java 8 on your machine.
Open the command prompt and type java -version to check if Java is installed on your machine.
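If Java is installed, the command prints the version; for Java 8 the output starts roughly like this (the exact update number will differ):

    C:\> java -version
    java version "1.8.0_201"
    ...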
Download the Apache Spark distribution.
After installing Java 8 in step 1, download Spark from https://spark.apache.org/downloads.html and choose the package type "Pre-built for Apache Hadoop 2.7 and later".
After downloading Spark, unpack the distribution into a directory, for example D:\spark-2.2.3-bin-hadoop2.7 (the path used in the rest of this post).
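On recent Windows 10 systems the built-in tar command can unpack the .tgz archive (on older systems a tool such as 7-Zip works as well). A sketch, assuming the archive sits in the Downloads folder and D:\ is the target:

    C:\> tar -xzf %USERPROFILE%\Downloads\spark-2.2.3-bin-hadoop2.7.tgz -C D:\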
Set the environment variables.
Once steps 1 and 2 are done, set the environment variables on your machine as described below.
Location:
Control Panel > System and Security > System > Advanced System Settings (requires admin privileges)
Control Panel > User Accounts > User Accounts > Change my Environment Variables
A. Set SPARK_HOME in the environment variables to the directory where you unpacked Spark, e.g. D:\spark-2.2.3-bin-hadoop2.7.
B. Add D:\spark-2.2.3-bin-hadoop2.7\bin to the PATH variable.
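Alternatively, both variables can be set from a command prompt with setx. A sketch, assuming the install path above (note that setx truncates values longer than 1024 characters, so prefer the dialog if your PATH is long, and open a new prompt afterwards for the changes to take effect):

    C:\> setx SPARK_HOME "D:\spark-2.2.3-bin-hadoop2.7"
    C:\> setx PATH "%PATH%;D:\spark-2.2.3-bin-hadoop2.7\bin"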
Add a dummy Hadoop installation.
Even though we are not using Hadoop, Spark will throw an exception like the one below on starting spark-shell.
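The exact wording varies with the Spark and Hadoop versions, but the key line typically looks like this:

    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.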
Reason: Spark expects winutils.exe inside the Hadoop installation at "<Hadoop Installation Directory>\bin\winutils.exe", so this can be fixed by adding a dummy Hadoop installation.
- First, download the winutils.exe binary from a Hadoop redistribution. There are repositories on GitHub that provide it for several Hadoop versions. Place it in C:\Hadoop\bin\
- Then set the environment variable %HADOOP_HOME% to point to the directory C:\Hadoop, not C:\Hadoop\bin, since Spark appends \bin\winutils.exe to %HADOOP_HOME% when looking for the binary.
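Again, this can be done from the command prompt (open a new prompt afterwards so the variable is picked up):

    C:\> setx HADOOP_HOME "C:\Hadoop"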
Setting file permissions for /tmp/hive
Spark uses /tmp/hive as a scratch directory for its Hive support; if its permissions are too restrictive, spark-shell reports an error like "The root scratch dir: /tmp/hive on HDFS should be writable".
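The permissions can be set with the winutils.exe installed in the previous step. A sketch, assuming spark-shell is run from the C: drive, where /tmp/hive resolves to C:\tmp\hive (create the directory first if it does not exist):

    C:\> mkdir C:\tmp\hive
    C:\> C:\Hadoop\bin\winutils.exe chmod 777 C:\tmp\hive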
Now let's execute the first example and try to read a file in spark-shell.
Start the Spark shell.
The spark-shell script is located in the bin directory of the Spark distribution; since that directory was added to PATH in step 3, it can be started from any command prompt. On starting the Spark shell, the screen below can be seen.
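A successful start prints a banner along these lines (the app id and version details will differ per machine):

    C:\> spark-shell
    Spark context available as 'sc' (master = local[*], app id = local-...).
    Spark session available as 'spark'.
    ...
    scala>

With the shell up, a minimal first example reads a text file through the predefined spark session. This is a sketch assuming a file at D:\data\sample.txt; substitute any text file you have:

    // read the file into a Dataset[String]
    val lines = spark.read.textFile("D:/data/sample.txt")

    // count the lines and print the first five
    lines.count()
    lines.take(5).foreach(println)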