Installing Apache Spark on Windows

Hello everyone! In this post I will show you how to install Apache Spark on a Windows machine. By the end of this tutorial you’ll be able to use Spark with Scala on Windows. Happy learning 🙂

STEPS:

Install Java 8 on your machine.

Open the command prompt and run java -version to check whether Java is installed on your machine.

Verify Java is installed on the Windows machine
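For reference, the check can be run straight from the command prompt; any 1.8.x version string in the output means Java 8 is installed (exact build numbers will vary per machine):

  :: quick sanity check that a Java 8 runtime is on the PATH
  java -version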

Download the Apache Spark distribution.

After installing Java 8 in step 1, download Spark from https://spark.apache.org/downloads.html and choose “Pre-built for Apache Hadoop 2.7 and later”, as shown in the picture below.

Download Spark distribution

After downloading Spark, unpack the distribution into a directory.

Unpacking Spark on Windows
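As a sketch, assuming the downloaded archive is spark-2.2.3-bin-hadoop2.7.tgz (the version used later in this post) saved in D:\, and that the tar command shipped with recent Windows 10 releases (or 7-Zip) is available:

  :: extract the distribution to D:\spark-2.2.3-bin-hadoop2.7
  cd /d D:\
  tar -xzf spark-2.2.3-bin-hadoop2.7.tgz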

Set the environment variables. 

Once steps 1 and 2 are done, set the environment variables on your machine as described below.

Location: 

Control Panel > System and Security > System > Advanced System Settings (requires admin privileges)

Control Panel > User Accounts > User Accounts > Change my Environment Variables

A. Set SPARK_HOME in the environment variables to the Spark installation directory (for example, D:\spark-2.2.3-bin-hadoop2.7).

B. Add D:\spark-2.2.3-bin-hadoop2.7\bin to the PATH variable.
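If you prefer the command line over the Control Panel dialog, the same variables can be set with setx; this is a minimal sketch, assuming Spark was unpacked to D:\spark-2.2.3-bin-hadoop2.7 (note that setx truncates values longer than 1024 characters, so the GUI is safer if your PATH is already long):

  :: user-level environment variables; open a new command prompt afterwards to pick them up
  setx SPARK_HOME "D:\spark-2.2.3-bin-hadoop2.7"
  setx PATH "%PATH%;D:\spark-2.2.3-bin-hadoop2.7\bin"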

Add a dummy Hadoop installation.

Even though we are not using Hadoop, Spark will throw the below exception on starting spark-shell.

Reason: Spark expects winutils.exe in the Hadoop installation at “<Hadoop Installation Directory>\bin\winutils.exe”, so this can be fixed by adding a dummy Hadoop installation.

  1. First download the winutils.exe binary from a Hadoop redistribution; a repository hosting it for several Hadoop versions is available on GitHub. Place it in C:\Hadoop\bin\.
  2. Then set the environment variable HADOOP_HOME to point to the directory C:\Hadoop, so that %HADOOP_HOME%\bin\winutils.exe resolves to the file from step 1.
Setting HADOOP_HOME path in Environment Variables
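The same can be done from the command line; a minimal sketch, assuming winutils.exe was placed in C:\Hadoop\bin as above:

  :: HADOOP_HOME points to the parent folder, not to the bin folder itself
  setx HADOOP_HOME "C:\Hadoop"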

Setting file permission for /tmp/hive

Now let’s execute the first example and try to read a file in spark-shell. On a fresh Windows setup this can fail with the /tmp/hive file-permission error shown below.

Spark shell file permission issue
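A typical fix is to relax the permissions on the Hive scratch directory with winutils.exe; this is a sketch that assumes the default scratch directory \tmp\hive on the drive you start spark-shell from:

  :: run from a command prompt (use admin rights if the directory is protected)
  %HADOOP_HOME%\bin\winutils.exe chmod -R 777 \tmp\hive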
Now start the Spark Shell.

The spark-shell script is located in the bin directory of the Spark distribution. On starting the Spark shell, the below screen can be seen.
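Since the Spark bin directory was added to PATH in step 3, the shell can be launched from any command prompt:

  :: starts the Scala REPL with a SparkSession pre-created as 'spark'
  spark-shell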

spark-shell prompt
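Once the scala> prompt appears, a quick way to exercise the setup is to read a small text file. A minimal sketch, where D:/test.txt is a hypothetical path to any plain-text file on your machine:

  // 'spark' (SparkSession) is pre-created by spark-shell
  val lines = spark.read.textFile("D:/test.txt")  // Dataset[String], one element per line
  lines.count()                                   // number of lines in the file
  lines.show(5, false)                            // print the first 5 lines without truncation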