Hello everyone! In this post I will show you how to install Apache Spark on a Windows machine. By the end of this tutorial you'll be able to use Spark with Scala on Windows. Happy learning 🙂
STEPS:
Install Java 8 on your machine.
Open the command prompt and type java -version to check if Java is installed on your machine.
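If Java is installed, the command prints the version; for Java 8 the output starts roughly like this (the exact update number will differ):

    C:\> java -version
    java version "1.8.0_201"
    ...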
Download the Apache Spark distribution.
After installing Java 8 in step 1, download Spark from https://spark.apache.org/downloads.html and choose the package type "Pre-built for Apache Hadoop 2.7 and later".
After downloading Spark, unpack the distribution into a directory, for example D:\spark-2.2.3-bin-hadoop2.7 (the path used in the rest of this post).
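On recent Windows 10 systems the built-in tar command can unpack the .tgz archive (on older systems a tool such as 7-Zip works as well). A sketch, assuming the archive sits in the Downloads folder and D:\ is the target:

    C:\> tar -xzf %USERPROFILE%\Downloads\spark-2.2.3-bin-hadoop2.7.tgz -C D:\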
Set the environment variables.
Once steps 1 and 2 are done, set the environment variables on your machine as described below.
Location:
Control Panel > System and Security > System > Advanced System Settings (requires admin privileges)
Control Panel > User Accounts > User Accounts > Change my Environment Variables
A. Set SPARK_HOME in the environment variables to the directory where you unpacked Spark, e.g. D:\spark-2.2.3-bin-hadoop2.7.
B. Add D:\spark-2.2.3-bin-hadoop2.7\bin to the PATH variable.
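Alternatively, both variables can be set from a command prompt with setx. A sketch, assuming the install path above (note that setx truncates values longer than 1024 characters, so prefer the dialog if your PATH is long, and open a new prompt afterwards for the changes to take effect):

    C:\> setx SPARK_HOME "D:\spark-2.2.3-bin-hadoop2.7"
    C:\> setx PATH "%PATH%;D:\spark-2.2.3-bin-hadoop2.7\bin"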
Add a dummy Hadoop installation.
Even though we are not using Hadoop, Spark will throw an exception like the one below on starting spark-shell.
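The exact wording varies with the Spark and Hadoop versions, but the key line typically looks like this:

    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.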
Reason: Spark expects winutils.exe inside the Hadoop installation at "<Hadoop Installation Directory>\bin\winutils.exe", so this can be fixed by adding a dummy Hadoop installation.
- First, download the winutils.exe binary from a Hadoop redistribution. There are repositories on GitHub that provide it for several Hadoop versions. Place it in C:\Hadoop\bin\
- Then set the environment variable %HADOOP_HOME% to point to the directory C:\Hadoop, not C:\Hadoop\bin, since Spark appends \bin\winutils.exe to %HADOOP_HOME% when looking for the binary.
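Again, this can be done from the command prompt (open a new prompt afterwards so the variable is picked up):

    C:\> setx HADOOP_HOME "C:\Hadoop"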
Setting file permissions for /tmp/hive
Spark uses /tmp/hive as a scratch directory for its Hive support; if its permissions are too restrictive, spark-shell reports an error like "The root scratch dir: /tmp/hive on HDFS should be writable".
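The permissions can be set with the winutils.exe installed in the previous step. A sketch, assuming spark-shell is run from the C: drive, where /tmp/hive resolves to C:\tmp\hive (create the directory first if it does not exist):

    C:\> mkdir C:\tmp\hive
    C:\> C:\Hadoop\bin\winutils.exe chmod 777 C:\tmp\hive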
Now let's execute the first example and try to read a file in spark-shell.
Start the Spark shell.
The spark-shell script is located in the bin directory of the Spark distribution; since that directory was added to PATH in step 3, it can be started from any command prompt. On starting the Spark shell, the screen below can be seen.
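A successful start prints a banner along these lines (the app id and version details will differ per machine):

    C:\> spark-shell
    Spark context available as 'sc' (master = local[*], app id = local-...).
    Spark session available as 'spark'.
    ...
    scala>

With the shell up, a minimal first example reads a text file through the predefined spark session. This is a sketch assuming a file at D:\data\sample.txt; substitute any text file you have:

    // read the file into a Dataset[String]
    val lines = spark.read.textFile("D:/data/sample.txt")

    // count the lines and print the first five
    lines.count()
    lines.take(5).foreach(println)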