Working with JSON data in Apache Spark

In this video we will learn how to work with JSON data in Apache Spark.

  • Spark Version – 2.4
  • Language – Scala

Objectives

  • What is the JSON file format
  • Reading JSON file – Single-line mode
  • Reading multiline JSON
  • Writing JSON to HDFS
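
The read and write objectives above can be sketched in a few lines of Scala. This is a minimal sketch, not the code from the video: it assumes a local Spark 2.4 session, and the sample records, the column names (`order_id`, `order_status`), and the temporary paths are hypothetical stand-ins for the practice dataset. On a cluster you would pass an HDFS path instead of a local one.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object JsonSketch {
  // Returns the row counts of (single-line read, multiline read, read-back of written output).
  def run(spark: SparkSession): (Long, Long, Long) = {
    // Hypothetical scratch directory; replace with your own dataset location.
    val dir = Files.createTempDirectory("json-sketch")

    // Single-line mode (JSON Lines): one complete JSON object per line.
    val singleLinePath = dir.resolve("orders.json")
    val singleLineData =
      """{"order_id": 1, "order_status": "CLOSED"}
        |{"order_id": 2, "order_status": "PENDING_PAYMENT"}
        |""".stripMargin
    Files.write(singleLinePath, singleLineData.getBytes(StandardCharsets.UTF_8))
    val orders = spark.read.json(singleLinePath.toString)

    // Multiline mode: a record (or an array of records) spans several lines,
    // so the reader needs the multiLine option.
    val multiLinePath = dir.resolve("orders_multiline.json")
    val multiLineData =
      """[
        |  {
        |    "order_id": 3,
        |    "order_status": "COMPLETE"
        |  }
        |]""".stripMargin
    Files.write(multiLinePath, multiLineData.getBytes(StandardCharsets.UTF_8))
    val multiLineOrders = spark.read.option("multiLine", "true").json(multiLinePath.toString)

    // Writing JSON: the output is a directory of part files in single-line format.
    // A local path is used here; an HDFS path would be passed the same way.
    val outputPath = dir.resolve("orders_out").toString
    orders.write.mode("overwrite").json(outputPath)
    val writtenBack = spark.read.json(outputPath)

    (orders.count(), multiLineOrders.count(), writtenBack.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-sketch")
      .master("local[*]") // assumption: local mode for experimentation
      .getOrCreate()
    try {
      println(run(spark))
    } finally {
      spark.stop()
    }
  }
}
```

Without the `multiLine` option, Spark expects each line of the file to be a self-contained JSON object, which is why the second file needs the option and the first does not.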

Downloading the practice dataset

In this lecture we use the RETAIL DB database. You can download the practice dataset from our GitHub repository.


Want to learn how to work with different file formats (Parquet, JSON, Avro, ORC) using the Spark SQL module? Check out our playlist on YouTube. Don’t forget to subscribe 🙂

https://www.youtube.com/playlist?list=PLxPiYXz4lGTO5YZIX05uvgOOCkR_brJ_P

Hungry for more?

  • See our post to learn how to create a Spark cluster on Google Cloud Platform.
  • See our post to understand DataFrame abstraction in Apache Spark.
  • See our post to understand how to manipulate Date columns in Apache Spark.