Working with CSV data in Apache Spark

In this video we will learn how to work with CSV data in Apache Spark.

  • Spark Version – 2.4
  • Language – Scala

Objectives

  • What the CSV file format is
  • Reading CSV data – without header
  • Reading CSV data – provide column names
  • Reading CSV data – with header
  • Reading CSV data – infer schema
  • Reading CSV data – explicit schema
  • Writing CSV data to HDFS
  • How to apply data compression
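
The read and write variations listed above can be sketched roughly as follows for Spark 2.4 in Scala. This is only a minimal illustration: the sample rows, the column names, and the temporary local paths are assumptions made for the example and are not taken from the video. On a cluster the paths would normally point to HDFS locations.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CsvSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvSketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical scratch locations; replace with HDFS paths on a cluster.
    val base = Files.createTempDirectory("csv-sketch").toString
    val noHeaderPath = s"$base/no_header"
    val headerPath = s"$base/with_header"
    val gzipPath = s"$base/gzip_out"

    // Made-up sample rows standing in for the practice dataset.
    val sample = Seq((1, 2, "Football"), (2, 2, "Soccer"))
      .toDF("category_id", "category_department_id", "category_name")
    sample.write.csv(noHeaderPath)
    sample.write.option("header", "true").csv(headerPath)

    // Without header: columns get default names _c0, _c1, _c2 and are all strings.
    val noHeader = spark.read.csv(noHeaderPath)
    noHeader.printSchema()

    // Provide column names after reading a header-less file.
    val named = spark.read.csv(noHeaderPath)
      .toDF("category_id", "category_department_id", "category_name")
    named.printSchema()

    // With header: the first line of each file supplies the column names.
    val withHeader = spark.read.option("header", "true").csv(headerPath)
    withHeader.printSchema()

    // Infer schema: Spark makes an extra pass over the data to guess column types.
    val inferred = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(headerPath)
    inferred.printSchema()

    // Explicit schema: no inference pass, types are fixed up front.
    val schema = StructType(Seq(
      StructField("category_id", IntegerType),
      StructField("category_department_id", IntegerType),
      StructField("category_name", StringType)
    ))
    val typed = spark.read.schema(schema).csv(noHeaderPath)
    typed.printSchema()

    // Write CSV with gzip compression; part files are written with a .csv.gz suffix.
    typed.write
      .option("header", "true")
      .option("compression", "gzip")
      .csv(gzipPath)

    spark.stop()
  }
}
```

An explicit schema avoids the extra pass over the data that inferSchema needs, which tends to matter more as files get larger.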

Downloading the practice dataset

In this lecture we are using the RETAIL DB database. You can download the practice dataset from our GitHub repository.


Want to learn how to work with other file formats (Parquet, JSON, Avro, ORC) using the Spark SQL module? Check out our playlist on YouTube. Don’t forget to subscribe 🙂

https://www.youtube.com/playlist?list=PLxPiYXz4lGTO5YZIX05uvgOOCkR_brJ_P