Manipulating String Columns in a DataFrame
In this video we will learn how to manipulate string columns in a DataFrame. The demo uses Spark 2.4 and Scala.
In this video we will learn how to work with Avro data in Apache Spark. The demo uses Spark 2.4 and Scala.
In this video we will learn how to work with CSV data in Apache Spark. The demo uses Spark 2.4 and Scala.
In this video we will understand how to work with DataFrame Columns in Apache Spark.
In this lecture we will learn how to work with the Hive Metastore in Apache Spark. We will read a table from the Hive metastore in Spark and also create a table using the saveAsTable API.
In this video we will learn how to work with JSON data in Apache Spark.
In this lecture we will learn how to work with Parquet File Format in Spark.
Manipulating dates in a DataFrame with the Spark API functions from_unixtime(), unix_timestamp(), to_date(), hour(), minute() and second().
In this video we will understand the DataFrame abstraction in Spark.
ELK stands for Elasticsearch, Logstash, and Kibana. These are three components of the ELK stack that are used to index, collect and visualize the data.
Apache Spark is an open-source cluster computing framework that can run some workloads up to 100 times faster in memory and 10 times faster on disk than Hadoop MapReduce.
How to set up a Spark 2.4 cluster on Google Cloud using Dataproc. Step 1: create a new project. Step 2: create a new cluster using Dataproc.
SBT is an open-source build tool for Scala and Java projects, similar to Java’s Maven and Ant.
By enabling compression in Hive, we can significantly reduce the required storage space and can also improve throughput and performance.
I’m going to walk you through some important HDFS shell commands that can be used to manage files in the Hadoop Distributed File System. These commands are also important if you are planning to take the CCA-175 certification exam.