Apache Spark RDD reduceByKey transformation
reduceByKey(func) transforms a dataset of (K, V) pairs into a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function.
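Since running the real thing needs a SparkContext, here is a minimal pure-Python sketch of reduceByKey's per-key aggregation semantics (the helper name reduce_by_key is hypothetical, not part of Spark):

```python
def reduce_by_key(pairs, func):
    # Fold all values that share a key using func, mimicking reduceByKey.
    acc = {}
    for k, v in pairs:
        acc[k] = func(acc[k], v) if k in acc else v
    return list(acc.items())

# Summing values per key:
print(reduce_by_key([("a", 1), ("b", 2), ("a", 3)], lambda x, y: x + y))
# [('a', 4), ('b', 2)]
```

As in Spark, func must be associative (and in practice commutative), because values for a key may be combined in any order and partially merged on each partition before the final merge.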
mapPartitions and mapPartitionsWithIndex perform a map operation on an entire partition at a time and return a new RDD; mapPartitionsWithIndex additionally passes the partition's index to the function.
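A pure-Python sketch of the partition-at-a-time semantics, modeling an RDD as a list of partitions (the helper names are hypothetical; in Spark the function receives an iterator over one partition and returns an iterator):

```python
def map_partitions(partitions, func):
    # func: iterator over one partition -> iterator of results
    return [list(func(iter(p))) for p in partitions]

def map_partitions_with_index(partitions, func):
    # func additionally receives the partition's index
    return [list(func(i, iter(p))) for i, p in enumerate(partitions)]

parts = [[1, 2], [3, 4, 5]]
print(map_partitions(parts, lambda it: (x * 10 for x in it)))
# [[10, 20], [30, 40, 50]]
print(map_partitions_with_index(parts, lambda i, it: ((i, x) for x in it)))
# [[(0, 1), (0, 2)], [(1, 3), (1, 4), (1, 5)]]
```

Calling the function once per partition rather than once per element is useful when there is per-partition setup cost, such as opening a database connection.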
As per the Apache Spark documentation, groupBy returns an RDD of grouped items, where each group consists of a key produced by the supplied keying function and a sequence of the elements that map to it.
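A minimal pure-Python sketch of groupBy's semantics, grouping elements by the result of a keying function (the helper name group_by is hypothetical):

```python
def group_by(items, key_func):
    # Bucket each element under the key that key_func derives from it.
    groups = {}
    for x in items:
        groups.setdefault(key_func(x), []).append(x)
    return list(groups.items())

# Grouping integers by parity:
print(group_by([1, 2, 3, 4], lambda x: x % 2))
# [(1, [1, 3]), (0, [2, 4])]
```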
groupByKey([numPartitions]) is called on a dataset of (K, V) pairs and returns a dataset of (K, Iterable&lt;V&gt;) pairs.
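Unlike groupBy, groupByKey already starts from (K, V) pairs and collects only the values per key. A pure-Python sketch of that semantics (the helper name group_by_key is hypothetical):

```python
def group_by_key(pairs):
    # Collect the values for each key into a list, mimicking (K, Iterable<V>).
    groups = {}
    for k, v in pairs:
        groups.setdefault(k, []).append(v)
    return list(groups.items())

print(group_by_key([("a", 1), ("b", 2), ("a", 3)]))
# [('a', [1, 3]), ('b', [2])]
```

Note that if you only need an aggregate per key (such as a sum), reduceByKey is usually preferable, since it combines values on each partition before shuffling instead of moving every value across the network.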
As per the Apache Spark documentation, filter(func) returns a new dataset formed by selecting those elements of the source on which func returns true.
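A pure-Python sketch of filter's select-where-true semantics (the helper name rdd_filter is hypothetical):

```python
def rdd_filter(items, func):
    # Keep only the elements for which func returns true.
    return [x for x in items if func(x)]

# Keeping the even numbers:
print(rdd_filter([1, 2, 3, 4], lambda x: x % 2 == 0))
# [2, 4]
```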
flatMap(func) is similar to map, but each input item can be mapped to 0 or more output items, so func should return a scala.collection.Seq rather than a single item.
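A pure-Python sketch of flatMap's one-to-many semantics, where each element expands to a sequence and the results are flattened into one output (the helper name flat_map is hypothetical):

```python
def flat_map(items, func):
    # func maps each input item to zero or more output items;
    # the per-item sequences are flattened into a single list.
    return [y for x in items for y in func(x)]

# Splitting lines into words, a classic flatMap use:
print(flat_map(["hello world", "spark"], lambda s: s.split()))
# ['hello', 'world', 'spark']
```

Contrast with map, which here would produce [['hello', 'world'], ['spark']]: one nested output per input instead of a flattened result.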