Apache Spark RDD reduceByKey transformation
reduceByKey(func) transforms a dataset of (K, V) pairs into a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V, V) => V.
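As a minimal sketch, assuming the sc SparkContext that spark-shell provides and a hypothetical set of (word, 1) pairs, a word count with reduceByKey might look like this:

```scala
// Hypothetical input: one (word, 1) pair per occurrence.
val pairs = sc.parallelize(Seq(("spark", 1), ("rdd", 1), ("spark", 1), ("scala", 1)))

// The reduce function (V, V) => V is applied per key, first within each
// partition (map-side combine) and then across partitions after the shuffle.
val counts = pairs.reduceByKey(_ + _)

counts.collect().foreach(println)
// (spark,2), (scala,1), (rdd,1)  -- ordering may vary
```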
mapPartitions and mapPartitionsWithIndex perform a map operation on an entire partition at a time and return a new RDD; mapPartitionsWithIndex additionally passes the integer index of the partition to the function.
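A short sketch of both operations, again assuming the spark-shell sc and a hypothetical RDD of integers split across three partitions:

```scala
// Hypothetical input: the numbers 1 to 10 spread over 3 partitions.
val nums = sc.parallelize(1 to 10, 3)

// The function receives an Iterator over a whole partition and returns an
// Iterator, so per-partition setup (e.g. opening a connection) runs once
// per partition rather than once per element.
val doubled = nums.mapPartitions(iter => iter.map(_ * 2))

// mapPartitionsWithIndex additionally passes the partition's index.
val tagged = nums.mapPartitionsWithIndex((idx, iter) => iter.map(n => s"partition $idx -> $n"))

doubled.collect().foreach(println)
tagged.collect().foreach(println)
```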
As per the Apache Spark documentation, groupBy(func) returns an RDD of grouped items, where each group consists of a key and a sequence of elements mapping to that key; the key is computed by applying func to each element.
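For illustration, assuming the spark-shell sc and a hypothetical RDD of words, grouping by the first letter could be written as:

```scala
// Hypothetical input: a small RDD of words.
val words = sc.parallelize(Seq("apple", "banana", "avocado", "blueberry", "cherry"))

// The key is derived from each element by the supplied function (here the
// first character), producing (key, Iterable[element]) groups.
val byFirstLetter = words.groupBy(w => w.head)

byFirstLetter.collect().foreach { case (letter, ws) => println(s"$letter -> ${ws.mkString(", ")}") }
// a -> apple, avocado
// b -> banana, blueberry
// c -> cherry
```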
When called on a dataset of (K, V) pairs, groupByKey([numPartitions]) returns a dataset of (K, Iterable<V>) pairs.
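A minimal sketch under the same spark-shell assumption, with hypothetical (name, score) pairs:

```scala
// Hypothetical input: (name, score) pairs.
val scores = sc.parallelize(Seq(("alice", 80), ("bob", 92), ("alice", 95), ("bob", 67)))

// All values for a key are shuffled to one place and returned as Iterable[V].
// For simple aggregations, reduceByKey is usually cheaper because it combines
// values on the map side before the shuffle.
val grouped = scores.groupByKey()

grouped.collect().foreach { case (name, vals) => println(s"$name -> ${vals.mkString(", ")}") }
```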
As per the Apache Spark documentation, filter(func) returns a new dataset formed by selecting those elements of the source on which func returns true.
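A brief sketch, again assuming the spark-shell sc and a hypothetical RDD of integers:

```scala
// Hypothetical input: the numbers 1 to 10.
val nums = sc.parallelize(1 to 10)

// Only elements for which the predicate returns true are kept.
val evens = nums.filter(n => n % 2 == 0)

println(evens.collect().mkString(", "))  // 2, 4, 6, 8, 10
```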