map(cols: Column*) – Creates a new map column.
map_keys(e: Column) – Returns an array containing the keys of the map.
map_values(e: Column) – Returns an array containing the values of the map.
map_concat(cols: Column*) – Merges the maps passed as arguments into a single map.
map_from_entries(e: Column) – Returns a map created from the given array of StructType entries.
map_entries(e: Column) – Returns an array of all StructType entries in the given map.
explode(e: Column) – Creates a new row for every key-value pair in the map, ignoring null & empty maps. It produces two new columns, one for the key and one for the value.
explode_outer(e: Column) – Creates a new row for every key-value pair in the map, including null & empty maps.
posexplode(e: Column) – Creates a new row for each key-value pair in the map, ignoring null & empty maps. It produces three columns: "pos" for the position of the map element, plus "key" and "value".
posexplode_outer(e: Column) – Same as posexplode, but includes null & empty maps.
transform_keys(expr: Column, f: (Column, Column) => Column) – Applies the function to every key-value pair and returns a map with the transformed keys.
transform_values(expr: Column, f: (Column, Column) => Column) – Applies the function to every key-value pair and returns a map with the transformed values.
map_zip_with(left: Column, right: Column, f: (Column, Column, Column) => Column) – Merges two maps into a single map by applying the function to pairs of values that share a key.
element_at(column: Column, value: Any) – Returns the value for the given key in the map.
size(e: Column) – Returns the number of entries in the map column.
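The explode family listed above has no dedicated section below; as a minimal sketch, assuming a DataFrame df with a MapType column named "propertiesMap" (as built later in this article):

```scala
import org.apache.spark.sql.functions.{col, explode, posexplode}

// One row per key-value pair, with new "key" and "value" columns;
// rows whose map is null or empty are dropped (use explode_outer to keep them).
df.select(col("id"), explode(col("propertiesMap"))).show(false)

// Same, plus a "pos" column holding the position of each map element.
df.select(col("id"), posexplode(col("propertiesMap"))).show(false)
```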

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val structureData = Seq(
  Row("36636","Finance",Row(3000,"USA")),
  Row("40288","Finance",Row(5000,"IND")),
  Row("42114","Sales",Row(3900,"USA")),
  Row("39192","Marketing",Row(2500,"CAN")),
  Row("34534","Sales",Row(6500,"USA"))
)

val structureSchema = new StructType()
  .add("id", StringType)
  .add("dept", StringType)
  .add("properties", new StructType()
    .add("salary", IntegerType)
    .add("location", StringType))

var df = spark.createDataFrame(
  spark.sparkContext.parallelize(structureData), structureSchema)
df.printSchema()
df.show(false)

root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- properties: struct (nullable = true)
 |    |-- salary: integer (nullable = true)
 |    |-- location: string (nullable = true)

+-----+---------+-----------+
|id   |dept     |properties |
+-----+---------+-----------+
|36636|Finance  |[3000, USA]|
|40288|Finance  |[5000, IND]|
|42114|Sales    |[3900, USA]|
|39192|Marketing|[2500, CAN]|
|34534|Sales    |[6500, USA]|
+-----+---------+-----------+

map() – Spark SQL map functions

Syntax - map(cols: Column*): Column

All of these functions live in the org.apache.spark.sql.functions object, so import it before use:

import org.apache.spark.sql.functions._

import scala.collection.mutable
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, map}

val index = df.schema.fieldIndex("properties")
val propSchema = df.schema(index).dataType.asInstanceOf[StructType]
val columns = mutable.LinkedHashSet[Column]()
propSchema.fields.foreach(field => {
  columns.add(lit(field.name))
  columns.add(col("properties." + field.name))
})

df = df.withColumn("propertiesMap", map(columns.toSeq: _*))
df = df.drop("properties")
df.printSchema()
df.show(false)

root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- propertiesMap: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+-----+---------+---------------------------------+
|id   |dept     |propertiesMap                    |
+-----+---------+---------------------------------+
|36636|Finance  |[salary -> 3000, location -> USA]|
|40288|Finance  |[salary -> 5000, location -> IND]|
|42114|Sales    |[salary -> 3900, location -> USA]|
|39192|Marketing|[salary -> 2500, location -> CAN]|
|34534|Sales    |[salary -> 6500, location -> USA]|
+-----+---------+---------------------------------+

map_keys() – Returns map keys from a Spark SQL DataFrame

Syntax - map_keys(e: Column): Column

df.select(col("id"),map_keys(col("propertiesMap"))).show(false)

+-----+-----------------------+
|id   |map_keys(propertiesMap)|
+-----+-----------------------+
|36636|[salary, location]     |
|40288|[salary, location]     |
|42114|[salary, location]     |
|39192|[salary, location]     |
|34534|[salary, location]     |
+-----+-----------------------+

map_values() – Returns map values from a Spark DataFrame

Syntax - map_values(e: Column): Column

df.select(col("id"), map_values(col("propertiesMap")))
  .show(false)

+-----+-------------------------+
|id   |map_values(propertiesMap)|
+-----+-------------------------+
|36636|[3000, USA]              |
|40288|[5000, IND]              |
|42114|[3900, USA]              |
|39192|[2500, CAN]              |
|34534|[6500, USA]              |
+-----+-------------------------+
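transform_keys() and transform_values() from the table at the top are not demonstrated elsewhere in this article; a sketch, assuming Spark 3.0+ where these Scala helpers exist, that upper-cases every value of "propertiesMap":

```scala
import org.apache.spark.sql.functions.{col, transform_values, upper}

// The lambda receives (key, value) columns and returns the new value.
df.select(col("id"),
    transform_values(col("propertiesMap"), (k, v) => upper(v)).as("upperValues"))
  .show(false)
```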

map_concat() – Concatenating two or more maps on DataFrame

Syntax - map_concat(cols: Column*): Column

val arrayStructureData = Seq(
  Row("James", List(Row("Newark","NY"), Row("Brooklyn","NY")),
    Map("hair"->"black","eye"->"brown"), Map("height"->"5.9")),
  Row("Michael", List(Row("SanJose","CA"), Row("Sandiago","CA")),
    Map("hair"->"brown","eye"->"black"), Map("height"->"6")),
  Row("Robert", List(Row("LasVegas","NV")),
    Map("hair"->"red","eye"->"gray"), Map("height"->"6.3")),
  Row("Maria", null,
    Map("hair"->"blond","eye"->"red"), Map("height"->"5.6")),
  Row("Jen", List(Row("LAX","CA"), Row("Orange","CA")),
    Map("white"->"black","eye"->"black"), Map("height"->"5.2"))
)

val arrayStructureSchema = new StructType()
  .add("name", StringType)
  .add("addresses", ArrayType(new StructType()
    .add("city", StringType)
    .add("state", StringType)))
  .add("properties", MapType(StringType, StringType))
  .add("secondProp", MapType(StringType, StringType))

val concatDF = spark.createDataFrame(
  spark.sparkContext.parallelize(arrayStructureData), arrayStructureSchema)

concatDF.withColumn("mapConcat", map_concat(col("properties"), col("secondProp")))
  .select("name", "mapConcat")
  .show(false)

+-------+---------------------------------------------+
|name   |mapConcat                                    |
+-------+---------------------------------------------+
|James  |[hair -> black, eye -> brown, height -> 5.9] |
|Michael|[hair -> brown, eye -> black, height -> 6]   |
|Robert |[hair -> red, eye -> gray, height -> 6.3]    |
|Maria  |[hair -> blond, eye -> red, height -> 5.6]   |
|Jen    |[white -> black, eye -> black, height -> 5.2]|
+-------+---------------------------------------------+
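element_at() and size() from the table at the top can be applied to the same map columns; a minimal sketch against concatDF:

```scala
import org.apache.spark.sql.functions.{col, element_at, size}

concatDF.select(
    col("name"),
    element_at(col("properties"), "hair").as("hair"), // value for key "hair"; null when the key is absent
    size(col("properties")).as("numProps")            // number of entries in the map
  )
  .show(false)
```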

map_from_entries() – convert array of StructType entries to map

Syntax - map_from_entries(e: Column): Column

concatDF.withColumn("mapFromEntries", map_from_entries(col("addresses")))
  .select("name", "mapFromEntries")
  .show(false)

+-------+-------------------------------+
|name   |mapFromEntries                 |
+-------+-------------------------------+
|James  |[Newark -> NY, Brooklyn -> NY] |
|Michael|[SanJose -> CA, Sandiago -> CA]|
|Robert |[LasVegas -> NV]               |
|Maria  |null                           |
|Jen    |[LAX -> CA, Orange -> CA]      |
+-------+-------------------------------+
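map_zip_with(), listed in the table at the top, merges two maps key by key; a hedged sketch (assuming Spark 3.0+) that combines "properties" and "secondProp", preferring the value from "properties" when a key exists in both:

```scala
import org.apache.spark.sql.functions.{coalesce, col, map_zip_with}

// The lambda receives (key, leftValue, rightValue); the value is null for keys
// present in only one of the maps, so coalesce picks whichever side has one.
concatDF.select(col("name"),
    map_zip_with(col("properties"), col("secondProp"),
      (k, v1, v2) => coalesce(v1, v2)).as("merged"))
  .show(false)
```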

map_entries() – convert map of StructType to array of StructType

Syntax - map_entries(e: Column): Column

Use the Spark SQL map_entries() function to convert a map of StructType to an array of StructType entries on a DataFrame; it is the inverse of map_from_entries().

To recap the examples above: map() creates a map column of MapType on a DataFrame. The input columns to map() must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). All key columns must have the same data type and can't be null, and all value columns must have the same data type. The map() snippet converts every column of the "properties" struct into key-value pairs in a new "propertiesMap" column: it first locates the "properties" field with df.schema.fieldIndex("properties"), collects each field name and its column into a LinkedHashSet (a LinkedHashSet is used in order to preserve the insertion order of the key-value pairs), and finally passes that sequence to map().

map_keys() retrieves all keys and map_values() retrieves all values from a MapType column. Both take a MapType argument; passing any other type raises a runtime error.

map_concat() concatenates the keys and values of two or more maps into a single map. All arguments must be MapType; passing any other type results in a runtime error.

map_from_entries() converts an array of StructType entries into a map (MapType) column. It takes an ArrayType[StructType] column as its argument; passing any other type results in an error.
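No snippet for map_entries() appears above; a minimal sketch, reusing the df with the "propertiesMap" column created earlier:

```scala
import org.apache.spark.sql.functions.{col, map_entries}

// Each map becomes an array of (key, value) structs.
df.select(col("id"), map_entries(col("propertiesMap")).as("propertiesEntries"))
  .show(false)
```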