Spark SQL provides built-in standard Date and Time functions in the DataFrame API, which come in handy when we need to operate on dates and times. All of these accept a Date, Timestamp, or String as input. If the input is a String, it must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS; the functions return a date, an int, or null if the input string could not be cast to a date.
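For instance, the null behavior can be seen with a minimal sketch (this assumes a SparkSession named spark with its implicits imported; the behavior shown is the default, non-ANSI mode):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// A string that cannot be cast to a date yields null rather than an error.
Seq("2019-01-23", "not-a-date")
  .toDF("input")
  .select(col("input"), to_date(col("input"), "yyyy-MM-dd").as("date"))
  .show()
// "2019-01-23" parses to a DateType value; "not-a-date" becomes null.
```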

When possible, try to leverage the standard library functions, as they offer a bit more compile-time safety, handle nulls, and perform better than UDFs. If your application is performance-critical, avoid custom UDFs at all costs, as their performance is not guaranteed.
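To illustrate the difference, here is a hedged sketch contrasting a built-in function with a hand-rolled UDF for the same task (assuming a SparkSession with implicits in scope; yearUdf is a hypothetical name):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("2019-01-23", null).toDF("input")

// Preferred: built-in function, null-safe and visible to the Catalyst optimizer.
df.select(year(to_date(col("input"))).as("year")).show()

// Avoid when a built-in exists: a custom UDF is a black box to the optimizer
// and will throw on null input unless you guard against it yourself.
val yearUdf = udf((s: String) => if (s == null) null else s.substring(0, 4))
df.select(yearUdf(col("input")).as("year")).show()
```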

For readability, I've grouped the functions into the following categories.

Click each link in the table below for a more detailed explanation and working examples in Scala.

We will see how to get the current date and convert a date into a specific format using date_format() with a Scala example. The example below parses the date and converts it from yyyy-MM-dd to MM-dd-yyyy format.

Seq(("2019-01-23"))
  .toDF("Input")
  .select(
    current_date().as("current_date"),
    col("Input"),
    date_format(col("Input"), "MM-dd-yyyy").as("format")
  ).show()

+------------+----------+-----------+
|current_date|     Input|     format|
+------------+----------+-----------+
|  2019-07-23|2019-01-23| 01-23-2019|
+------------+----------+-----------+

The example below converts a string in MM/dd/yyyy format to a DateType (yyyy-MM-dd) column using to_date().

Seq(("04/13/2019"))
  .toDF("Input")
  .select(
    col("Input"),
    to_date(col("Input"), "MM/dd/yyyy").as("to_date")
  ).show()

+----------+----------+
|     Input|   to_date|
+----------+----------+
|04/13/2019|2019-04-13|
+----------+----------+

The example below returns the difference in days between two dates using datediff().

Seq(("2019-01-23"), ("2019-06-24"), ("2019-09-20"))
  .toDF("input")
  .select(
    col("input"),
    current_date(),
    datediff(current_date(), col("input")).as("diff")
  ).show()

+----------+--------------+--------+
|     input|current_date()|    diff|
+----------+--------------+--------+
|2019-01-23|    2019-07-23|     181|
|2019-06-24|    2019-07-23|      29|
|2019-09-20|    2019-07-23|     -59|
+----------+--------------+--------+

months_between:

The example below returns the number of months between two dates using months_between().

Seq(("2019-01-23"), ("2019-06-24"), ("2019-09-20"))
  .toDF("date")
  .select(
    col("date"),
    current_date(),
    datediff(current_date(), col("date")).as("datediff"),
    months_between(current_date(), col("date")).as("months_between")
  ).show()

+----------+--------------+--------+--------------+
|      date|current_date()|datediff|months_between|
+----------+--------------+--------+--------------+
|2019-01-23|    2019-07-23|     181|           6.0|
|2019-06-24|    2019-07-23|      29|    0.96774194|
|2019-09-20|    2019-07-23|     -59|   -1.90322581|
+----------+--------------+--------+--------------+

trunc:

The example below truncates a date at the specified unit using trunc().

Seq(("2019-01-23"), ("2019-06-24"), ("2019-09-20"))
  .toDF("input")
  .select(
    col("input"),
    trunc(col("input"), "Month").as("Month_Trunc"),
    trunc(col("input"), "Year").as("Year_Trunc")
  ).show()

+----------+-----------+----------+
|     input|Month_Trunc|Year_Trunc|
+----------+-----------+----------+
|2019-01-23| 2019-01-01|2019-01-01|
|2019-06-24| 2019-06-01|2019-01-01|
|2019-09-20| 2019-09-01|2019-01-01|
+----------+-----------+----------+

add_months, date_add, date_sub

Here we add and subtract days and months from a given input date.

Seq(("2019-01-23"), ("2019-06-24"), ("2019-09-20"))
  .toDF("input")
  .select(
    col("input"),
    add_months(col("input"), 3).as("add_months"),
    add_months(col("input"), -3).as("sub_months"),
    date_add(col("input"), 4).as("date_add"),
    date_sub(col("input"), 4).as("date_sub")
  ).show()

+----------+----------+----------+----------+----------+
|     input|add_months|sub_months|  date_add|  date_sub|
+----------+----------+----------+----------+----------+
|2019-01-23|2019-04-23|2018-10-23|2019-01-27|2019-01-19|
|2019-06-24|2019-09-24|2019-03-24|2019-06-28|2019-06-20|
|2019-09-20|2019-12-20|2019-06-20|2019-09-24|2019-09-16|
+----------+----------+----------+----------+----------+

year, month, dayofweek, dayofmonth, dayofyear, next_day, weekofyear

Seq(("2019-01-23"), ("2019-06-24"), ("2019-09-20"))
  .toDF("input")
  .select(
    col("input"),
    year(col("input")).as("year"),
    month(col("input")).as("month"),
    dayofweek(col("input")).as("dayofweek"),
    dayofmonth(col("input")).as("dayofmonth"),
    dayofyear(col("input")).as("dayofyear"),
    next_day(col("input"), "Sunday").as("next_day"),
    weekofyear(col("input")).as("weekofyear")
  ).show()

+----------+----+-----+---------+----------+---------+----------+----------+
|     input|year|month|dayofweek|dayofmonth|dayofyear|  next_day|weekofyear|
+----------+----+-----+---------+----------+---------+----------+----------+
|2019-01-23|2019|    1|        4|        23|       23|2019-01-27|         4|
|2019-06-24|2019|    6|        2|        24|      175|2019-06-30|        26|
|2019-09-20|2019|    9|        6|        20|      263|2019-09-22|        38|
+----------+----+-----+---------+----------+---------+----------+----------+

Spark SQL Time Functions:

window(timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String): Column
Bucketizes rows into one or more time windows given a timestamp-specifying column. Window starts are inclusive and window ends are exclusive; e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows support microsecond precision; windows in the order of months are not supported.

window(timeColumn: Column, windowDuration: String, slideDuration: String): Column
Same as above, except the windows begin at 1970-01-01 00:00:00 UTC.

window(timeColumn: Column, windowDuration: String): Column
Generates tumbling time windows given a timestamp-specifying column, with the same inclusivity and precision rules as above. The windows begin at 1970-01-01 00:00:00 UTC.
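A tumbling-window usage can be sketched as follows (a minimal example assuming a SparkSession with implicits imported; the event data here is hypothetical):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Hypothetical event data: (timestamp string, value).
val events = Seq(
  ("2019-07-23 12:01:00", 10),
  ("2019-07-23 12:04:30", 20),
  ("2019-07-23 12:07:15", 30)
).toDF("time", "value")
  .withColumn("time", to_timestamp(col("time")))

// Tumbling 5-minute windows: each row falls into exactly one bucket,
// so the first two rows land in [12:00,12:05) and the third in [12:05,12:10).
events
  .groupBy(window(col("time"), "5 minutes"))
  .sum("value")
  .show(false)
```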

Conclusion:

In this post, I've consolidated the complete list of Spark SQL Date and Time functions, with descriptions and examples of some commonly used ones. You can find more information about these at the following blog.

Happy Learning !!