Introducing .NET for Apache® Spark™ Preview

April 24th, 2019

Today at Spark + AI Summit, we are excited to announce .NET for Apache Spark. Spark is a popular open source distributed processing engine for analytics over large data sets. Spark can be used for processing batches of data, real-time streams, machine learning, and ad hoc queries.

.NET for Apache Spark is aimed at making Apache® Spark™ accessible to .NET developers across all Spark APIs. So far, Spark has been accessible through Scala, Java, Python, and R, but not .NET.

We plan to develop .NET for Apache Spark in the open (as a .NET Foundation member project) along with the Spark and .NET community to ensure that developers get the best of both worlds.

https://github.com/dotnet/spark

The remainder of this post provides more specifics on the following topics:

What is .NET for Apache Spark?

.NET for Apache Spark provides high-performance APIs for using Spark from C# and F#. With these .NET APIs, you can access all aspects of Apache Spark, including Spark SQL, DataFrames, Streaming, and MLlib. .NET for Apache Spark lets you reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.

The C#/F# language binding to Spark will be written on a new Spark interop layer that offers easier extensibility. This new interop layer was written with best practices for language extension in mind and is optimized for interop and performance. In the long term, this extensibility can be used to add support for other languages in Spark.

You can learn more details about this work through this proposal.

.NET for Apache Spark is compliant with .NET Standard 2.0 and can be used on Linux, macOS, and Windows, just like the rest of .NET. .NET for Apache Spark is available by default in Azure HDInsight, and can be installed in Azure Databricks and more.

Getting Started with .NET for Apache Spark