MMLSpark can be conveniently installed on existing Spark clusters via the --packages option. For example:

spark-shell --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1


pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1

This can be used in other Spark contexts too, for example, you can use MMLSpark in AZTK by adding it to the .aztk/spark-default.conf file.
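For example, the relevant entry in .aztk/spark-default.conf would be the standard spark.jars.packages property (a minimal sketch, assuming the usual Spark properties-file syntax and that no other packages are already listed):

spark.jars.packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1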



Step 1: Create a Databricks account

If you already have a Databricks account, please skip to Step 2. If not, you can make a free account on Azure.

Step 2: Install MMLSpark

To install MMLSpark on the Databricks cloud, create a new library from Maven coordinates in your workspace. For the coordinates, use: com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1. Next, ensure this library is attached to your cluster (or all clusters). Finally, ensure that your Spark cluster has Spark 2.3 and Scala 2.11. You can use MMLSpark in both your Scala and PySpark notebooks.

Step 3: Load our Examples (Optional)

To load our examples, right click in your workspace, click "import" and use the following URL:

https://mmlspark.blob.core.windows.net/dbcs/MMLSpark%20Examples%20v1.0.0-rc1.dbc



The easiest way to evaluate MMLSpark is via our pre-built Docker container. To do so, run the following command:

docker run -it -p 8888:8888 mcr.microsoft.com/mmlspark/release
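The -p flag maps the container's port 8888, where the image's notebook interface listens, to the same port on your host. If 8888 is already in use, you can map it to another host port instead (a minimal sketch using standard Docker port mapping; 8080 is an arbitrary choice):

docker run -it -p 8080:8888 mcr.microsoft.com/mmlspark/release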



Please read our Docker EULA for usage rights.

To try out MMLSpark on a Python (or Conda) installation, first install PySpark via pip with pip install pyspark. Next, use --packages or add the package at runtime to get the Scala sources:

import pyspark

# Start a Spark session with the MMLSpark package on the classpath
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
    .getOrCreate()

import mmlspark
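As a quick check that the session actually picked up the package, you can print the configured package list (a minimal sketch; spark.jars.packages is only set if you configured it as above):

# Confirm the MMLSpark coordinate appears in the session's package list
print(spark.sparkContext.getConf().get("spark.jars.packages"))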

