Spark-CoreNLP wraps the Stanford CoreNLP annotation pipeline as a Transformer under the Spark ML pipeline API. It reads a string column of documents, applies CoreNLP annotators to each document, and writes the resulting annotations to the output column.
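As a sketch of typical usage (assuming the package's SQL-style functions such as `cleanxml`, `ssplit`, `tokenize`, and `ner` are imported from `com.databricks.spark.corenlp.functions`; exact function names and signatures may differ by version):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._

val spark = SparkSession.builder().appName("corenlp-example").getOrCreate()
import spark.implicits._

// A string column of documents.
val input = Seq(
  (1, "<doc>Stanford University is located in California. It is a great university.</doc>")
).toDF("id", "text")

// Strip XML, split into sentences, then annotate each sentence.
val output = input
  .select(cleanxml('text).as('doc))
  .select(explode(ssplit('doc)).as('sen))
  .select('sen, tokenize('sen).as('words), ner('sen).as('nerTags))

output.show(truncate = false)
```

Each CoreNLP annotator is exposed as a column function, so annotations compose with ordinary DataFrame operations such as `select` and `explode`.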

Include this package in your Spark applications using:

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "databricks/spark-corenlp:0.4.0-spark2.4-scala2.11"

Otherwise,

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "databricks" % "spark-corenlp" % "0.4.0-spark2.4-scala2.11"

Maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>databricks</groupId>
    <artifactId>spark-corenlp</artifactId>
    <version>0.4.0-spark2.4-scala2.11</version>
  </dependency>
</dependencies>

<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>