
I read Thomas's blog post about Kotlin and Apache Spark around the end of last year. At the time, I had a basic knowledge of Kotlin and only knew what Apache Spark was, with no idea how to use it. That post convinced me to start digging a bit deeper into the Big Data world and, more importantly, to start writing about Big Data using Kotlin. So, if you're reading, thank you, Thomas!

It has been a month now since I started learning Apache Spark, and it is easy to understand why it is considered the natural successor of Hadoop. First of all, Spark replaces the Hadoop MapReduce paradigm with transformations and actions on an RDD (Resilient Distributed Dataset). As the name suggests, an RDD represents the dataset we want to work with, loaded in memory and distributed across multiple nodes of a cluster. On an RDD it is possible to perform multiple transformations to modify its content, and then to use actions to collect the results of those transformations.

While most of the transformations and actions in Spark can be implemented with the MapReduce paradigm in Hadoop, Spark addresses two well-known limitations of the latter: the coding experience and overall performance. The former is solved by providing many built-in transformations and actions, so they no longer have to be coded by hand in the form of a Mapper or a Reducer. The latter is improved by performing operations in main memory, avoiding the disk writes and reads that chained MapReduce jobs require. Moreover, the loading of an RDD into memory and the transformations performed on it are lazily evaluated. This means that nothing is computed until an action is called on the RDD, which also lets Spark optimize those transformations behind the scenes.

In his blog post, Thomas shows how to create an Apache Spark project and use its Java API with Kotlin.
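Running real Spark code needs the Spark dependency and a context (a cluster or a local master), so the following sketch stays dependency-free and illustrates lazy evaluation by analogy, using Kotlin's standard-library Sequence. Like RDD transformations, map and filter on a sequence only describe the work. Nothing runs until a terminal operation such as toList(), which plays the role of a Spark action like collect(). The function buildPipeline and its logging are hypothetical, added only to make the evaluation order visible. Spark's actual scheduling of transformations is different.

```kotlin
// Analogy only: this is Kotlin's Sequence, not Spark.
// Sequence.map and Sequence.filter are lazy, like RDD transformations;
// a terminal operation such as toList() triggers the work, like a Spark action.

// Hypothetical helper: builds a lazy pipeline whose steps record in `log`
// the moment they actually run.
fun buildPipeline(input: List<Int>, log: MutableList<String>): Sequence<Int> =
    input.asSequence()
        .map { log += "map($it)"; it * 2 }        // "transformation": nothing runs yet
        .filter { log += "filter($it)"; it > 4 }  // "transformation": still nothing runs

fun main() {
    val log = mutableListOf<String>()
    val pipeline = buildPipeline(listOf(1, 2, 3, 4), log)
    println("Steps run before the terminal operation: ${log.size}") // prints 0

    // Terminal operation: the counterpart of a Spark action such as collect().
    val result = pipeline.toList()
    println("Result: $result")                   // prints Result: [6, 8]
    println("Steps run afterwards: ${log.size}") // prints Steps run afterwards: 8
}
```

In actual Spark code, the same shape would be map and filter calls on a JavaRDD followed by collect().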
In this blog post, I want to extend the discussion, showing why, in my opinion, Kotlin should be preferred over Java, with some examples of how to improve the Java API and the general user experience with Spark. All the code related to this post is available online, along with a brief explanation of the exercise it solves.

First of all, the possibility to use lambda expressions instead of anonymous classes significantly improves code readability. Lambdas are available in Java 8 too. Even so, I believe two Kotlin features make using lambdas a bit more enjoyable. The first is the implicit "it" parameter, which saves us from declaring the lambda parameter. The second is parameter destructuring in lambdas (line 28), introduced in Kotlin 1.1.

Extension methods are another point in favor of Kotlin. Lines 25 and 54 show two ways of using them to improve the Java API. The first extension makes the creation of one Spark object look like the creation of another, using the same notation. I gave it a different name from the original method only to avoid confusion between the two, but that name could definitely be improved! The second extension is a more typical Kotlin extension method. It adds a Spark-like method that, instead of returning a mutable Java map, returns one of the read-only collections from the Kotlin standard library.

Speaking of extension methods and the standard library, lines 15 and 19 show two functions that Kotlin already provides. The former is a scope function that lets you refer to an instance through a keyword only where necessary. The latter is use, the Kotlin counterpart of Java's try-with-resources, applied to the instance that has to be closed.

Apache Spark is written in Scala, so Scala will probably remain the best language for writing Spark code.
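The Kotlin features discussed above can be tried without Spark on the classpath. The following sketch does not reproduce the author's code. It relies on two hypothetical stand-ins. FakePairRdd has a collectAsMap() that returns a mutable java.util.HashMap, mirroring how Spark's JavaPairRDD.collectAsMap() returns a java.util.Map. FakeContext is a java.io.Closeable playing the role of JavaSparkContext, which also implements Closeable. The extension name collectAsReadOnlyMap and the helper describe are illustrative assumptions. The sketch shows the implicit "it" parameter, destructuring in lambdas, an extension method returning a read-only Kotlin Map, and use as try-with-resources.

```kotlin
import java.io.Closeable

// Hypothetical stand-in for Spark's JavaPairRDD: like the real
// JavaPairRDD.collectAsMap(), it hands back a mutable Java map.
class FakePairRdd<K, V>(private val pairs: List<Pair<K, V>>) {
    fun collectAsMap(): java.util.HashMap<K, V> {
        val result = java.util.HashMap<K, V>()
        // Destructuring in a for loop: each Pair is split into key and value.
        for ((key, value) in pairs) result[key] = value
        return result
    }
}

// Extension method (hypothetical name): a Spark-like call that exposes the
// result through Kotlin's read-only Map interface instead of a mutable Java map.
fun <K, V> FakePairRdd<K, V>.collectAsReadOnlyMap(): Map<K, V> = collectAsMap().toMap()

// Hypothetical stand-in for JavaSparkContext, which also implements Closeable.
class FakeContext : Closeable {
    var closed = false
        private set

    // Counts word occurrences and wraps them in the fake pair RDD.
    fun wordPairs(words: List<String>): FakePairRdd<String, Int> =
        FakePairRdd(words.groupingBy { it }.eachCount().toList())

    override fun close() {
        closed = true
    }
}

// `it` names the single lambda parameter implicitly, and (word, count)
// destructures each Map.Entry (destructuring in lambdas needs Kotlin 1.1+).
fun describe(counts: Map<String, Int>): List<String> =
    counts.entries
        .sortedBy { it.key }
        .map { (word, count) -> "$word=$count" }

fun main() {
    val context = FakeContext()
    // use {} closes the context when the block ends, even if it throws,
    // much like Java's try-with-resources.
    val counts = context.use { ctx ->
        ctx.wordPairs(listOf("spark", "kotlin", "spark")).collectAsReadOnlyMap()
    }
    println(describe(counts))                    // prints [kotlin=1, spark=2]
    println("context closed: ${context.closed}") // prints context closed: true
}
```

With a real JavaPairRDD, such an extension could simply wrap collectAsMap() in the same way. Note that Kotlin's Map is a read-only view rather than a truly immutable collection: it exposes no mutating methods, but it does not guarantee that the underlying data never changes.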
As for the choice between Kotlin and Java, however, I think Kotlin can already be considered the better option for working with Spark. This is especially true given how its community is likely to grow over the next couple of years, thanks to Kotlin/Native and Android support.

This is just a brief overview of how Kotlin can improve the Java experience of Apache Spark, and I'm planning to write more about this subject. If you have any questions, suggestions or criticism, don't hesitate to comment here or contact me.