CoinCreate safely types a pair: a cryptocurrency code and its current price. It therefore has two parameters, code (the currency code) and price (the current price in USD). When designing its companion object, however, we need to consider the shape of the price records in each row. For instance, if our case class uses only the coin name and price, their indices in an array of row values will be 1 and 3. This is quite similar to column indices in a table.

Looking at the price table (a screenshot from the homepage appears above), we decide to give the companion object an apply method that functionally transforms an input List of Strings into a CoinCreate. Although this transformation is not entirely straightforward, helper functions can extract just the coin code (getCoinCode) and convert the dollar price string into a Double (numberStringToDouble).

CoinCreate case class and its companion object
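As a rough illustration of the idea, here is a minimal sketch of such a case class and companion object. The names CoinCreate, getCoinCode, and numberStringToDouble come from the text; the helper bodies and the assumed cell formats ("Bitcoin BTC" for the name cell, "$9,123.45" for the price cell) are guesses for illustration, not the original implementation.

```scala
// Sketch only: helper bodies and cell formats are assumptions.
final case class CoinCreate(code: String, price: Double)

object CoinCreate {
  // Assumes the name cell ends with the ticker, e.g. "Bitcoin BTC" -> "BTC".
  def getCoinCode(nameCell: String): String =
    nameCell.trim.split("\\s+").last

  // Strips the dollar sign, thousands separators, and whitespace before parsing.
  def numberStringToDouble(priceCell: String): Double =
    priceCell.replaceAll("[$,\\s]", "").toDouble

  // Indices 1 (coin name) and 3 (price) follow the row shape described above.
  def apply(row: List[String]): CoinCreate =
    CoinCreate(getCoinCode(row(1)), numberStringToDouble(row(3)))
}
```

With these assumptions, a row such as List("1", "Bitcoin BTC", "$170,000,000,000", "$9,123.45") becomes CoinCreate("BTC", 9123.45).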

CoinInsert safely types a cryptocurrency code, its current price, and a timestamp that logs the insertion time. We can use this case class when inserting a Vector of CoinCreate into PostgreSQL. As its parameters are so similar to those of CoinCreate, we can create a simple companion object to transform a CoinCreate into a CoinInsert. This object's apply method simply adds the current timestamp to a CoinCreate to obtain a CoinInsert.

Hence the only difference between the CoinCreate and CoinInsert case classes is the current Timestamp, passed as the logTimestamp parameter.

CoinInsert case class and its companion object
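A minimal sketch of this pairing might look as follows. The parameter name logTimestamp comes from the text; using java.sql.Timestamp built from Instant.now() is an assumption about how the current time is captured.

```scala
import java.sql.Timestamp
import java.time.Instant

// Redefined here so the sketch is self-contained.
final case class CoinCreate(code: String, price: Double)

final case class CoinInsert(code: String, price: Double, logTimestamp: Timestamp)

object CoinInsert {
  // Copies code and price and stamps the record with the current time (assumed approach).
  def apply(coin: CoinCreate): CoinInsert =
    CoinInsert(coin.code, coin.price, Timestamp.from(Instant.now()))
}
```

Calling CoinInsert(CoinCreate("BTC", 9123.45)) keeps the code and price and attaches the moment of conversion as logTimestamp.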

Scraping Functions

Scraping with scala-scraper and JSoup is quite easy. First, we need to send a GET request to the homepage by creating a new JSoup browser, which lets us fetch HTML from the web. Since we only need HTML parsing, JSoup is enough in this case; for pages that rely on JavaScript, other browser options can be used.

GET request function
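A GET request along these lines can be sketched with scala-scraper's JsoupBrowser. The object name Fetch, the function name getPage, and the homepage constant are hypothetical; only JsoupBrowser() and its get method are taken from the library.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.model.Document

object Fetch {
  // Hypothetical constant: the page whose price table we want to scrape.
  val homepage: String = "https://coinmarketcap.com/"

  // Creates a fresh JSoup-backed browser and fetches the page as a parsed document.
  def getPage(url: String): Document =
    JsoupBrowser().get(url)
}
```

Calling Fetch.getPage(Fetch.homepage) performs a network request, so in a real job it would be worth wrapping the call in error handling.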

After the GET request, we need to find the main table and store it as a Vector of Strings. Luckily, once we specify that we are looking for a table element, scala-scraper's table method does all the work for us.

Table scraping function

Lastly, we need to slice the Vector from its second element to the end, as the first row contains the column names (headers). The resulting Vector still holds the table rows as HTML elements. So we can benefit from functional programming: map over all the table rows (the Vector's elements), extract the text of each cell, and transform the result into a CoinCreate (the comfort of having a tailor-made apply function).
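The row extraction, header removal, and mapping steps can be sketched as below. Note that this sketch does not use the table method mentioned above; it swaps in scala-scraper's elementList and texts extractors, which achieve a similar result. The object name TableScraper, the CSS selectors, and the column layout are assumptions for illustration.

```scala
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._
import net.ruippeixotog.scalascraper.model.Document

final case class CoinCreate(code: String, price: Double)

object CoinCreate {
  // Assumed layout: index 1 is the name cell ("Bitcoin BTC"), index 3 the price ("$9,123.45").
  def apply(row: List[String]): CoinCreate =
    CoinCreate(
      row(1).trim.split("\\s+").last,
      row(3).replaceAll("[$,\\s]", "").toDouble
    )
}

object TableScraper {
  // One List of cell texts per table row, header row included.
  def tableRows(doc: Document): Vector[List[String]] =
    (doc >> elementList("table tr")).toVector
      .map(row => (row >> texts("td, th")).toList)

  // Drops the header row, then converts every remaining row to a CoinCreate.
  def coins(doc: Document): Vector[CoinCreate] =
    tableRows(doc).drop(1).map(row => CoinCreate(row))
}
```

Keeping the parsing in a function of Document means it can be exercised on a saved HTML snippet without touching the network.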

Insertion Functions

doobie is an amazing functional JDBC layer for Scala. It provides a functional way to write any JDBC program. In this tutorial, we will keep things simple and write only a Transactor that connects to the local PostgreSQL database and an insertion function that performs the transactions in a type-safe way.

To connect to the database, we need a Transactor that states the driver type (in our case, the PostgreSQL driver), the connection URL, the user name, the password, and an Execution Context (EC). The Transactor needs an implicit val that defines the EC. For non-blocking operations, doobie's Transactor uses a ContextShift. For testing our code, the doobie documentation recommends using a synchronous EC.

Transactor for establishing the JDBC connection
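A Transactor of this kind can be sketched as follows, assuming a doobie release built on cats-effect 2 (the series that uses ContextShift and Blocker). The object name Database, the database name in the URL, and the credentials are placeholders, not values from the original project.

```scala
import cats.effect.{Blocker, ContextShift, IO}
import doobie.Transactor
import doobie.util.ExecutionContexts

object Database {
  // Synchronous EC, as the doobie documentation suggests for testing.
  implicit val cs: ContextShift[IO] =
    IO.contextShift(ExecutionContexts.synchronous)

  val xa: Transactor[IO] = Transactor.fromDriverManager[IO](
    "org.postgresql.Driver",                  // JDBC driver class name
    "jdbc:postgresql://localhost:5432/coins", // connection URL (placeholder database name)
    "postgres",                               // user name (placeholder)
    "password",                               // password (placeholder)
    Blocker.liftExecutionContext(ExecutionContexts.synchronous) // fine for testing only
  )
}
```

Building the Transactor does not open a connection by itself; one is acquired only when a program is run through it.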

To write a row-by-row insertion function, we can use SQL interpolation. The function takes a CoinInsert as input and returns an Update0 (a parameterized statement whose arguments are known).

SQL interpolation with doobie
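Such an insertion function might look like the sketch below. The input and output types follow the text; the object name CoinQueries, the function name insertCoin, the table name coin_prices, and the column names are all hypothetical.

```scala
import java.sql.Timestamp
import doobie._
import doobie.implicits._

final case class CoinInsert(code: String, price: Double, logTimestamp: Timestamp)

object CoinQueries {
  // Each interpolated value becomes a bound JDBC parameter, not spliced SQL text.
  def insertCoin(coin: CoinInsert): Update0 =
    sql"""
      INSERT INTO coin_prices (code, price, log_timestamp)
      VALUES (${coin.code}, ${coin.price}, ${coin.logTimestamp})
    """.update
}
```

The returned Update0 is only a description of the statement; nothing reaches the database until it is run and transacted.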

Lastly, we can send the GET request to the homepage by calling the getCoinUpdatedTable function. This will return a Vector of Strings.

We can then take this Vector (coinTable), transform each CoinCreate into a CoinInsert, and execute the insert statement we prepared in the previous step.

A few lines of code that do all the scraping and inserting job
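The final step can be sketched as a single doobie program. This assumes coinTable holds CoinCreate values, as the conversion step implies; the object name Pipeline, the function names insertCoin and insertAll, and the table and column names are hypothetical.

```scala
import java.sql.Timestamp
import java.time.Instant
import cats.implicits._
import doobie._
import doobie.implicits._

final case class CoinCreate(code: String, price: Double)
final case class CoinInsert(code: String, price: Double, logTimestamp: Timestamp)

object CoinInsert {
  def apply(c: CoinCreate): CoinInsert =
    CoinInsert(c.code, c.price, Timestamp.from(Instant.now()))
}

object Pipeline {
  // Hypothetical table and column names.
  def insertCoin(coin: CoinInsert): Update0 =
    sql"""
      INSERT INTO coin_prices (code, price, log_timestamp)
      VALUES (${coin.code}, ${coin.price}, ${coin.logTimestamp})
    """.update

  // Converts every CoinCreate, runs one insert per row, and sums the affected row counts.
  def insertAll(coinTable: Vector[CoinCreate]): ConnectionIO[Int] =
    coinTable
      .map(c => CoinInsert(c))
      .traverse(c => insertCoin(c).run)
      .map(_.sum)
}
```

Given a Transactor named xa, the program would be executed with Pipeline.insertAll(coinTable).transact(xa).unsafeRunSync(), which runs all inserts inside one transaction.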

Last Words

Thanks to doobie, with only a few lines we were able to scrape the cryptocurrency prices from CoinMarketCap and insert them into a local PostgreSQL database. Although the code does its job for now, it can be extended with exception handling and monitoring. The whole main class can be found below, and the whole project is in this GitHub repository.

With this short article, we aimed to introduce some Scala concepts for web scraping. To learn more about data engineering, you can attend a bootcamp with hands-on training, do more projects, and read and share more.