Maven pom.xml

After unpacking the package, we need to make sure a number of dependencies are in place. Let's examine them one by one, starting with what we need to run a Spring Boot process:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-autoconfigure</artifactId>
</dependency>

This will enable the Spring Boot context. Now we already know that we need the Hosebird Client itself:

<dependency>
    <groupId>com.twitter</groupId>
    <artifactId>hbc-core</artifactId>
</dependency>

These are the most important dependencies we need.

Let’s now add some style to our code. The builder pattern provided by Lombok can really help. Let’s add it:

<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
</dependency>

A fast and efficient way to generate our output files is to use the logging mechanism. Logback is a very handy tool for this:

<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
</dependency>

As the Twitter API shows, all messages are exchanged as JSON. To handle it, let's use GSON:

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
</dependency>

Because we want our unit tests to be up to date, let's add the JUnit Jupiter dependency:

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter</artifactId>
    <scope>test</scope>
</dependency>

Jupiter needs to run on an engine. We add it like this:

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-engine</artifactId>
    <scope>test</scope>
</dependency>

The same goes for our assertions:

<dependency>
    <groupId>org.assertj</groupId>
    <artifactId>assertj-core</artifactId>
    <scope>test</scope>
</dependency>

Commons IO is a very good library for reading and using resources in the unit tests:

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <scope>test</scope>
</dependency>

If we are making unit tests, then it is also important to use mocks. For that we can use Mockito:

<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-core</artifactId>
    <scope>test</scope>
</dependency>

We also need tests where a Spring Boot instance is running, which is why we need its test dependencies:

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-test</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-test</artifactId>
    <scope>test</scope>
</dependency>

Those are the most important sections of our pom file. Below is the end result. Note that the Java version being used is 14:

pom.xml

We are now ready to examine our code.

Twitter Fetcher — Spring Process Implementation

First, let's implement a model to extract data from the Twitter API, starting with the User model:

User model for Twitter

Then we implement the Message model:

Message model for Twitter

Now we need to set up the OAuth1 configuration to access Twitter:

Twitter Configuration

The first thing we need to be concerned about when accessing the Twitter API is our OAuth access. In our code we will be using OAuth1, and for that we create a bean of type OAuth1:

@Bean
public OAuth1 authentication() {
    return new OAuth1(consumerKey, consumerSecret, token, tokenSecret);
}

This bean will be available in our Spring context, and we will use it when creating our TwitterClient.

At the bottom of this configuration class, we create a BlockingQueue. The elements added to it will be our Twitter messages. Note that we won't do this explicitly in the code; the HBC library will do it for us. Also note that we are taking advantage of the prototype scope:

@Bean
@Scope(value = "prototype")
public BlockingQueue<String> blockingQueue() {
    return new LinkedBlockingQueue<>(capacity);
}

This is important because we want a new queue to be created at injection time, which lets us seamlessly start each Twitter run with a fresh queue. Also notice that we are injecting the capacity property. This guarantees a maximum number of messages in the queue at any given time. We control that in the code as well, but this way, even if something goes wrong, the queue will never hold more than our limited number of messages. It safeguards against a flood of messages that we know we don't need. In our example, capacity also doubles as the read limit, but it doesn't have to. For example, we could allow a maximum of 200 messages in the queue yet only read 100. The big disadvantage of doing that is that the extra 100 messages would still be loaded into the queue; we would never read them, but they would take up space regardless. This is why, for this example only, we share a single capacity figure. In a real-life production situation we would probably split it into two properties, say capacity and limit, but that is not important for this example.
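The bounded-queue behavior described above can be sketched in isolation. This is a toy example (a capacity of 2 is used here purely for illustration), not the article's actual configuration:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // A queue bounded at a toy capacity of 2 messages.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(2);

        // offer(...) returns true while the queue has room...
        System.out.println(queue.offer("tweet-1")); // true
        System.out.println(queue.offer("tweet-2")); // true

        // ...and false once capacity is reached, so a flood of messages
        // cannot grow the queue without bound.
        System.out.println(queue.offer("tweet-3", 100, TimeUnit.MILLISECONDS)); // false
    }
}
```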

To complete the configuration of our Spring Boot process, we only need to add the default configuration file:

application.properties

We will override the Twitter application properties here. The cron job will run every minute via the expression "0 * * * * *". It is configured to fetch a maximum of 100 messages per run, and the final constraint is that each run lasts a maximum of 30 seconds.
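As a sketch, the relevant part of that file might look like this. The property names below are assumptions for illustration; the real keys are in the embedded application.properties:

```properties
# Hypothetical keys illustrating the three constraints described above
twitter.fetch.cron=0 * * * * *
twitter.fetch.capacity=100
twitter.fetch.timeout-seconds=30
```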

According to our requirements, we want to stop the process after precisely 30 seconds. Let's then make a thread that can terminate a whole ExecutorService:

Killer Thread

We make the duration of this thread configurable. Let's now create another thread, responsible for fetching the data from Twitter:

Fetcher Thread

This thread fetches the data from Twitter. It waits a maximum of 1 second for a message to be placed on the BlockingQueue and loops until a total of 100 messages are received. The process may also be terminated, as seen before, by the KillerThread.
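The polling loop just described can be sketched like this. The drain method and the names here are illustrative stand-ins, not the actual FetcherThread code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class FetcherLoopSketch {

    // Drains messages from the queue, waiting up to 1 second per message,
    // until maxMessages have been read or nothing arrives within the timeout.
    static List<String> drain(BlockingQueue<String> queue, int maxMessages) {
        List<String> messages = new ArrayList<>();
        try {
            while (messages.size() < maxMessages) {
                String message = queue.poll(1, TimeUnit.SECONDS);
                if (message == null) {
                    break; // nothing arrived within the timeout
                }
                messages.add(message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop draining if interrupted
        }
        return messages;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);
        queue.add("{\"text\":\"hello\"}");
        queue.add("{\"text\":\"world\"}");
        System.out.println(drain(queue, 100).size()); // prints 2
    }
}
```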

Now let's define the Twitter client interface:

Twitter Client Interface

Now we create the implementation:

Twitter Client Implementation

Our Twitter client is where all the action occurs. We create an ExecutorService that holds a maximum of 2 threads at a time. Then we initialize a HashSet, to make sure we do not get a double read of the latest message, which can happen. We then create both the KillerThread and the FetcherThread, and take one timestamp at the start of the process and another after submitting the threads. Calling shutdown signals the Executor to shut down. We then wait a maximum of 30 seconds before we continue.
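The shutdown-and-wait sequence can be sketched as follows. This is a minimal stand-in for the real client, with trivial tasks in place of the actual KillerThread and FetcherThread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorShutdownSketch {

    static boolean runAndAwait() {
        // Two worker threads: one fetcher and one killer, as in the client.
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(() -> System.out.println("fetcher running"));
        executor.submit(() -> System.out.println("killer armed"));

        // shutdown() only signals the executor: already-submitted tasks still run.
        executor.shutdown();
        try {
            // Block for at most 30 seconds while the tasks finish.
            return executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated in time: " + runAndAwait());
    }
}
```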

Twitter has had different interfaces in the past. Currently we mostly use the https://stream.twitter.com/ API, but we can still choose between this one and the older ones:

For our case we choose STREAM_HOST:

final Hosts hosebirdHosts = new HttpHosts(Constants.STREAM_HOST);

As seen at the creation of the client, we also need to implement a TwitterMessageProcessor. This is the processor we will use to handle our messages:

Twitter Message Processor

And now let’s see how it’s implemented:

Twitter Message Processor Implementation

We are using GSON to help us process the dates coming from Twitter. They look something like this:

Fri Feb 15 21:13:00 +0000 2019

This is why we use the following format with GSON:

EEE MMM dd HH:mm:ss ZZZZZ yyyy

In this case, each section of the date means:

EEE — The day of the week abbreviation (3 letters)

MMM — The month abbreviation (3 letters)

dd — The day of the month

HH — Hour of the day (24 hour period)

mm — The minute of the hour

ss — The second of the minute

ZZZZZ — ISO8601 extended timezone

yyyy — The Gregorian calendar year
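A quick sketch with java.text.SimpleDateFormat shows that this pattern parses the example date above (the class and helper names are made up for illustration):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Locale;

public class TwitterDateParse {

    // Parses a Twitter-style date using the pattern from the article.
    // An English locale is forced so "Fri" and "Feb" parse reliably.
    static Instant parse(String twitterDate) {
        SimpleDateFormat format =
                new SimpleDateFormat("EEE MMM dd HH:mm:ss ZZZZZ yyyy", Locale.ENGLISH);
        try {
            return format.parse(twitterDate).toInstant();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a Twitter date: " + twitterDate, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Fri Feb 15 21:13:00 +0000 2019")); // 2019-02-15T21:13:00Z
    }
}
```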

At the same time, we use Java's lambda expressions to group our data, order it, and form the data structure to return to our front-end app. It is in this process that we log the processed message on the line:

log.info(gson.toJson(pageDto));
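The grouping step can be sketched with a simplified message type. Msg and byAuthor below are illustrative stand-ins for the real Message model and processor code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByAuthorSketch {

    // A simplified stand-in for the article's Message model.
    static class Msg {
        final String author;
        final String text;

        Msg(String author, String text) {
            this.author = author;
            this.text = text;
        }
    }

    // Group messages by author, as the processor does before building the page DTO.
    static Map<String, List<Msg>> byAuthor(List<Msg> messages) {
        return messages.stream().collect(Collectors.groupingBy(m -> m.author));
    }

    public static void main(String[] args) {
        List<Msg> messages = Arrays.asList(
                new Msg("alice", "hello"),
                new Msg("bob", "hi"),
                new Msg("alice", "again"));
        System.out.println(byAuthor(messages).get("alice").size()); // prints 2
    }
}
```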

Let’s finally implement the runner:

Spring boot runner

We are essentially creating a CommandLineRunner, which starts with a runningService instance. This service has two possible configurations: in our implementation, a single run and a scheduled run. On that note, let's have a look at the interface for this:

Running Service Interface

Let’s first have a look at the single run implementation. We call it OneRunServiceImpl:

Single Run Service Implementation

Looking at the @Profile annotation, we can see that it is initialized with the value "!scheduler". This means that if we don't start the application with the scheduler profile, this service will just do a single Twitter fetch operation.

It is now time to look at the scheduler implementation. We call this SchedulerServiceImpl:

Scheduler Service Implementation

In this case, we can see that we need to start with the profile "scheduler" in order to activate it. Another point to look at is the @Scheduled annotation. This tells the Spring context that we want to execute this method at the interval described in the cron value. When the process runs, we will fetch the Twitter data. Notice that at this point we haven't even had to look at our HBC library.
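A minimal sketch of what such a scheduled service might look like follows. The class and method names here are assumptions for illustration; the real code is in the embedded SchedulerServiceImpl, and the Spring annotation is @Scheduled:

```java
import org.springframework.context.annotation.Profile;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

// Only active when the application starts with the "scheduler" profile.
@Service
@Profile("scheduler")
public class SchedulerServiceSketch {

    @Scheduled(cron = "0 * * * * *") // second 0 of every minute
    public void scheduledRun() {
        // fetch the Twitter data here (the real code delegates to the Twitter client)
    }
}
```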

Finally in order to select which logs are important to output, first let’s make the Java code that ensures that in Logback:

Twitter filter for Logback

And this would be our logback.xml:

logback.xml

Notice that we only have one FileAppender. This appender writes our message as-is to the output file. All the other logs, including this one, are visible in the console; that is the function of the ConsoleAppender.

Twitter Logger Service Implementation

Our Twitter logger service basically reads the log file and wraps its contents in square brackets. This turns the piled-up JSON Twitter messages into a readable JSON array format. It is implemented using the Flask library:

Log service in python

Here is an example of a JSON file we may receive:

JSON file example

It is of course obfuscated, but it gives us a good example of the format.

Twitter Front End Implementation

The implementation of the Front End is not the focus of this article. However, we need to understand its importance. This is how we are going to visualize our results.

This is what it looks like:

FE Look

Here we can click on a run. Each run shows how long it took (normally 30 seconds, or less if 100 messages were reached), how many messages per second were retrieved, and the total number of messages retrieved. Once we click on a run, we see a list of all the authors who tweeted in that period. Finally, clicking on an author shows all the messages they have written.

Setting it all up

The first thing to do is to create our entry point. We need to make sure that all three elements of this architecture are started once the container comes up. To do this, let's first run the generate.sh wizard:

Generate wizard bash script

The wizard will ask for your specific keys and tokens, which you have to provide. Once the questionnaire is completed, the wizard creates a docker-entrypoint.sh entry-point file for our Docker image. It will look something like this:

docker-entrypoint.sh

Our image is then generated via our Dockerfile:

Docker file

Note that we are using my own je-all-build-jdk-14:0.0.1 image. In a nutshell, this Dockerfile instructs Docker to build an image by copying all the necessary files to /usr/local/bin: the runnable Spring Boot jar, the front-end distribution, and the Python logger service script. We also add the necessary files for NGINX.

We are finally going to fire up this system in one go with the build.sh file:

build.sh file

We first build the whole Maven project, generating our executable jar file. Then we run npm run build, which creates our distribution package. Finally we build our image and start the container.

Conclusion

As we have seen, the Hosebird Client is an excellent tool to reach Twitter and retrieve data.

We have seen how we can process this data to our advantage, and how we can make a complex process with particular restrictions run in parallel. We have also seen how we can use a controlled race between threads to reach our requirement goals and, with it, create separate runs.

I have placed all the source code of this application on Bitbucket.

I hope that you have enjoyed this article as much as I enjoyed writing it.

I’d love to hear your thoughts on it, so please leave your comments below.

Thanks in advance for your help and thank you for reading!

References