In the remainder of this post, I’ll outline each component of the pipelines and how it is configured.

GBFS

GBFS (General Bikeshare Feed Specification) is a data feed specification for bike share programs. It defines several JSON schemas so that every bike share system publishes the same data in the same shape, which app developers can incorporate into their systems and researchers can analyze. The Ford GoBike system uses GBFS, so for simplicity I refer to the data from Ford as GBFS for the remainder of this article.



GBFS exposes several feeds, and the details about each can be found here. For our use case, we’re interested in each station’s status and also each station’s location for future visualization. For a sample of the feeds, you can check out the documentation.

For this project, we’ll poll GBFS for the station status. Station locations change infrequently, so a daily or weekly cron pull of the station locations should be sufficient to keep our tables up to date.
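To make the polling idea concrete outside of StreamSets, here is a minimal Python sketch. It assumes the standard GBFS `station_status` layout (a top-level `last_updated` and `ttl`, with the stations under `data.stations`). The URL is a placeholder, and the function names `parse_station_status`, `fetch_json`, and `poll` are my own for illustration; none of this is part of the actual pipeline.

```python
import json
import time
import urllib.request

# Placeholder endpoint (assumption): the real URL is listed in the
# system's GBFS discovery file.
STATION_STATUS_URL = "https://example.com/gbfs/en/station_status.json"


def parse_station_status(payload):
    """Flatten a GBFS station_status payload into one dict per station."""
    last_updated = payload["last_updated"]
    rows = []
    for station in payload["data"]["stations"]:
        rows.append({
            "station_id": station["station_id"],
            "num_bikes_available": station["num_bikes_available"],
            "num_docks_available": station["num_docks_available"],
            # Not every feed reports this field, so default to None.
            "last_reported": station.get("last_reported"),
            "feed_last_updated": last_updated,
        })
    return rows


def fetch_json(url, timeout=10):
    """Download a feed and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def poll(url=STATION_STATUS_URL, max_polls=3, default_ttl=60):
    """Yield parsed station rows, waiting for the feed's ttl between requests."""
    for _ in range(max_polls):
        payload = fetch_json(url)
        yield parse_station_status(payload)
        # ttl is the number of seconds until the feed is refreshed.
        time.sleep(payload.get("ttl", default_ttl))
```

Sleeping for the feed's `ttl` rather than a fixed interval avoids requesting data that has not changed since the last poll.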

StreamSets

Most of the heavy lifting in this system is handled by StreamSets. If you're unfamiliar with StreamSets, their website and documentation are top notch. At a high level, StreamSets is a plug-and-play stream-processing framework. I like to think of it as Spark Streaming with a UI on top of it. It provides a drag-and-drop interface for the source-processor-sink streaming model.
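If the source-processor-sink model is new to you, the toy Python sketch below shows its shape using generators. It only illustrates the pattern and says nothing about how StreamSets is implemented; the function names and the derived `capacity` field are made up for the example.

```python
def source(records):
    """Source: emit records one at a time."""
    for record in records:
        yield record


def processor(stream):
    """Processor: add a derived field to each record as it passes through."""
    for record in stream:
        total = record["num_bikes_available"] + record["num_docks_available"]
        yield {**record, "capacity": total}


def sink(stream, destination):
    """Sink: write every record to the destination (here, a plain list)."""
    for record in stream:
        destination.append(record)
    return destination


stations = [
    {"station_id": "3", "num_bikes_available": 5, "num_docks_available": 10},
]
stored = sink(processor(source(stations)), [])
```

Each stage only knows about the records flowing through it, which is what lets a tool like StreamSets swap stages in and out from a UI.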

HTTP Client

To poll GBFS, I created an HTTP client in StreamSets with the following configurations: