Building an Open Source Community

Running a successful open source project involves finding users for the package, welcoming them, getting them excited about contributing, providing support and structure in a virtual team, hosting meetup events to evangelize the package, and creating community guidelines. The following section outlines some of the steps we have taken in setting up our open source community.

Creating an Organization

Building on the ideas of Lopez de Prado (Lopez de Prado 2018b) and drawing inspiration from AQR (Applied Quantitative Research), we decided to set up a brand called Hudson and Thames Quantitative Research, named after the rivers near which the authors reside. It would be the platform for an open source finance research laboratory where anyone could contribute to the development of tools.

Setting up such an organization would allow us to leverage the project in various ways. For example, we could now launch a crowdfunding campaign to fund the development of mlfinlab or pivot to a consultancy/asset management business.

By doing this we can build a product, a brand, and a client base – before the product has reached its final form.

GitHub Open Source Guidelines

GitHub is a platform for developing software and is well known as a host for open source projects such as NumPy, pandas, scikit-learn, and TensorFlow.

GitHub also provides a number of guidelines (Github 2019) for running an open source project. In particular, it recommends the following documents, all of which have been included in the mlfinlab repository.

README: Introduces and explains a project.

Code of conduct: A welcoming and inclusive document that sets out the community standards and the procedures for reporting abuse.

Contributing Guidelines: Outlines how members of the community can participate in the project and the types of desired contributions.

License: A BSD 3-Clause License, which is a permissive license similar to the BSD 2-Clause License but with a third clause that prohibits others from using the name of the project or its contributors to promote derived products without written consent.

Issue & pull request templates: Templates are provided to help contributors include the relevant information in an issue or pull request, for example the bug they fixed and the operating system and IDE they used.
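As a rough illustration of the checklist above, the following sketch scans a local clone for each recommended document. The file names and locations are assumptions based on common GitHub conventions, not a description of the actual mlfinlab repository layout, and the helper missing_docs is a hypothetical name introduced only for this example.

```python
from pathlib import Path

# Candidate paths for each recommended document.
# ASSUMPTION: these names follow common GitHub conventions; a given
# repository (including mlfinlab) may store them elsewhere.
EXPECTED_DOCS = {
    "README": ["README.md", "README.rst", "README"],
    "Code of conduct": ["CODE_OF_CONDUCT.md", ".github/CODE_OF_CONDUCT.md"],
    "Contributing guidelines": ["CONTRIBUTING.md", ".github/CONTRIBUTING.md"],
    "License": ["LICENSE", "LICENSE.md", "LICENSE.txt"],
    "Issue template": [".github/ISSUE_TEMPLATE", ".github/ISSUE_TEMPLATE.md"],
    "Pull request template": [
        ".github/PULL_REQUEST_TEMPLATE.md",
        "PULL_REQUEST_TEMPLATE.md",
    ],
}


def missing_docs(repo_root):
    """Return the names of recommended documents not found under repo_root."""
    root = Path(repo_root)
    return [
        name
        for name, candidates in EXPECTED_DOCS.items()
        if not any((root / candidate).exists() for candidate in candidates)
    ]


if __name__ == "__main__":
    # Check the current working directory, assumed to be a repository root.
    for name in missing_docs("."):
        print(f"Missing: {name}")
```

Running the script from the root of a clone prints one line per document that could not be found under any of the assumed paths.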

Online Community Channels

A project of this nature is very niche, so we expect the community to be very small. In essence, we are targeting users who use Python, are familiar with machine learning, and care about finance. The audience is smaller still when we narrow it to those users who actively read academic literature and explore modern techniques.

The following are the online sources that we made use of to reach users:

Reddit

We had by far the most success with Reddit. Thanks to the subreddit structure, we are able to reach groups of people who subscribe to specific subreddits. Overall we found that technology-focused groups were very vocal and supportive of the package, whereas the more fundamental/discretionary investor communities disliked it.

Our conclusion is that it would be much harder to sell the idea of machine learning to companies that are not already aligned with systematic investing. Thus, when considering consulting or selling a machine learning product, it is better to avoid firms that focus on fundamental-style strategies and to concentrate on companies that are already exploring the idea.

We received a high number of votes in both the algorithmic trading and machine learning communities, as shown in the figure below.

Twitter

Twitter has been great for getting instant responses from noteworthy people and for building something of a brand name and reputation. In particular, Lopez de Prado has retweeted our research as well as liked several of our tweets regarding the package.

Blog: Quantsportal.com

Jacques’ personal blog has been around since 2015 and has built a small following in the quantitative finance community. It has acted as a portfolio of his work since his undergraduate days and we made use of its distribution channels to get the message out regarding package developments. In particular, the blog is linked to the well-known blog aggregator Quantocracy.com.

Currently, his website acts as a central location for news about mlfinlab. It has also given rise to a side project termed the Open Source Hedge Fund Project, which ties back to creating an open source financial research lab.

LinkedIn and Facebook

Overall the feedback from LinkedIn and Facebook was disappointing. The message reached our personal networks and started a few conversations, but we did not feel that it carried beyond them.

Offline Community Channels

Typically offline events refer to meetups and guest speaking opportunities in which you promote the package.

Guest Speaking

We used Meetup.com to create the Machine Learning in Finance London group, which at the time of writing has 250 members. Our first meetup is scheduled for 23 May 2019 at Monticello House. We will also be guest speaking at the London Python for Trading Meetup on 22 May 2019.

Sponsorship

We have secured sponsorship from GridGain Systems, a high-performance computing company that also hosts the In-Memory Computing Summit. The sponsorship includes event tickets for our members, cover for venue hire fees, and possible speakers on machine learning for our meetups.

Tutorials

We are in the process of creating a few tutorial notebooks which we will use at our meetup events in London. They can be found along with the other example notebooks on the Research Repository.

At the time of writing we have the following example notebooks:

Another developer, who runs the BlackArbsCEO repository on GitHub, has notebooks covering multiple chapters. We recommend that readers also view his work.

Final Remarks

As a package, mlfinlab will be in a constant state of development. Our vision is to implement all of the principles mentioned in the textbook and then move on to adding other recent developments in financial machine learning as they emerge.

The success of the project will depend on user adoption of these techniques and on whether we can generate a source of revenue to justify the many hours spent developing it.

At the time of writing, we are a team of five individuals, all in different locations, implementing the various chapters. For some of us, this project is a platform to help us get placed at top employers; for others, it is a tool to help build a reputation in the industry.