Discovering how the most famous DeFi projects are doing, through data science on their repositories

Iván Alberquilla · Feb 27

Photo by Markus Spiske on Unsplash

Introduction

The main objective of this post is to analyze the repositories of the most popular DeFi projects through data science, in order to help people who want to contribute to these projects, or create their own DeFi project, make decisions based on data.

This post only shows the analysis of the data obtained. For anyone who wants to delve deeper into the Python code behind it, I leave the link to my GitHub.

The projects analyzed are those shown on the DeFi project map.

Source: https://www.theblockcrypto.com/genesis/15376/mapping-out-ethereums-defi

Dataset Description

To evaluate the projects, data were obtained through the GitHub API by compiling all the repositories that make up each project and excluding those that are forks of other projects. In total, 1588 GitHub code repositories were analyzed.
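The collection step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: it assumes each project maps to a GitHub organization, the function names are hypothetical, and the sample records are made up. It relies on the `fork` flag that the GitHub REST API includes in each repository object. Unauthenticated requests are rate limited, so a real run would normally send an access token.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def drop_forks(repos):
    """Keep only repositories that are not forks of another project."""
    return [repo for repo in repos if not repo.get("fork", False)]


def fetch_org_repos(org, per_page=100):
    """Download every public repository of an organization, page by page."""
    repos, page = [], 1
    while True:
        url = f"{API_ROOT}/orgs/{org}/repos?per_page={per_page}&page={page}"
        with urllib.request.urlopen(url) as response:
            batch = json.load(response)
        if not batch:  # an empty page means there is nothing left to read
            return repos
        repos.extend(batch)
        page += 1


# Offline illustration with made-up records shaped like the API response.
sample = [
    {"name": "core", "fork": False},
    {"name": "web3.js", "fork": True},
    {"name": "sdk", "fork": False},
]
original_repos = drop_forks(sample)
print([repo["name"] for repo in original_repos])
```

With real data, `drop_forks(fetch_org_repos("some-org"))` would return the organization's own repositories, where `"some-org"` stands for whichever organization hosts the project.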

Project Analysis

Sometimes it seems that creating a project like the ones in the image is simple and takes little work. The data below give a more realistic picture of what it means to have a project like this.

Looking at the repositories that make up the projects, we see that there are, on average, 14 repositories and 21.5 contributors per project.

Looking in more detail at which projects have the most repositories, we see that some of the most famous ones have more than one hundred.

People are involved in developing these repositories. If we count the unique contributors per project (the same person may collaborate in more than one repository of a project, but is counted only once), we see that some of these projects are quite large.
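Counting each contributor once per project, even when they appear in several of its repositories, comes down to collecting logins into a set. The sketch below uses made-up rows and a hypothetical `unique_contributors` helper. The real data would come from the contributors list of each repository.

```python
from collections import defaultdict

# Hypothetical rows: (project, repository, contributor login).
rows = [
    ("project-a", "core", "alice"),
    ("project-a", "sdk", "alice"),  # same person, second repository
    ("project-a", "sdk", "bob"),
    ("project-b", "contracts", "carol"),
]


def unique_contributors(rows):
    """Map each project to the set of distinct contributor logins."""
    people = defaultdict(set)
    for project, _repo, login in rows:
        people[project].add(login)
    return dict(people)


counts = {
    project: len(logins)
    for project, logins in unique_contributors(rows).items()
}
print(counts)
```

Using a set per project is what prevents "alice" from being counted twice for the same project.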

In addition, getting to where they are now was not a flash in the pan. The date on which the first repository of each project was created suggests that, like all large projects, they have a long maturation time.

The following table shows the name of each project and the creation date of its first repository, ordered from oldest to newest.
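A table like this can be derived by taking the earliest creation date among the repositories of each project. The following is only a sketch with invented dates. It assumes timestamps in the ISO 8601 form that the GitHub API uses for `created_at`, and `first_repo_dates` is a hypothetical helper name.

```python
from datetime import datetime

# Hypothetical (project, created_at) pairs.
repos = [
    ("project-a", "2017-05-02T10:15:00Z"),
    ("project-a", "2015-11-20T08:00:00Z"),
    ("project-b", "2018-01-09T17:30:00Z"),
]


def first_repo_dates(repos):
    """Return (project, earliest creation date) pairs, oldest first."""
    earliest = {}
    for project, created_at in repos:
        created = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ")
        if project not in earliest or created < earliest[project]:
            earliest[project] = created
    return sorted(earliest.items(), key=lambda item: item[1])


for project, created in first_repo_dates(repos):
    print(project, created.date())
```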

Languages

It is often thought that creating a blockchain project requires developing a large amount of on-chain code, and that most of the effort goes into writing many contracts in Solidity.

If we look at the predominant languages in the repositories of these projects we see:

Repositories where JavaScript predominates are the clear winners. If we also look at the accumulated size of the repositories, grouped by language:

The weight of JavaScript becomes even more evident. This matches my personal experience: the logic written on the blockchain is usually just the core part, and only the essentials are written to or read from the blockchain, to save on transaction and deployment costs and on the time that queries and writes take.

Around this core, tools such as APIs, SDKs and user interfaces are built, and there JavaScript is predominant.
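The two groupings discussed above, repositories per predominant language and accumulated size per language, can be sketched as below. The records are invented. The `language` and `size` keys mirror fields of the GitHub repository object (to my knowledge `size` is reported in kilobytes and `language` can be empty), and `by_language` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical repository records.
repos = [
    {"name": "app", "language": "JavaScript", "size": 5200},
    {"name": "contracts", "language": "Solidity", "size": 800},
    {"name": "sdk", "language": "JavaScript", "size": 1500},
    {"name": "docs", "language": None, "size": 300},
]


def by_language(repos):
    """Return (repository count, accumulated size) per predominant language."""
    counts, sizes = Counter(), Counter()
    for repo in repos:
        language = repo["language"] or "Unknown"  # no detected language
        counts[language] += 1
        sizes[language] += repo["size"]
    return counts, sizes


counts, sizes = by_language(repos)
print(counts.most_common())
print(sizes.most_common())
```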

To analyze this point in more detail, we will look at all the languages present in each repository. GitHub reports how much of each repository is written in each language, and labels the repository with the language that has the largest share.

For example, in this case we would say that the language is Python, although other languages also carry considerable weight:

If we take the weight of each language in each repository and add them up, the result is similar to the one above:
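Adding up the per-repository weights can be sketched as follows. The breakdowns are invented and shaped like the mapping returned by the GitHub endpoint `GET /repos/{owner}/{repo}/languages`, which gives bytes of code per language. `language_shares` is a hypothetical helper, and whether the original analysis summed bytes or percentages is an assumption here.

```python
from collections import Counter

# Hypothetical per-repository breakdowns (language name -> bytes of code).
breakdowns = [
    {"Python": 6000, "JavaScript": 3000, "Shell": 1000},  # labeled "Python"
    {"JavaScript": 8000, "Solidity": 2000},  # labeled "JavaScript"
]


def language_shares(breakdowns):
    """Add bytes per language over all repositories, return percentages."""
    totals = Counter()
    for breakdown in breakdowns:
        totals.update(breakdown)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        language: 100 * size / grand_total
        for language, size in totals.most_common()
    }


shares = language_shares(breakdowns)
for language, share in shares.items():
    print(f"{language}: {share:.1f}%")
```

In this toy data the first repository is labeled Python, yet JavaScript ends up with the largest overall share once the secondary languages are counted.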

The popularity of the projects

To analyze how popular these projects are among developers, we will look at how many stars the repositories have, how many times they have been forked, and how many people are subscribed to be notified of changes in them.
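These three measures can be added up per project as in the sketch below. The numbers are invented. The keys follow the GitHub repository object (`stargazers_count`, `forks_count`, `subscribers_count`), though as far as I know `subscribers_count` is only present when a repository is requested individually, so missing values are treated as zero. `popularity` is a hypothetical helper.

```python
from collections import defaultdict

METRICS = ("stargazers_count", "forks_count", "subscribers_count")

# Hypothetical repository records tagged with the project they belong to.
repos = [
    {"project": "project-a", "stargazers_count": 120, "forks_count": 30,
     "subscribers_count": 15},
    {"project": "project-a", "stargazers_count": 40, "forks_count": 5,
     "subscribers_count": 4},
    {"project": "project-b", "stargazers_count": 300, "forks_count": 80,
     "subscribers_count": 25},
]


def popularity(repos):
    """Sum stars, forks and subscribers over the repositories of a project."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for repo in repos:
        for metric in METRICS:
            totals[repo["project"]][metric] += repo.get(metric, 0)
    return dict(totals)


totals = popularity(repos)
ranking = sorted(
    totals,
    key=lambda project: totals[project]["stargazers_count"],
    reverse=True,
)
print(ranking)
```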

Open issues

In open-source projects, you can collaborate by reporting issues in the code or by helping to solve them. Once a problem is resolved, the fix is merged into the code and the issue is closed. We can check which projects have the most open issues:

Collaborate

A good way to collaborate with these projects is to help resolve issues. A good place to start is with issues labeled “good first issue”, which let you contribute and learn about the project through tasks that are less complex or do not require deep knowledge of it. So that you do not have to search for them one by one, this table lists issues with this label and the project each one belongs to. These issues were extracted at the time of writing, so the list may differ depending on when you read this.
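Since such a list goes stale, here is one way it could be refreshed, using the GitHub issue search endpoint. This is a sketch under assumptions: the project is taken to live in a single organization, it uses GitHub's default "good first issue" label (projects may name theirs differently), `"example-org"` is a placeholder, and both function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/issues"


def good_first_issue_url(org, label="good first issue"):
    """Build a search URL for open issues of one organization with a label."""
    query = f'org:{org} is:issue state:open label:"{label}"'
    params = urllib.parse.urlencode({"q": query, "per_page": 100})
    return f"{SEARCH_URL}?{params}"


def fetch_good_first_issues(org):
    """Return (title, link) pairs for the matching issues (first page only)."""
    with urllib.request.urlopen(good_first_issue_url(org)) as response:
        items = json.load(response)["items"]
    return [(item["title"], item["html_url"]) for item in items]


# Only the URL is built here; calling fetch_good_first_issues needs network.
print(good_first_issue_url("example-org"))
```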

Did any of the data surprise you? Are there other values you would have liked to see analyzed? If you think something is missing, or you would like to know something else, tell me in the comments.