Have you ever wondered what the most popular open-source repositories on GitHub are? What might they tell us about upcoming trends in software development, or about GitHub itself?

At appbase.io, we have. We wanted to know how many repositories have more than 500 stars, how many have more than 1,000, how active these repositories are, and what they can tell us about upcoming trends in open-source software development.

We analyzed all GitHub repositories with more than 500 stars 🌟, a total of 🔝 23,403 repositories.

The Setup

Here, we describe the setup we used to create the interactive analysis. You can also jump directly to the analysis.

1. Fetching the data

To get the latest repo data, we used GitHub’s search API. The /search/repositories endpoint returns repository metadata, including stars, forks, repo name, owner, url, description and much more.

One caveat of the /search/repositories endpoint is that results are limited to a maximum of 1,000 per search query. To work around this limit, we fetched the repository data in chunks, starting from 500 stars and moving upward through small star ranges. You can find the entire repo, which includes the scripts for fetching, filtering and importing the data, here.
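The chunking idea can be sketched as follows. This is a minimal illustration, not the script from our repo: the helper names (star_query, search_url, split_ranges) and the halving strategy are assumptions made for the example. It relies only on the documented stars:LOW..HIGH search qualifier and the 1,000-result cap. In practice, the count callback would read the total_count field of a search response; here it is left abstract so the logic can be tried without network access.

```python
from typing import Callable, List, Tuple
from urllib.parse import urlencode

MAX_RESULTS = 1000  # cap on results the search API returns for one query


def star_query(lo: int, hi: int) -> str:
    """Search qualifier for repositories with lo..hi stars (inclusive)."""
    return f"stars:{lo}..{hi}"


def search_url(lo: int, hi: int, page: int = 1) -> str:
    """URL for one page (up to 100 items) of a star-range search."""
    params = {
        "q": star_query(lo, hi),
        "sort": "stars",
        "order": "desc",
        "per_page": 100,
        "page": page,
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


def split_ranges(
    count: Callable[[int, int], int], lo: int, hi: int
) -> List[Tuple[int, int]]:
    """Halve [lo, hi] until every sub-range matches at most MAX_RESULTS repos.

    `count(lo, hi)` is a hypothetical callback returning how many
    repositories fall in the range (e.g. `total_count` from the API).
    A single star value that still exceeds the cap cannot be split further.
    """
    if lo == hi or count(lo, hi) <= MAX_RESULTS:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return split_ranges(count, lo, mid) + split_ranges(count, mid + 1, hi)
```

For example, with a stand-in counter that pretends every star value has 10 repositories, split_ranges(lambda lo, hi: 10 * (hi - lo + 1), 500, 699) yields two ranges, 500 to 599 and 600 to 699, each small enough to page through completely.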

A typical repository record looks like this:



{
  "name": "freeCodeCamp",
  "owner": "freeCodeCamp",
  "fullname": "freeCodeCamp~freeCodeCamp",
  "description": "The https://freeCodeCamp.com open source codebase and curriculum. Learn to code and help nonprofits.",
  "avatar": "https://avatars3.githubusercontent.com/u/9892522?v=3",
  "url": "https://github.com/freeCodeCamp/freeCodeCamp",
  "pushed": "2017-07-30T19:27:08Z",
  "created": "2014-12-24T17:49:19Z",
  "size": 30916,
  "stars": 290987,
  "forks": 12465,
  "topics": ["careers", "education", "javascript"],
  "language": "JavaScript",
  "watchers": 8555
}
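To show how a record of this shape could be derived, here is a hedged sketch that flattens one item from a search response. The function name to_record and the exact field mapping are assumptions for illustration; the keys read from the item (stargazers_count, forks_count, html_url, pushed_at and so on) are standard fields of GitHub's repository objects. Which field fed watchers in our data is not spelled out above, so watchers_count is used here only as a placeholder.

```python
from typing import Any, Dict


def to_record(item: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one GitHub repository object into the record shape above."""
    owner = item["owner"]["login"]
    name = item["name"]
    return {
        "name": name,
        "owner": owner,
        "fullname": f"{owner}~{name}",  # owner and name joined with "~"
        "description": item.get("description"),
        "avatar": item["owner"]["avatar_url"],
        "url": item["html_url"],
        "pushed": item["pushed_at"],
        "created": item["created_at"],
        "size": item["size"],
        "stars": item["stargazers_count"],
        "forks": item["forks_count"],
        "topics": item.get("topics", []),  # may be absent in some responses
        "language": item.get("language"),
        "watchers": item.get("watchers_count"),  # placeholder source (assumption)
    }
```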

2. Importing the data

Next, we needed to import this data into an appbase.io app, which is a hosted Elasticsearch index. We did this using dejavu’s new GUI-based data importer. Dejavu is also, in our view, the best data browser for Elasticsearch.
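We used the GUI importer, but since the app is an Elasticsearch index, the same import could in principle be scripted. The sketch below is an alternative, not what we ran: it builds a newline-delimited payload in the format Elasticsearch's Bulk API expects (an action line followed by a document line). The function name bulk_body and the choice of fullname as the document ID are assumptions made for the example.

```python
import json
from typing import Any, Dict, Iterable


def bulk_body(records: Iterable[Dict[str, Any]]) -> str:
    """Build an NDJSON payload for the Elasticsearch Bulk API.

    Each record becomes two lines: an "index" action and the document.
    Using `fullname` as the document ID is an assumption; it makes a
    re-import overwrite the existing document instead of duplicating it.
    """
    lines = []
    for rec in records:
        lines.append(json.dumps({"index": {"_id": rec["fullname"]}}))
        lines.append(json.dumps(rec))
    # The Bulk API requires every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)
```

The resulting string would be sent as the body of a POST to the index's _bulk endpoint with the Content-Type set to application/x-ndjson.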

3. Continuously updating the data

GitHub repos and their attributes (stargazers, forks, etc.) change frequently, so we also need a way to keep this analysis up to date. The gitxplore-repo-scripts repo contains a script that updates the data in the appbase.io app; we run it as a weekly cron job.
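One piece of such an update step can be illustrated with a small helper that compares a stored record with a freshly fetched one and keeps only the fields that changed. This is a hypothetical sketch (the name changed_fields is ours for this example, and the actual script may simply re-index every record), shown only to make the idea of a weekly refresh concrete.

```python
from typing import Any, Dict


def changed_fields(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return the fields of `new` whose values differ from `old`.

    Fields missing from `old` count as changed. An empty result means
    the stored record is already up to date.
    """
    return {key: value for key, value in new.items() if old.get(key) != value}
```

For instance, if only the star count moved from 290987 to 291500 between two runs, the helper returns a one-field dictionary containing the new stars value, which could then be sent as a partial update.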

4. Setting up the charts

To visualize all of this data, we used D3 to build interactive charts. A benefit of using D3 over a static graphing system is that these charts (and the analysis) stay current as the underlying GitHub data changes.

As an added bonus, we have also built a GitHub explorer app to search through all of this data. You can read all about it here: