GitHut is an attempt to visualize and explore the complexity of the universe of programming languages used across the repositories hosted on GitHub. Programming languages are not simply tools that developers use to create programs or express algorithms; they are also instruments to code and decode creativity. By observing the history of languages we can follow humankind's quest for better ways to solve problems, to facilitate collaboration between people, and to reuse the effort of others. GitHub is the largest code host in the world, with 3.4 million users. It is the place where the open-source development community offers access to most of its projects. By analyzing how languages are used on GitHub, it is possible to understand the popularity of programming languages among developers and to discover the unique characteristics of each language.

Data

GitHub provides a publicly available API for interacting with its huge dataset of events and interactions with the hosted repositories.

GitHub Archive takes this data a step further by aggregating and storing it for public consumption. The GitHub Archive dataset is also available via Google BigQuery.

The quantitative data used in GitHut is collected from GitHub Archive. The data is updated on a quarterly basis.



One caveat about the data is the large number of records in which the programming language is not specified. This is especially evident for Create Events (repository creation), so it is not possible to visualize language trends in terms of newly created repositories. For this reason the Activity value (the number of changes pushed) has been considered the best metric for the popularity of programming languages.
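As a rough illustration of the Activity metric, the sketch below counts push events per language and skips records with no language. It is not GitHut's actual collection code: the record layout, the field names `type` and `language`, and the sample events are simplified assumptions made for this example only.

```python
from collections import Counter

# Hypothetical, simplified event records (assumed field names);
# real GitHub Archive rows carry many more fields.
SAMPLE_EVENTS = [
    {"type": "PushEvent", "language": "JavaScript"},
    {"type": "PushEvent", "language": "Python"},
    {"type": "PushEvent", "language": None},
    {"type": "CreateEvent", "language": None},
    {"type": "PushEvent", "language": "JavaScript"},
    {"type": "CreateEvent", "language": "Ruby"},
]


def activity_by_language(events):
    """Count push events per language, skipping records with no language."""
    counts = Counter()
    for event in events:
        if event.get("type") != "PushEvent":
            continue  # only pushes contribute to Activity
        language = event.get("language")
        if not language:
            continue  # language not specified: cannot be attributed
        counts[language] += 1
    return counts


def missing_language_share(events, event_type):
    """Return the fraction of events of one type that have no language."""
    matching = [e for e in events if e.get("type") == event_type]
    if not matching:
        return 0.0
    missing = sum(1 for e in matching if not e.get("language"))
    return missing / len(matching)


if __name__ == "__main__":
    print(activity_by_language(SAMPLE_EVENTS).most_common())
    print(missing_language_share(SAMPLE_EVENTS, "CreateEvent"))
```

The second helper shows how one might check the share of records of a given event type that lack a language, which is the kind of gap that makes Create Events unsuitable for ranking.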



The release year of each programming language is taken from the Wikipedia table "Timeline of programming languages".



For more information on the data collection methodology, check out the publicly available GitHub repository of GitHut.