Following up on our introductory post in the topic I’m now “releasing” the full B1 Research Proposal document I submitted to the ERC Consolidator Grant 2016 focused on the study of open source communities.

If you want full details of the proposal (either because you like the topic or you are just interested in collecting some examples of ERC proposals to help prepare your own) keep reading. If you just want the short version of our research ideas on this topic, the following presentation (or this short roadmap paper) can be good enough:

(UPDATE: this research proposal was rejected. Still, we continue to believe this is a research line worth investigating, so we’re going ahead with some of the sublines while looking for funding to go full steam ahead with it)

My goals with this public posting of the proposal are:

Help other researchers going through a similar “ERC experience”. Obviously, this is just a proposal so I’m not saying this is a good example of an ERC proposal; it could be a terrible one but, still, it’s an example and, unfortunately, many people talk about open science but few practice it, so I’m sure some of you will find it useful when writing yours. Also, this research proposal is aimed at studying open source development so I’d find it counterintuitive not to have the proposal itself in the open.

Hope to find other researchers interested in this research line to collaborate with.

Find practitioners/contributors to OSS projects that would be open to help with our research by accepting to be contacted/interviewed to learn more about how OSS is developed, commit to reading the results we produce and (maybe) try them in their projects. If you’d like to help please fill this form

Now, without further ado, my ERC proposal (B1 file):

Very Large COmmunity-based Software DEvelopment (CODE)

Name of the Principal Investigator (PI): Jordi Cabot

Name of the PI’s host institution for the project: ICREA – Universitat Oberta de Catalunya

Proposal duration in months: 48

Abstract:

We live in a software-enabled world. Software is everywhere: in your laptop, your phone, your car and even (sooner rather than later) your toaster. The global cost of software development is estimated to be over one trillion dollars, making it a crucial market for Europe’s ICT initiatives.

Much of this software is critical for the daily activities of our society and has a large community behind it, comprising thousands of contributors but also millions of users that must be listened to as well. This should be especially true for software built following the principles of Open Source Software (OSS) typically developed in a collaborative manner via online code hosting platforms like GitHub.

In theory, OSS is of better quality thanks to this higher community involvement (at different levels: submitting bug reports, feature requests, giving feedback, contributing code…). Luckily, most of the crucial software for our society is OSS (like Apache Server, Firefox, Linux or WordPress). In practice, though, many OSS projects suffer from a lack of transparency and democracy, fail to attract and manage contributors and, in general, are unable to properly respond to their users’ needs. This hampers their future success and will impact the growth of Europe’s ICT.

The goal of this project is to transform software development into a real community-driven process by providing an online collaborative platform where a software community at large (i.e. including its users) can effectively participate and be managed in order to make joint decisions in the open to ensure the long-term sustainability of the project. This will require solving a number of research challenges around the human and social aspects of software development. Therefore, the project will build a unified interdisciplinary framework combining techniques from software mining and analytics with methods borrowed from political science, sociology, and economics.

Section a: Extended Synopsis of the scientific proposal (max. 5 pages)

Problem description

We live in a software-enabled world and open source software is a key player in it: “Software is everywhere today, yet its instrumental role in the modern digital economy is often overlooked. With market revenues of over €200 billion in Europe software is the largest and the fastest growing segment of the ICT market … Open source software (OSS) is now playing a significant role in this Software economy. A number of OSS specific actions could contribute to growth in Europe, jobs creation and improvement of the European Software imbalance ” – European Software Strategy Report[1].

These numbers and vision clearly convey the importance of software development and, in particular, OSS development in the European economy (and, in fact, in our daily life: each of us interacts with OSS every single day, even if inadvertently). According to the Open Source Initiative: “OSS development is a development method that harnesses the power of distributed peer review and transparency… The promise of OSS is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in”. This level of quality is due to the active participation of the community[1]. This is also the key proposal of the well-known essay “The Cathedral and the Bazaar” [2] where the author contrasts two development models: the Cathedral model, where code is developed by a restricted set of developers, and the Bazaar model, where development is a collaborative endeavor and users are co-developers, constituting altogether a very large global community of people with different profiles. Indeed, this “co-developer” role doesn’t mean users contribute code; it highlights the fact that users are key members of the software community[2], have a say in it and can contribute in any form or shape they can, e.g. submitting bug reports, feature requests or just giving feedback on any aspect of the software. This is different from end-user development approaches [3] that aimed to turn users into semi-developers able to adapt the software by themselves.

Unfortunately, this does not reflect the reality of OSS development and therefore the potential benefits of OSS to European society may never materialize. Reality shows that many OSS projects are closer to the Cathedral model than the Bazaar one. I manually analyzed the twenty-five most popular projects in GitHub[3] and found out[4] that only one (4%) explicitly described how user contributions would be managed (with another 28% giving partial hints). This means that 68% had no explicit governance model[5]. Absolutely none of them were democratic (i.e. end users could not vote in any way, not even to elect people to represent them). In fact, the only one describing its decision-making process stated that “this project follows the timeless, highly efficient and totally unfair system known as Benevolent dictator for life”. Clearly, this is not common practice in the other community aspects of our society[6]. And this is not the only problem. Most projects struggle to attract contributors and to properly manage their massive communities of developers and users. In fact, we can conclude the OSS model is broken, with many projects failing and getting abandoned in the very early stages (see [4] for some statistics). Therefore, alternative software production models deserve to be explored now.

I argue in this proposal that to improve software quality (in the broadest sense of the word, i.e. including product-market fit) we need to shift the focus of our software engineering research from a code-centric focus to a people-centric one. This shift will be achieved by implementing an ambitious multi-dimensional and cross-disciplinary research agenda that will bring to the software field expertise available in other academic disciplines. This is obviously a challenging task since it will involve transforming the way software is developed, making the process more open (now for real!) and community-driven. Still, software has largely contributed to make our world more social (e.g. enabling the social networks or the sharing economy services) and democratic (e.g. e-democracy and voting systems). I believe it is time we explore how these aspects can benefit software development itself.

State of the art

The software research community has been chasing forever the silver bullet that will fix all problems in software engineering [5]. Recently, the availability of a massive dataset of software project data in repositories like GitHub (with over 30 million projects, even if data needs to be taken with a grain of salt [6]–[9]) has opened new research opportunities focusing on mining such repositories for valuable insights on good software development practices, especially with respect to open source projects. We have performed a systematic literature review of these papers, resulting in the selection of over 100 papers that have been analyzed and classified to detect the open research challenges in the software domain. Herein, we present a summary of this work, validating the need for this research proposal.

Published papers analyze software projects from different angles but mostly with a code-centric view, meaning that they focus their analysis on the projects’ source code by analyzing, for instance, (1) the use of programming languages (e.g., [10], [11]), (2) the type of license they apply (e.g., [12], [13]), (3) the folder structure of the project [14] or (4) the potential vulnerabilities and complexity of the code (e.g., [15], [16]). Others focus on more methodological aspects covering testing practices (e.g., [17], [18]), refactoring (e.g., [19]) or pull requests (e.g., [20], [21]). This is also true for several European funded projects on OSS-related areas like MANCOOSI, OSSMETER or MARKOS.

Only a few works analyze the social part of the software development process, trying to understand how developers are internally organized and work together in the project. There are studies on the team diversity (e.g., [22], [23]) and composition (e.g., [24], [25][26][27]). Community dynamics are analyzed looking at the interactions between community members and the project or among members themselves. The former category includes works that analyze the first impression formation (e.g., [28]), using projects for hiring new people (e.g., [29]), onboarding (e.g., [30]) and social coding (e.g., use of the social services of GitHub to track activity in projects of interest [31]). The latter includes works studying the social and technical factors that motivate people to contribute to a given project (e.g., [32]), algorithms that recommend developers to open tasks (e.g., [33]) and their role in promoting together the project itself (e.g., [34]).

Based on the gaps detected in this literature review, evidence from existing projects and discussions with members of the OSS community, we can conclude that (open-source) software development faces the following open challenges:

It is not as open as you would expect (the code is open, but the management and decision-making of the project are not, even if we do not know why).

It has great difficulty attracting contributors, with most projects having only one or two contributors.

It is unable to manage its community efficiently

which hampers people’s experience with open source, thus threatening the project’s evolution and success. A recent example would be the fork[7] of node.js (an extremely popular JavaScript runtime environment) due to differences in the governance of the project. Once the dispute was settled the forked version (io.js) was merged back to the main project but, in the process, countless hours were wasted in the parallel development of the two versions, plus all the confusion this situation brought to its thousands of users, who had to decide which version to follow. A more open governance model (including decision power for the users) could have avoided this situation in the first place.

In this proposal we aim at developing original research contributions for each one of these challenges.

Research Agenda

Disrupting (open source) software development implies shifting our main focus of attention from the analysis of code aspects in the software repository to the analysis of the people behind that code, either as developers, owners or users. Therefore, the main goal of this project can be stated as building:

A unified framework to transform software development into a real community-driven development process

with the benefits of a faster and higher-quality software production and, importantly, a better alignment with the needs of the community at large. The following figure tries to illustrate this change of perspective, highlighting how we go from the current developer centric view (kind of a meritocracy where only core developers have the right to decide) to a community that now collaborates together and has the tools it needs to manage this collaboration in an optimal way.

This community-driven process will be enabled by borrowing and adapting to the software development field techniques from the domains of political science, sociology (e.g. social/behavioural informatics), economics and ecology, which have been studying a diverse range of communities for centuries, and combining them with core software techniques for mining of software repositories, constraint solving [35] and language design, among several others.

More precisely, this main goal will be implemented through the following specific subgoals aimed at helping projects to: (G1) open all aspects of the project, defining a precise governance model setting up the foundations of this participative process, (G2) bring more participants in and diversify their profiles and (G3) optimize how they all collaborate together, regardless of their role. All this considering that (G4) projects do not thrive in isolation but are part of a project network. The final goal (G5) is to integrate all these techniques in one single unified community-driven development platform built as an extension of current code hosting services. A more detailed description and decomposition of each subgoal follows:

G1: Bring Transparency and Democracy to OSS development

Open source communities are not as open as they seem, as discussed above. Indeed, lack of transparency and anti-democratic practices can scare away potential contributors/users[8] and hamper the project’s alignment with their needs. To overcome this situation we propose to:

Employ software mining techniques to conduct a systematic study of current governance models in OSS projects. Complement it with interviews with project members to better understand the reasons behind those choices.

Develop a domain-specific language[9] to enable OSS projects to precisely define their governance model, extending the basic strategies covered in [36]. Given their explicit definition, rules could even be automatically enforced and their execution registered for future traceability (e.g. who voted for this at that moment in time?).

Adapt different democracy models (representative, direct, liquid, …) and other political systems to the specific context of OSS to empirically test the best model for OSS projects, depending on the project characteristics.

Assist projects in transitioning to more democratic practices, if so desired by them. This may involve for instance the automatic suggestion of possible internal leaders (based on their repository activity) to represent groups of users in elections for intermediate technical committees in a representative democracy scenario. Aspects like the Gini index [37] for equality distribution and the quality of the online deliberation, inspired by [38], will also play a role.
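As a rough illustration of the equality measure mentioned in G1, the Gini index can be computed over any non-negative per-member quantity. The sketch below assumes, purely for illustration, that the quantity is the number of accepted contributions per community member; the function name and the sample figures are hypothetical and do not come from the proposal.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 means a perfectly even split)."""
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total == 0:
        return 0.0  # nobody contributed anything: treat as perfectly equal
    ordered = sorted(values)
    n = len(ordered)
    # Closed form on sorted data, with ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical accepted-contribution counts for four community members.
balanced = gini([5, 5, 5, 5])
concentrated = gini([0, 0, 0, 10])
print(balanced, concentrated)
```

With an even split the index is 0; when a single member out of n accounts for everything it reaches its maximum of (n - 1) / n, which is one way a project could quantify how concentrated its contributions (or its decision power) are.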

G2: Attract new contributors to OSS projects

OSS projects need contributors to progress [39][40]. A few large projects, like Linux, may rely on paid contributors but most depend on convincing external people to volunteer their time. Given that simpler strategies, like making the project more popular, are not enough [41], we propose to:

Develop goal models [42] for each participant profile in OSS to better understand their motivations.

Propose innovative contribution models. We believe OSS can be regarded as an example of a matching market (markets where money is not the main factor [43]) and therefore we can adapt reward strategies successful in other matching markets to the OSS one. Examples would be to replicate the idea of time banks or donor chains (I help you if you help somebody that can help me).

Apply gamification[10] principles to OSS to increase the level of contribution of current members.

Identify potential new contributors that have the skills an OSS project is looking for by analyzing and cross-profiling people’s public profiles and behavior in social networks, reusing expert finding techniques like [44]–[47]. This may also be used to reduce the gender gap [48] and increase team diversity.

G3: Optimize internal project collaborations

Effective collaboration requires more than setting up theoretical good conditions for it. A continuous monitoring of the community structure and the exchanges taking place among its members would allow detecting and fixing early on possible bottlenecks in the communication. In particular we propose to:

Visualize the community network as a typed directed multigraph (where edges would denote several kinds of interactions between the members) and adapt well-known graph-based algorithms to identify subcommunities, leaders, low density areas and so on. Then project owners can react to solve this, e.g. by “building bridges” between the subcommunities or inviting people to especially scarce areas in the project.

Define acceptable thresholds and ranges for some social metrics in OSS (e.g. bus factor [49] or the ratio between external and internal contributors) depending on the project size and domain to evaluate the “health” of the community. The ranges would come from the analysis of a representative set of “successful” projects and typical values in other fields like human ecology.

Adapt review aggregator and sentiment analysis techniques to summarize long conversational exchanges so that everybody can easily follow relevant project discussions.
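To make the typed directed multigraph idea more concrete, here is a minimal sketch using only the Python standard library. The edge types ("review", "comment"), the member names and both helper functions are hypothetical. Weakly connected components are used here only as a crude stand-in for subcommunity detection; the proposal does not say which graph algorithms would be adapted.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Interaction(NamedTuple):
    source: str  # member who initiates the interaction
    target: str  # member who receives it
    kind: str    # edge type (hypothetical labels such as "review" or "comment")


def weak_components(edges: list[Interaction]) -> list[set[str]]:
    """Group members into weakly connected components, ignoring direction and type."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for edge in edges:
        neighbours[edge.source].add(edge.target)
        neighbours[edge.target].add(edge.source)
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:  # depth-first traversal from an unvisited member
            member = stack.pop()
            if member in component:
                continue
            component.add(member)
            stack.extend(neighbours[member] - component)
        seen |= component
        components.append(component)
    return components


def in_degree(edges: list[Interaction]) -> Counter:
    """Count incoming interactions per member; parallel edges count separately."""
    return Counter(edge.target for edge in edges)


edges = [
    Interaction("ana", "bob", "review"),
    Interaction("ana", "bob", "comment"),   # parallel edge with a different type
    Interaction("carol", "bob", "review"),
    Interaction("dave", "eve", "comment"),  # a pair disconnected from the rest
]
components = weak_components(edges)
most_addressed = in_degree(edges).most_common(1)
print(components, most_addressed)
```

In this toy network the two components would be candidates for “building bridges”, and the member with the highest in-degree is a candidate leader. A real analysis would likely weight edge types differently and use a proper community detection algorithm.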

G4: Take Cross-project dependencies into account

Projects do not grow in isolation. All the dimensions described above need to be extended to deal with cross-project interactions since project dependencies take place not only at the technical level but at the human level [50]: projects compete for the same resources (e.g. developers’ time) and have cascade effects on each other. I will model this as a constraint optimization problem [51] aimed at finding an optimal assignment of resources to projects.
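The resource assignment view of G4 can be sketched as a tiny optimization problem. The example below is only an assumption about what such a model might look like: each developer is assigned to exactly one project and the objective is to minimise the total number of unmet hours. It uses exhaustive search instead of a constraint solver, which is only viable for toy sizes; all names and numbers are hypothetical.

```python
from itertools import product


def best_assignment(hours: dict[str, int], needs: dict[str, int]) -> tuple[dict[str, str], int]:
    """Assign each developer to one project, minimising total unmet project hours."""
    developers = sorted(hours)
    projects = sorted(needs)
    best: dict[str, str] = {}
    best_unmet = sum(needs.values()) + 1  # strictly worse than any real assignment
    # Enumerate every mapping developer -> project (len(projects) ** len(developers) cases).
    for choice in product(projects, repeat=len(developers)):
        supplied = {project: 0 for project in projects}
        for developer, project in zip(developers, choice):
            supplied[project] += hours[developer]
        unmet = sum(max(0, needs[p] - supplied[p]) for p in projects)
        if unmet < best_unmet:
            best, best_unmet = dict(zip(developers, choice)), unmet
    return best, best_unmet


hours = {"ana": 10, "bob": 6, "carol": 3}  # weekly hours each developer can give
needs = {"parser": 10, "website": 9}       # weekly hours each project needs
assignment, unmet = best_assignment(hours, needs)
print(assignment, unmet)
```

A realistic model would let developers split their time across projects, account for skills and cascade effects between projects, and hand the problem to a dedicated solver.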

G5: Building a community-driven software development platform

All techniques described above will be implemented and released as part of an online collaborative platform. Once built, this platform will enable a software community at large to effectively participate in the development process according to the practices and principles developed in the project. The platform will be built by ourselves as part of the project but following the “eat your own dog food” principle, it will also be released as an open source project in itself and therefore open to contributions and suggestions from the open source community. To avoid reinventing the wheel, the platform will be built on top of GitHub (or another similar hosting platform) and provide connectors with external add-ons (e.g. forums, mailing lists, external bug trackers) to be used as additional information sources for the analysis tasks of the project.

Timing and adequacy of the proposal

Open source is reaching its tipping point[11] where, more than ever, even the most powerful tech companies and entrepreneurs are embracing open source [52] while the number of projects grows exponentially (GitHub went from 10M projects to over 30M in two years) alongside their impact on the global economy and society. And the OSS community itself is quickly realizing that, at this scale, better collaboration is a must (e.g. see this open letter [53] to GitHub promoted by a group of maintainers of OSS projects frustrated with the limited collaboration capabilities of the platform).

This justifies the importance of this research proposal even if it is a challenging one due to its multidimensional and cross-disciplinary perspective, that requires mixing a wide variety of research techniques coming from both the software realm and social sciences. This increases the risk of the project but at the same time opens the door to promising novel research works in the intersection of several areas. I believe I am in a unique position to take this opportunity given:

My broad range of research interests and background (in software modeling [54], including goal modeling [55], formal methods [56][57], software analysis and mining [41][58][59], domain specific languages [60] and different kinds of empirical studies e.g. [61], to give a few examples) covering the skill set required by the project.

My preliminary work on some of the research topics, e.g. the first version of a specific language for governance of OSS projects [36] or our study of the problems in attracting contributors [41], plus expertise on conducting research on software mining and the GitHub platform (e.g. [58], [59]).

My long term interest in several open source communities. Beyond GitHub, we are deeply involved in the Eclipse open source community (see [62]) and I am personally involved in the WordPress ecosystem [63].

My research environment is especially suited to conduct interdisciplinary research (see the risks section).

Impact

Achieving the above goals in CODE will benefit the whole software development community and our society in general. Users/citizens are empowered to have a more active participation and influence in the project evolution; contributors know in advance how their effort will be evaluated and dealt with; and project owners get the tools to attract more contributors and better manage the community to speed up the development process. But CODE will also benefit other communities. Here we describe the potential impact of CODE in and beyond OSS development:

Scientific impact: Transforming software development. The techniques developed in the project will have a substantial impact in the way that software projects are developed, analyzed and evaluated and will shed some light on the reasons why some projects are successful while others are not. I am confident that this project can open a new area of research where more and more knowledge from other completely different fields is deemed useful in Software Engineering and brought to it, something that so far has been done only occasionally.

Impact in proprietary software development. Private companies can benefit from many of the techniques developed as part of this project, e.g. to evaluate the performance of their employees or get feedback from users. In fact, it has been shown that adopting OSS practices, a process called inner source, is beneficial for companies [64].

Outside the software world: impact on organizations. The work on formalization and monitoring of governance models (goal 1) is of interest for any kind of organization that wants to be transparent. Moreover, many of the social analysis techniques (goal 3) could be easily redefined to be applied on other communication platforms (e.g. forums, email threads) and not just on software-specific repositories. For instance, modeling the governance of NPO/NGO organizations could help us evaluate and compare their openness. Same for political parties and even countries.

Helping other research projects. A key long-term impact of the project should be its contribution to accelerating the advance of research in the field. Therefore, an explicit goal of the project will be the development of a series of artefacts useful to other research teams. For example, we will develop a representative sample builder [65] of projects in GitHub to be used as a benchmark when comparing results of different research works.

Methodology & risk assessment

CODE will adhere to the Design-Science Research (DSR) paradigm [66]. DSR is a problem-solving paradigm for activities dealing with the construction and evaluation of technology artifacts as well as the development of their associated research theories. Besides, CODE will make extensive use of empirical research methods both quantitative (e.g. in the automatic mining of repositories) and qualitative (e.g. semi-structured interviews to gather the motivation and requirements of participants in OSS projects and validate the results). The project will be conducted in an incremental and iterative manner [67] where at each iteration new advances in each of the project goals will be achieved. Validation of project advancement will be performed at the end of each iteration via the practitioners board (see “Resources” section) and via the automatic measurement of pre and post values of a number of metrics for a set of benchmark projects (both existing and created from scratch to be used as guinea pigs) monitored during the full duration of CODE.

Sketch of the work plan.

This four-year project will be divided as follows. An initial work package (WP0) will set up the project infrastructure and compile the initial set of projects to be used as benchmark. WP1-5 will focus on goals 1-5 above, respectively. Dissemination of results (WP6) will be an ongoing activity. This simplified Gantt diagram summarizes the work plan:





Risk assessment

This research project has an interdisciplinary nature and covers a broad spectrum of techniques, which clearly increases its inherent risks. Nonetheless, my profile and that of my research environment make us a good fit for this project (see sect. 4) and will contribute to mitigating those risks and ensuring the project’s viability. Main risks and mitigation measures:

Broad range of research techniques required to accomplish the project goals (Probability: Low / Impact: Low). I have some previous experience with all the required techniques. Other members of the team will also contribute their strong technical skills in some of these areas, minimizing this risk.

Cross-disciplinary nature of the project (Probability: Low / Impact: Medium). My institution’s name is “Internet Interdisciplinary Institute”, meaning that it has interdisciplinarity at its heart and favours as much as possible cross-domain scientific exchanges. A project like this is, then, a perfect fit for the institution and its strengths, and will have its complete endorsement and network of researchers to complement our skills and knowledge.

Dependency on open source repositories to get the data needed for the analysis (Probability: Low / Impact: Low). The project has a technical dependence on GitHub as the dominant code hosting platform nowadays. However, if GitHub decides to close down or change its business model, others (Bitbucket, Google Code, …) will immediately take the opportunity to fill this market and we could easily adapt to their platforms to continue the project.

Little engagement of the OSS community, especially to test and validate the results of our research (Probability: Low / Impact: Medium). I have been able to recruit industrial participants in the past using my blog as a medium. We can also ensure the involvement of our many contacts in the GitHub, WordPress [63] and Eclipse [62] communities. Besides, we are already discussing (e.g. [12]) these research ideas in the open to gauge the interest of the community (also clearly expressed in this kind of initiatives, e.g. [53]) and learn their main concerns.

Resources & budget

I, as PI, will dedicate 70% of my time to CODE during the whole length of the project and will benefit from the support of my research team (ten members right now). Additionally, and given the cross-disciplinary nature of the project, I have assembled a scientific advisory board with experts from the areas of political science, sociology, psychology and ecology to have regular discussions on the project status and evolution. These are local experts from my affiliated institutions with whom I have already discussed this proposal and have confirmed their interest in joining the advisory board. Also, a professional advisory board with participants with different roles in relevant OSS projects will be constituted with over 20 volunteers recruited already. Beyond monitoring the evolution of the project and giving their opinion on it, their mission will be to validate and apply on their projects the outcomes of CODE.

The total budget requested is 1.599.697,53€, covering the hiring of 3 postdocs and 3 PhD students (mixing computer science and social science profiles in both categories) and 2 technicians for the duration of the project plus funding for research stays, trips for presenting results, event organizations and equipment.

Footnotes

[1] Report of an industry expert group invited by the European Commission to give their advice on the European software strategy ftp://ftp.cordis.europa.eu/pub/fp7/ict/docs/ssai/European_Software_Strategy.pdf

[2] This is also a key principle of agile methodologies that have been massively adopted by software teams in the last years but at a small scale.

[3] GitHub is the most used web-based collaborative development platform for OSS projects, offering a series of services, like issue trackers and access-control user management, on top of free Git repositories for version control, and now hosting over 30 million projects.

[4] Full list of analyzed projects: https://docs.google.com/spreadsheets/d/1q4z6Z1iNcHCuBbznFK3xZ-fDu8UXp5-sjHF2IqWgmq0/edit?usp=sharing

[5] A governance model describes the roles that project participants can take on and the process for decision making within the project (OSS watch)

[6] We are not implying that all OSS projects should be democratic but we strongly believe that this is an aspect that deserves attention.

[7] A fork happens when a group of developers take a copy of the source code of a project and use it to create an independent version of the original project, evolving independently (and therefore at the risk of causing a split in the community behind the project if not merged back later on).

[8] Even if, for whatever reason, a certain project is NOT looking for contributors, stating this clearly (transparency) would avoid misunderstandings.

[9] A domain-specific language (DSL) is a language specifically designed to express solutions to problems in a specific domain. This is in contrast with general languages (like Java or UML) that aim to be used in any domain.

[10] Gamification: Use of game elements (like badges, points or levels) in serious environments

[11] Tipping point: a point in time when a group rapidly and dramatically changes its behavior by widely adopting a previously rare practice [68][69]

[12] https://news.ycombinator.com/item?id=10908978

[13] Number of citations, h-index and i10 index data taken from Google Scholar. Citations include self-citations.

[14] Only 11 of those 143 publications co-authored with my thesis supervisor

[15] Our tools publicly available on GitHub: https://github.com/SOM-Research

References

[1] R. Schuwer, M. van Genuchten, and L. Hatton, “On the Impact of Being Open,” IEEE Software, vol. 32, no. 5, pp. 81–83, Sep. 2015.

[2] E. S. Raymond, The Cathedral and the Bazaar. O’Reilly Media, 2001.

[3] A. Sutcliffe and N. Mehandjiev, “End-user development: tools that empower users to create their own software solutions – Special issue,” Communications of the ACM, vol. 47, no. 9, p. 31, Sep. 2004.

[4] C. M. Schweik and R. C. English, Internet Success: A Study of Open-Source Software Commons. The MIT Press, 2012.

[5] F. P. Brooks, Jr., “No Silver Bullet: Essence and Accidents of Software Engineering,” Computer, vol. 20, no. 4, pp. 10–19, Apr. 1987.

[6] C. Bird, P. Rigby, and E. Barr, “The promises and perils of mining git,” in 6th International Working Conference on Mining Software Repositories, 2009, pp. 1–10.

[7] J. Howison and K. Crowston, “The perils and pitfalls of mining SourceForge,” in Proc. of Workshop on Mining Software Repositories, 2004, pp. 7–11.

[8] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, “The promises and perils of mining GitHub,” in 11th Working Conference on Mining Software Repositories, 2014, pp. 92–101.

[9] B. Vasilescu, A. Serebrenik, and V. Filkov, “A Data Set for Social Diversity Studies of GitHub Teams,” in 12th Working Conference on Mining Software Repositories, 2015, pp. 514–517.

[10] T. F. Bissyande, F. Thung, D. Lo, L. Jiang, and L. Reveillere, “Popularity, Interoperability, and Impact of Programming Languages in 100,000 Open Source Projects,” in 37th Annual IEEE Computer Software and Applications Conference, 2013, pp. 303–312.

[11] P. Mayer and A. Bauer, “An empirical analysis of the utilization of multiple programming languages in open source projects,” in 19th International Conference on Evaluation and Assessment in Software Engineering, 2015, pp. 1–10.

[12] C. Vendome, “A Large Scale Study of License Usage on GitHub,” in 37th IEEE/ACM International Conference on Software Engineering, Volume 2, 2015, pp. 2–4.

[13] C. Vendome, M. Linares-Vásquez, G. Bavota, M. Di Penta, D. German, and D. Poshyvanyk, “License usage and changes: A large-scale study of Java projects on GitHub,” in 23rd IEEE International Conference on Program Comprehension (ICPC), 2015, pp. 218–228.

[14] J. Zhu, M. Zhou, and A. Mockus, “The Relationship Between Folder Use and the Number of Forks: A Case Study on Github Repositories,” in 2014 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, 2014, p. 30.

[15] R. Coleman and M. A. Johnson, “Power-Laws and Structure in Functional Programs,” in 2014 International Conference on Computational Science and Computational Intelligence, 2014, pp. 168–172.

[16] K. Achuthan, S. Sudharavi, R. Kumar, and R. Raman, “Security Vulnerabilities in Open Source Projects: An India Perspective,” in 2nd International Conference on Information and Communication Technology, 2014, pp. 18–23.

[17] P. S. Kochhar, T. F. Bissyande, D. Lo, and L. Jiang, “Adoption of Software Testing in Open Source Projects–A Preliminary Study on 50,000 Projects,” in 17th European Conference on Software Maintenance and Reengineering, 2013, pp. 353–356.

[18] R. Pham, L. Singer, O. Liskin, F. F. Filho, and K. Schneider, “Creating a shared understanding of testing culture on a social coding site,” in 35th International Conference on Software Engineering, 2013, pp. 112–121.

[19] G. Destefanis and M. Ortu, “Position Paper: Are Refactoring Techniques Used by Developers? A Preliminary Empirical Analysis,” in REFTEST workshop, 2014.

[20] G. Gousios, M. Pinzger, and A. van Deursen, “An Exploratory Study of the Pull-based Software Development Model,” in 36th International Conference on Software Engineering, 2014, pp. 345–355.

[21] Y. Yu, H. Wang, V. Filkov, P. Devanbu, and B. Vasilescu, “Wait For It: Determinants of Pull Request Evaluation Latency on GitHub,” in 12th IEEE/ACM Working Conference on Mining Software Repositories, 2015, pp. 367–371.

[22] A. Lima, L. Rossi, and M. Musolesi, “Coding together at scale: GitHub as a collaborative social network,” in 8th AAAI International Conference on Weblogs and Social Media, 2014, pp. 295–304.

[23] B. Vasilescu, V. Filkov, and A. Serebrenik, “Perceptions of Diversity on GitHub: A User Survey,” in CHASE Workshop, 2015.

[24] M. Y. Allaho and W.-C. Lee, “Trends and behavior of developers in open collaborative software projects,” in 2014 International Conference on Behavior, Economic and Social Computing, 2014, pp. 1–7.

[25] P. Loyola and I.-Y. Ko, “Biological Mutualistic Models Applied to Study Open Source Software Development,” in 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 2012, vol. 1, pp. 248–253.

[26] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, “An in-depth study of the promises and perils of mining GitHub,” Empirical Software Engineering, Sep. 2015.

[27] M. Y. Allaho and W.-C. Lee, “Trends and behavior of developers in open collaborative software projects,” in 2014 International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC2014), 2014, pp. 1–7.

[28] J. Marlow, L. Dabbish, and J. Herbsleb, “Impression Formation in Online Peer Production: Activity Traces and Personal Profiles in GitHub,” in 16th ACM Conference on Computer Supported Cooperative Work, 2013, pp. 117–128.

[29] J. Marlow and L. Dabbish, “Activity traces and signals in software developer recruitment and hiring,” in 16th ACM Conference on Computer Supported Cooperative Work, 2013, pp. 145–156.

[30] F. Fagerholm, A. Sanchez Guinea, J. Borenstein, and J. Munch, “Onboarding in Open Source Projects,” IEEE Software, vol. 31, no. 6, pp. 54–61, Nov. 2014.

[31] F. Thung, T. F. Bissyande, D. Lo, and L. Jiang, “Network Structure of Social Coding in GitHub,” in 17th European Conference on Software Maintenance and Reengineering, 2013, pp. 323–326.

[32] J. Tsay, L. Dabbish, and J. Herbsleb, “Influence of social and technical factors for evaluating contribution in GitHub,” in 36th International Conference on Software Engineering, 2014, pp. 356–366.

[33] J. Xavier and A. Macedo, “Understanding the popularity of reporters and assignees in the Github,” in 26th International Conference on Software Engineering and Knowledge Engineering, 2014, pp. 484–489.

[34] J. Jiang, L. Zhang, and L. Li, “Understanding project dissemination on a social coding site,” in 20th Working Conference on Reverse Engineering, 2013, pp. 132–141.

[35] K. Apt, Principles of Constraint Programming. Cambridge University Press, 2003.

[36] J. L. Cánovas Izquierdo and J. Cabot, “Enabling the Definition and Enforcement of Governance Rules in Open Source Systems,” in 37th IEEE/ACM International Conference on Software Engineering, 2015, vol. 2, pp. 505–514.

[37] L. Ceriani and P. Verme, “The origins of the Gini index: extracts from Variabilità e Mutabilità (1912) by Corrado Gini,” The Journal of Economic Inequality, vol. 10, no. 3, pp. 421–443, Jun. 2011.

[38] D. Friess and C. Eilders, “A model for assessing online deliberation. Towards a more complex approach to measure and explain deliberativeness online,” in The Internet, Policy & Politics Conferences, 2014.

[39] L. Dabbish, C. Stuart, J. Tsay, and J. Herbsleb, “Social coding in GitHub: transparency and collaboration in an open software repository,” in 15th ACM Conference on Computer Supported Cooperative Work, 2012, pp. 1277–1286.

[40] R. Padhye, S. Mani, and V. S. Sinha, “A study of external community contribution to open-source projects on GitHub,” in 11th Working Conference on Mining Software Repositories, 2014, pp. 332–335.

[41] J. L. Cánovas Izquierdo, V. Cosentino, and J. Cabot, “Popularity will NOT bring more contributions to your OSS project,” Journal of Object Technology, vol. 14, no. 4, 2015.

[42] A. van Lamsweerde, “Goal-oriented requirements engineering: a guided tour,” in 5th IEEE International Symposium on Requirements Engineering, 2001, pp. 249–262.

[43] A. E. Roth, Who Gets What — and Why: The New Economics of Matchmaking and Market Design. Eamon Dolan/Houghton Mifflin Harcourt, 2015.

[44] A. Bozzon, M. Brambilla, S. Ceri, M. Silvestri, and G. Vesci, “Choosing the right crowd,” in 16th International Conference on Extending Database Technology, 2013, pp. 637–648.

[45] F. Wiedemann, R. Sontag, and M. Gaedke, “NeLMeS: Finding the Best Based on the People Available Leveraging the Crowd,” in 15th International Conference on Web Engineering, 2015, vol. 9114, pp. 687–690.

[46] B. Vasilescu, V. Filkov, and A. Serebrenik, “StackOverflow and GitHub: Associations between Software Development and Crowdsourced Knowledge,” in 2013 International Conference on Social Computing, 2013, pp. 188–195.

[47] L. Singer, F. Figueira Filho, and M.-A. Storey, “Software engineering at the speed of light: how developers stay current using Twitter,” in 36th International Conference on Software Engineering, 2014, pp. 211–221.

[48] D. N. Beede, T. A. Julian, D. Langdon, G. McKittrick, B. Khan, and M. E. Doms, “Women in STEM: A Gender Gap to Innovation,” Economics and Statistics Administration, Issue Brief no. 04–11, Aug. 2011.

[49] V. Cosentino, J. L. Cánovas Izquierdo, and J. Cabot, “Assessing the bus factor of Git repositories,” in 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, 2015, pp. 499–503.

[50] T. Mens and P. Grosjean, “The Ecology of Software Ecosystems,” Computer, vol. 48, no. 10, pp. 85–87, Oct. 2015.

[51] K. Apt, Principles of Constraint Programming. Cambridge University Press, 2003.

[52] C. Metz, “Open Source Software Went Nuclear This Year,” Wired, 2015.

[53] “Dear GitHub – An open letter from the maintainers of open source projects.” [Online]. Available: https://github.com/dear-github/dear-github.

[54] M. Brambilla, J. Cabot, and M. Wimmer, Model-Driven Software Engineering in Practice, vol. 1. Morgan & Claypool Publishers, 2012.

[55] H. C. Esfahani, J. Cabot, and E. Yu, “Adopting agile methods: Can goal-oriented social modeling help?,” in 4th International Conference on Research Challenges in Information Science (RCIS), 2010.

[56] J. Cabot, R. Clarisó, and D. Riera, “On the verification of UML/OCL class diagrams using constraint programming,” Journal of Systems and Software, vol. 93, pp. 1–23, Jul. 2014.

[57] C. A. González and J. Cabot, “Formal verification of static software models in MDE: A systematic review,” Information and Software Technology, vol. 56, no. 8, pp. 821–838, Aug. 2014.

[58] V. Cosentino, J. L. Cánovas Izquierdo, and J. Cabot, “Assessing the bus factor of Git repositories,” in 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, 2015, pp. 499–503.

[59] V. Cosentino, J. L. Cánovas Izquierdo, and J. Cabot, “Gitana: A SQL-Based Git Repository Inspector,” in 34th International Conference on Conceptual Modeling, ER 2015, 2015, vol. 9381, pp. 329–343.

[60] R. Tairas and J. Cabot, “Corpus-based analysis of domain-specific languages,” Software & Systems Modeling, vol. 14, no. 2, pp. 889–904, Jun. 2013.

[61] D. Ameller, C. Ayala, J. Cabot, and X. Franch, “Non-functional Requirements in Architectural Decision Making,” IEEE Software, vol. 30, no. 2, pp. 61–67, Mar. 2013.

[62] H. Brunelière and J. Cabot, “On Developing Open Source MDE Tools: Our Eclipse Stories and Lessons Learned,” in OSS4MDE@MoDELS 2014, 2014, pp. 9–19.

[63] J. Cabot, “Looking at WordPress through the eyes of a Software Researcher.” WordCamp Europe, 2015.

[64] K.-J. Stol and B. Fitzgerald, “Inner Source–Adopting Open Source Development Practices in Organizations: A Tutorial,” IEEE Software, vol. 32, no. 4, pp. 60–67, Jul. 2015.

[65] M. Nagappan, T. Zimmermann, and C. Bird, “Diversity in software engineering research,” in Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2013, p. 466.

[66] A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, Mar. 2004.

[67] C. Larman and V. R. Basili, “Iterative and incremental development: A brief history,” Computer, vol. 36, no. 6, pp. 47–56, Jun. 2003.

[68] M. Gladwell, The Tipping Point: How Little Things Can Make a Big Difference. Back Bay Books, 2002.

[69] T. C. Schelling, Micromotives and Macrobehavior. W. W. Norton & Company, 2006.

Featured image by Ron Rothbart