The Computing Community Consortium (CCC) has been working on a series of white papers over the past couple of months and releasing them gradually. You can see all of them here.

Today, we highlight another paper, "Big Data, Data Science, and Civil Rights," by Solon Barocas, Elizabeth Bradley, Vasant Honavar, and Foster Provost.

Government, academia, and the private sector have increasingly recognized that the use of big data and data science in decisions has important implications for civil rights. However, a coherent research agenda for addressing these topics is only beginning to emerge and the need for such an agenda is critical and timely. Big data and data science have begun to profoundly affect decision making because the modern world is more broadly instrumented to gather data—from financial transactions, mobile phone calls, web and app interactions, emails, chats, Facebook posts, Tweets, cars, Fitbits, and on and on.

According to this paper, the necessary research agenda should include:

Determining if models learned from data exhibit objectionable bias
The selection of the data used to build the models is an important source of potential bias. While data scientists often learn about the challenges posed by sampling bias, the difficulty of establishing ground truth, and the many ways to measure model performance, there is an urgent need to support research to develop more rigorous methods for establishing whether a model exhibits objectionable bias.

Supporting the emerging field of fairness-aware machine learning
Computer scientists have begun to investigate how concerns with fairness and reducing or eliminating unwanted discrimination might become part of the model-building process. Investment should provide the resources to support and foster these discussions and to push the field to develop tools that make clear the full range of possibilities for defining and achieving fairness. Future research will also need to consider how organizations would deploy these methods in practice.

Looking beyond the algorithm for the sources of unfairness, discrimination, etc.
The technical formulation of the problem is a crucial aspect of data-driven decision-making that is only beginning to be taken into account in discussions of, and research into, ethics, data science, and civil rights. A robust understanding of the ethical use of data-driven systems requires substantial focus on the threats to civil rights that may result from how the problem is formulated.

Creating cross-disciplinary scholars
Work integrating civil rights and data science cannot be easily divided between collaborators. Valuable breakthroughs are most likely to come from researchers who combine expertise in both domains. Future investment in research should foster collaborations that do more than put different communities in contact; investment should support the training necessary to cultivate a future generation of researchers.
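
To make the first agenda item above a little more concrete, here is a minimal sketch of one simple check that is sometimes used as a starting point: comparing the rate of favorable decisions across groups. Everything in it is an illustrative assumption rather than something taken from the paper: the function names, the toy data, the binary 0/1 decisions, and the choice of a rate ratio as the measure.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Return the fraction of favorable (1) decisions for each group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        favorable[group] += decision
    return {g: favorable[g] / totals[g] for g in totals}


def disparity_ratio(decisions, groups):
    """Ratio of the lowest to the highest group selection rate.

    A value of 1.0 means every group receives favorable decisions
    at the same rate; smaller values mean a larger gap.
    """
    rates = selection_rates(decisions, groups)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received any favorable decision
    return min(rates.values()) / highest


# Hypothetical toy data: model decisions and a group label per person.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(decisions, groups)
ratio = disparity_ratio(decisions, groups)
print(rates)
print(ratio)
```

A ratio well below 1 only flags a gap worth investigating. It does not by itself establish that a model exhibits objectionable bias, which is part of why the paper calls for more rigorous methods.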



See the full report to learn more about the research agenda needed to address big data and civil rights.