2.5 quintillion bytes of data are created every day. It's created by you when you commute to work or school, when you're shopping, when you get medical treatment, and even when you're sleeping. It's created by you, your neighbors, and everyone around you. So, how do we ensure it's used ethically?

Back in 2014, before I entered public service, I wrote a post called Making the World Better One Scientist at a Time that discussed concerns I had at the time about data. What's interesting is how much of it is still relevant today. The biggest difference? The scale and coverage of data have massively increased since then, and with them the opportunity to do both good and bad.

In the bucket of good: we're finding incredible insights using data to develop tailored medical treatments (precision medicine). Recently, a data scientist at the Data Science for Social Good program at the University of Chicago used machine learning and artificial intelligence to automatically detect flooded bridges from satellite images for first responders. Crisis Text Line has been literally saving lives every day through an all-volunteer network of counselors equipped with powerful data and technology superpowers to help those in crisis. And through the Data-Driven Justice Initiative, we've seen local counties move people who need mental health and drug treatment out of overcrowded jails and into the right facilities through the safe sharing of data. These solutions not only save money; they are proven successes.

I could go on and on about all of the amazing work happening around the world using data to make lives better every day, but we also have to address where data is causing more harm than good. As ProPublica has shown, algorithms are being used in the courtroom to make decisions that have an adverse impact based on race. We know that data used in predictive policing can reinforce traditional stereotypes. My friend Cathy O'Neil documents many more cases in her great book Weapons of Math Destruction. And let's not forget about people stealing our data: from healthcare breaches to data brokers, we have systems holding on to our most sensitive data with minimal oversight and protections. Finally, our democratic systems have been under attack using our very own data to incite hate and sow discord.

As the old adage goes, with great power comes great responsibility. It's time for the data science community to take a leadership role in defining right from wrong. Much like the Hippocratic Oath codifies "do no harm" for the medical profession, the data science community must have a set of principles to guide and hold each other accountable as data science professionals. To collectively understand the difference between helpful and harmful. To guide and push each other in putting responsible behaviors into practice. And to help empower the masses rather than disenfranchise them. Data is such an incredible lever arm for change that we need to make sure the change that is coming is the one we all want to see.

So how do we do it? First, there is no single voice that determines these choices. This MUST be a community effort. Data science is a team sport, and we've got to decide what kind of team we want to be.

To start, we need to engage in conversation and spend much more time talking about the changes that are about to take place (to those who have been doing this already, thank you!).

That's why I'm excited about the opportunity for the ENTIRE data science community to take part in defining what a code of ethics for data sharing would look like for data scientists. How do you get involved?