We recently caught up with Abe Gong, Data Scientist at Jawbone and thought-leader in the Data Science community. We were keen to learn more about his background, his work at Jawbone and his latest side projects - including thought-provoking insights on how the ROI on Science is evolving ...



Hi Abe, firstly thank you for the interview. Let's start with your background and some of your early public policy related work...

Q - What is your 30 second bio?

A - I'm a hybrid social/computer scientist - interested in human problems, and how the right computational systems can sometimes solve them. I studied communications at BYU, then public policy, political science, and complex systems at the University of Michigan. I'm currently a data scientist at Jawbone, working on the UP fitness tracker. Practically speaking, that means I get to spend my time building data systems to nudge people to form good habits and live healthier.



Q - How did you go from Communications to Public Policy to Data Science?

A - I feel like I've always done data science - we just didn't call it that until recently. I've been writing code since I was 10, and my side jobs and internships were always data-related. Comms taught me formal statistics, with application to marketing and PR. Public policy extended that training to government and policymaking. My PhD followed up with a stiff dose of web scraping, natural language processing, and research design. The subject domains sound different, but the core skills are very similar. When data science became a thing a couple years ago, I said, "Great! Now there's a name for this kind of work!"



Q - Was there a specific "aha" moment when you realized the power of data?

A - My senior capstone project in PR and market research was a statewide survey of military families in Utah. This was 2005 and 2006, so the first deployments to Iraq and Afghanistan were just ending. For the first time in 30 years, lots of soldiers were coming home with PTSD and other combat-related issues. No one really knew how to cope with it, and the strain was tearing up a lot of people and families.



I worked in the call center and did most of the data analysis for the project. As I talked to the officials in the military and the Veterans Affairs office, I realized that our little amateur research team had the clearest picture of how deployments and PTSD were affecting the state. By collecting the right data (i.e., really listening to people), we had become the representatives for a suffering constituency with no other voice.



That was an "aha" moment for me. I think I had previously assumed that big institutions, like the VA, were basically well-informed and rational. In that project, I realized how often the right information is missing from important conversations.



Q - What attracted you to the intersection of data, politics and blogs?

A - Science always follows data. Blogs struck me as a fantastic source of readily available data, and I was sure it must be good for something. From there I worked backwards to political theories - because I was in a PoliSci program - that could be enriched by bringing in data from the blogosphere: political participation, new media, and civil discourse. That's not the way you're supposed to do it - theory and research questions are supposed to drive data collection. But it worked in this case because blog data is just so rich. (Also, my committee was very supportive. I think they were curious whether I could actually pull off the research design I'd pitched.)



Q - How do you think the future of Public Policy will look as people like you, Nate Silver and others apply a very data-heavy approach?

A - Ha - that's probably the first and only time that Nate Silver and I are mentioned in the same sentence. I've moved away from direct policy work, so others (Jake Porway, Drew Conway, Matt Gee) can probably answer this question better. My sense is that open data will improve policymaking, but that progress will be slow and uneven: two steps forward, one step back. Many opportunities to improve governance through data science are going to open up in the coming years, but I don't think I have the patience to wait for them.





Very interesting background and insights - thanks for sharing! Let's change gears and talk more about Data Science and Machine Learning...



Q - What excites you most about using Machine Learning and Data Science in your professional life?

A - I want to push back on the question a little, because in my experience, machine learning is only a small fraction of data science. Case in point: I came to Jawbone a little more than a year ago as the company's first data scientist. Since then, at least half my time has been spent building infrastructure: ETL, scheduling, making sure systems scale, making sure we have the right instrumentation, making sure that other groups know to tell us before changing their data structures.



It's not the sexy part of data science, but when you get it right, everything else falls into place. Your analysis is faster and more conclusive. Your data products are more fun to build and ten times more reliable in production. A lot of data scientists miss the importance of the infrastructure layer, and that ends up seriously constraining the speed, scope, and quality of their work. Now and then my work calls for statistics/machine learning, but it's usually the last step in a long data pipeline - the icing on the cake, really. You can have a cake without icing, but not the other way around.



To your question about what's most exciting: one of my big projects right now is developing Jawbone's system for A/B testing on the UP band. It's a great business intelligence asset for the company, and it's also a fantastic platform for nudging and improving user behavior. In other words, it's a great place for doing Science. We have all the same levers and tools as a growth hacking team at a typical SaaS company (content changes, UI changes, timing changes, in-app messaging, email, etc.), but our dependent variables are a lot more interesting. Instead of trying to convert/retain/upsell customers, we get to optimize for things like miles walked/run per user, sleep quality, and habit formation.



In other words, we're building infrastructure to tackle some of the big, unsolved problems in psychology and behavioral economics. I love working on these problems from a vantage point with such awesome data and reach. I also love that our relationship with users is fully collaborative - instead of trying to grab more eyeballs or induce more clicks ("Find out how this Mountain View mom makes over $6,000 a month with this one weird trick!") - we're trying to help users achieve their own lifestyle goals. There's nothing wrong with ad targeting, but I feel blessed to work on data problems with more direct human impact.



Q - That sounds fantastic! Now, while you're doing all this - what are your favorite tools/applications to work with?

A - I'm a Python guy. I love IPython, pandas, scikit-learn, and matplotlib. Probably two-thirds of my workflow revolves around those tools. I used R a lot in grad school, but gave it up as I started working more closely with production systems - it's just so much easier to debug, ship, and scale Python code. For backend systems, I'm agnostic. I tend to use the AWS stack for my own projects, but the right combination of streaming/logging/messaging/query/scheduling/map-reduce/etc. systems really depends on the problem you're trying to solve. In my opinion, a full-stack data scientist should be comfortable learning the bindings to whatever data systems he/she has to work with. You don't want to be the carpenter who only knows hammers.



Q - What are the biggest areas of opportunity/questions you want to tackle?

A - Habit change at scale. Habits are an awfully important part of what it means to be human, but we really don't know that much about how they work. That is, our theories of motivation, psychology, incentives, etc. don't yet explain why some habits stick and others don't. The science hasn't developed that far. That's changing, though. I'm convinced that this field is ripe for an explosion. The data is there, the commercial incentives are right, and there's enough existing social/psychological theory to prime the pump. In the next few years, I expect to see theories of habit change improve by leaps and bounds - we're talking about a minor revolution in the science of human behavior - and I'm really looking forward to being part of it.



Q - What personal/professional projects have you been working on this year, and why/how are they interesting to you?

A - I've already mentioned the stuff I'm doing at work, so let me tell you about a couple of side projects ... First, storytelling: after watching D.J. Patil's talk about how storytelling is an important skill for data scientists, I put a lot of my spare cycles into reading about, thinking about, and practicing storytelling. I learned to look for story elements in data: plot, characters, scenes, conflict, mood, etc. Often, our first instinct is to reduce data to numbers and hypothesis tests. Looking for the stories in data is another good way to make data meaningful, especially when you want users to get personally involved with the meaning-making.



I've really enjoyed exploring the craft of storytelling. It's a tradition at least as old as the scientific method, and sometimes much more powerful: you may be able to persuade individual humans without telling stories, but it is almost impossible to persuade a whole group without good storytelling - stories are the API to human culture change. I'm not sure that this is unique to data science, but it's definitely worth knowing. If others want to read up on the subject, I highly recommend Robert McKee's Story, Blake Snyder's Save the Cat, and Joseph Campbell's classic The Hero with a Thousand Faces - in that order.



More recently, I've been exploring a topic I call "the ROI for science." This started with a blog post speculating about how data science might evolve as a profession, branched out into a search for root causes ("Why is data science getting big now?"), and led to a fascinating thesis. Here's the gist: cheap and ubiquitous data are driving up the return on investment for many kinds of research, causing a boom in the use of scientific methods in business and day-to-day life.



Once you spot the trend, you'll start to see examples all over the place: the recent J-curve in patent filings, the growth of the hacker/maker movement, the big data infrastructure supporting scientific efforts like CERN. If we stopped with the simple trend ("More Science!"), this would be a very optimistic story.



But the same premise leads to a counterintuitive corollary: as more research is driven by private investment, the benefits of science are increasingly being captured by private interests. Think of all the investment that goes into business intelligence and operations research (and data science): many person-years and millions of dollars to develop the equivalent of a whole scientific discipline - devoted entirely to the success of a single business model. Other examples of the scientific method serving narrow interests: a growing body of industrial trade secrets that never pass into the public domain; secret surveillance technologies developed by governments; the increasing dependence of many academic researchers on datasets owned by corporations.



We're used to thinking of science as a public good - open, democratic, and freely shared - but as the ROI on science increases, we should expect far more science to be privatized. That's not necessarily bad, but it brings new risks, power relationships, and thorny ethical questions. I'm very interested in starting a conversation around these issues, including the role that data scientists can play in nudging the system in constructive directions.



Okay, stepping back. These are fun things to think and talk about - blue sky, big picture stuff. I also have some technical side projects in the works (mostly quantified self projects about goal-setting, mental acuity, and productivity) but they're not ready for prime time yet.





Very thought-provoking - it will definitely be interesting to see how data availability and the data science profession affect the ROI on science, and to see who gains. We look forward to following your thoughts and the conversation on this topic! Finally, it is advice time...



Q - What does the future of Data Science look like?

A - Exciting! Like I said earlier, I'm convinced we're living through a renaissance of the scientific method. There's a scary side to the new power of data, but on the whole I'm optimistic about where we're headed. Science always thrives in a data-rich environment, and the information revolution ("software eating the world") is generating a wealth of data. More and more, science is going to be something that everyone can - and, to some extent, needs to - do. That's the common thread behind the appeal of data science, the quantified self movement, and the emphasis on "big data." They're all about capturing data, applying the scientific method, and making life better by making it smarter.



Abe - Thank you so much for your time! Really enjoyed learning more about your background and what you are working on now - both at Jawbone and personally. Abe's blog can be found online at http://blog.abegong.com and Abe himself is on Twitter @AbeGong.



Readers, thanks for joining us!



If you enjoyed this interview and want to learn more about what it takes to become a data scientist, what skills you need, and what type of work is currently being done in the field, then check out Data Scientists at Work - a collection of 16 interviews with some of the world's most influential and innovative data scientists, who each address all the above and more! :)