Thomas Vincent was a Fellow in the first session of the Data Science Fellows Program held in New York City. After Insight, he worked as a data scientist at Dow Jones with teams ranging from the newsroom to the marketing department. Tom recently took on a new role as a Data Science Engineer at DigitalOcean. We asked him to reflect on his experiences before, during, and after Insight.

Tom, your background is in biostatistics. Can you say a bit more about what you worked on?

My PhD focused on understanding the probabilistic rules by which a “flat” linear chain of amino acids somehow folds into a very specific three-dimensional structure in order to become a fully functional protein within the human body. During my postdoc I transitioned to genomics, working on several different projects: inferring time-varying gene regulatory networks, consulting with medical and surgical staff at Weill Cornell, and lung cancer biomarker prediction.

It sounds much more applied than I’d imagined!

Sometimes, in moments of sheer madness, I can convince myself that I contributed to pushing the field further [smiles]. I was fortunate enough to work under the supervision of both an applied professor (from biochemistry) and a theoretical professor (from statistics). This early exposure to interdisciplinary studies allowed me to pursue my interest in Bayesian statistics, plus I got to hang with the cool kids in the stats department. It also taught me the invaluable skill of communicating abstract results to a sometimes non-technical audience. That was reinforced even more during my postdoc, which I did at a medical institution.

What was initially attractive to you about data science?

First and foremost: the data and the science. I knew there was a wealth of amazing data ready for the taking, and many exciting ways to analyze it, and I had obviously heard of the buzz around it.

I attended some meetups and reached out to a few people, and was quickly sold by the general enthusiasm and pace at which things were moving, which was very different from academia.

What made you decide to apply to Insight?

First, I realized that it would offer a great foot in the door to interesting companies where your CV would normally end up in a stack of paper. Second, I was intrigued by the prospect of working closely for seven weeks with twenty-plus like-minded, smart people, and thought I could learn a lot from that, which I did! Third, I thought it would be a great opportunity to build a solid network of friendly data scientists. And finally, it just sounded really fun.

What kind of technical work had you done before applying?

In my academic life I spent most of my time scraping, aggregating, cleaning, normalizing, modeling, and visualizing what always started off as very messy data. And if the data wasn’t messy, then it was probably wrong.

This included a lot of shell scripting, R, Python, and C++. On the more theoretical side of things, I had spent a lot of time working with hypothesis testing, supervised and unsupervised learning, so I felt pretty comfortable in those areas.

What preparation did you do before arriving at Insight?

What I probably lacked the most was both SQL and CS knowledge, so around four weeks prior to the start of the Insight program I set out to learn SQL — badly — and read the entire Data Structures and Algorithms with Python book, which I ended up absolutely loving.

In hindsight, I wish I had put more time into front-end and back-end development, tools like D3, and perhaps some fundamental software engineering concepts. I’ve actually come to think that JavaScript is an invaluable tool for data scientists looking to go beyond the simple modeling aspect of things.

How did you get into skills-based volunteering?

I was fortunate enough to be born and raised in an environment in which high-end education was readily available and where I was allowed to choose my own path. As a result, I want to make the most of the chances I’ve been offered and give back in any way I can.

While I can’t sponsor major organizations or be some kind of celebrity spokesperson, I contribute in the best way I can, by offering the computational and statistical knowledge that I have gained to those who can benefit the most from it.

How did you get connected with DataKind specifically?

I had been browsing online looking for volunteering opportunities, and randomly came across the DataKind website one day. So I applied through the application link on their website.

Not too long after, I received a couple of emails and had a phone screen to determine whether I would be a good fit for both DataKind and some of the projects they had running at the time.

What DataKind projects have you been involved with?

I am part of a team that works in tandem with members of a non-profit organization called iCouldBe. Their principal goal is to leverage the power of online mentoring to assist middle and high-schoolers from less privileged backgrounds or neighborhoods. Students get connected with volunteers online for one-on-one mentorship.

As all the interactions occur exclusively via online messaging, we are left with a fair amount of text-based conversations, which iCouldBe was very interested in taking a deeper look at. More specifically, they wanted to understand and explain how successful their curriculum was, how engaged students were, and whether some students could be grouped into behavior-specific segments.

What’s been your role on the team?

My official title is “Data Ambassador”. On paper, this means that I bridge the gap between the data experts on the team and iCouldBe, although I have found my role to be a lot more elastic than that.

First, the two data experts are more than capable of handling things themselves without me! And I have found myself doing some data analysis, preparing presentations, having off-hours conversations with the iCouldBe team to understand their exact needs and requirements, and sanity-checking some code.

The project is now winding down and overall it was very successful. We ended up building a data product, which gives iCouldBe a reliable pipeline into which they can add any new data and automatically generate relevant insights. This means that we will continue to have an impact on their work as they collect more data over the coming years.

[ED. NOTE: YOU CAN READ MORE ABOUT THE ICOULDBE DATAKIND PROJECT PLUS SEE A FEW PHOTOS OF TOM AND HIS TEAM PRESENTING THEIR WORK HERE.]

What have you learned from this work that augments your day job?

There have been two major learning points during my time with DataKind. The first was realizing that the initial requests for the project were unclear, and it took some time to figure out the delicate balance between what we, the data scientists, thought was cool, and what the iCouldBe team actually needed.

The second was establishing some kind of order within the team in terms of communication channels. Everyone in the team is incredibly nice and competent, but probably also too polite, which means that we may have been a bit slow to “take the horse by its reins”. I learnt that sometimes it’s not a bad thing to take control if things are going a little off-track.

What kind of work have you done as a data scientist since Insight?

To echo the previous point regarding the pace of data science work, I was involved with many projects at Dow Jones, nearly one per month, with a lot of overlap of course. The work ranged from building stand-alone data products that run graphical modeling pipelines to produce fully interactive visualizations of risk and compliance networks, to building APIs that run search engines or predictive models for relationship extraction, to working on projects that further our understanding of customers, to creating pipelines that ingest news events for the newsroom.

And what about in your current role at DigitalOcean?

While my work is still heavily focused on data science, members of the team are also expected to “bring our own engineering”. In essence, this means that we are integrated in every part of the process and take ownership of whatever we are building, from data collection all the way to being on-call. This is a role and level of responsibility that I was actively seeking so I am very happy with where I am.

It is also interesting to contrast life at a large Fortune 500 company with life at a startup in hyper-growth. At a startup it is easier to positively impact the company, since the hierarchy is a lot flatter and there is far more low-hanging fruit up for grabs.

Thanks for taking the time to share your experiences with us, Tom!