Data scientists and engineers: Advice I would give my younger self

4 Insight alumnae share stories about how they got their start and what data science and data engineering mean to them

From left, Maureen Teyssier, Lory Nunez, Kyungeun Lim, Pam Wu at a panel discussion March 20, 2018 at Columbia University

While Kyungeun Lim spent years in academia before leaving to work as a senior lead data scientist at NBC Universal, Pam Wu walked out one day after receiving her PhD.

“Sometimes I wonder if it was worth it,” said Wu, who now works as a data engineer at New York-based Enigma, a data management and intelligence company. “Was it worth doing a PhD?”

Lim and Wu were among four panelists who spoke to students at an event co-sponsored by Columbia University’s Women in Computer Science student group and Insight Data Science on March 20 in the university’s computer science department lounge in New York City.

The other panelists were Maureen Teyssier, who manages three data teams at Enigma, and Lory Nunez, who works as a data scientist and data engineer at JPMorgan. The four shared stories about their work, advice they would have for students starting their careers and the difference between data engineering and data science.

Wu started programming when she was in middle school, building parody websites using HTML, CSS and pre-jQuery JavaScript code. She could have bypassed her New York University bioinformatics PhD entirely and saved herself years of school by getting a software development job much earlier in her career.

“Was that a bunch of time that was wasted?” Wu asked. “I wonder about it because if I had gone straight into [computer science] right out of high school, I would have been a developer way earlier but, on the other hand…I would not have been a data engineer. I would probably be a front-end [developer], right? So I would not have ended up here. I’d be making widgets or something.”

Lim received her PhD in physics from Columbia and then went to Yale University for postdoctoral research before transitioning to industry. Her path kept her in academia for years, but it also allowed her to explore all of her options.

“I didn’t restrict myself,” she said, advising students to similarly not limit themselves when making career choices.

Video highlights from parts of the Columbia Women in Computing panel discussion on March 20

Advice to my younger self

Avoiding naysayers and pushing yourself to constantly learn was the first piece of advice Teyssier had for students.

“I would tell my younger self to ignore the people who tell me to stick with what you know and just build things as quickly as you can instead of learning something new,” she said. “Always learn new things, always learn how to teach yourself so that you can do that efficiently, quickly and so that you are always able to leverage new tools because it will always keep your life interesting.”

Nunez would have spent more time paying attention during some of her college classes in core topics, such as computer science fundamentals, calculus, linear algebra and basic math classes.

“Focus on the basics because the basics have a way of coming back,” she said. “Also, listen, ask questions, find mentors, implement.”

Difference between data engineering and data science

Lory Nunez

Nunez, who was the only one of the four without a PhD but who has worked in industry for longer, talked about how data engineering has emerged as its own discipline in recent years.

“The inputs of data, especially now, it’s just so much that the problems we have now probably weren’t problems we had 10–20 years ago,” she said. “Now the growth of data is just so big that you need a specialized skill set to handle complex pipelines.”

System architectures and pipelines need to be built with the knowledge that data will grow and that processing it, including understanding how it interconnects with other datasets, can become more complex.

“It’s just a matter of complexity in terms of handling the data that makes data engineering kind of different from a software engineer or a [database administrator]. And the skill set that you need to have as a data engineer, I think is a superset of the skill set you need as a DBA or software engineer,” she said. “A lot of the day-to-day would be designing pipelines, a lot of coding, memory management, your schemas, how you design, how you provision all of those things.”

Maureen Teyssier

While some large companies have data scientists who only devise an algorithm or a proof of concept, and then leave the implementation strictly to the engineers, other companies expect data scientists to code their own algorithms and deploy them into production. Still others give their data scientists access to a data warehouse and expect them to produce analytics for executives to review.

“You have maybe a team of data scientists, and the expectation is that they do work and they plug that work into other components of an engineering pipeline,” Teyssier said. “Or they do work on a data lake that surfaces analytics, and they interact with the C-level people. By that I mean they interact with the COOs, they interact with the CEOs and they surface analytics and insight that drive the direction of the company.”

A fourth way some teams are organized involves a mix of data scientists and data engineers in which the lines between what each does are more blurred, with several team members possessing a mix of data science and data engineering skills.

“There are teams where you have data scientists and engineers and you have people that have started to become a hybrid,” Teyssier said. “They are somewhere in the middle of the spectrum and they are learning from each other and they are building data processing pipelines that have machine learning integrated with the pipeline.”

Transitioning to industry from academia

Working as a data scientist in industry has had its share of surprises for Lim. But while she was afraid it was going to be a drastic departure from her academic life, her experience showed that some parts were not so different.

Kyungeun Lim

For instance, she worried that her new job would lack intellectual stimulation, but she found that not to be the case. Lim has met smart people and she also has time to read and research unfamiliar topics.

“There are certain things I have to deliver but I also have time for my R&D,” she said.

Lim said she finally decided to leave academia because she wanted to find new challenges. She chose to join Insight Data Science as a Fellow after hearing about the program from her friends.

“I had many friends who recommended Insight,” she said. “I had a good impression of Insight so I decided to apply.”

Pam Wu

Wu wasn’t so easily swayed to attend Insight initially but changed her mind when she heard an alumna present at an annual Python conference called SciPy.

“I had heard about Insight for several years and I was skeptical at first because I’m skeptical of everything,” she said. “What convinced me was I went to SciPy. I saw this really sick presentation given by one of [Insight’s] alumni.”

Matar Haller, an Insight Data Science alumna who also presented the same topic at Strata+Hadoop World, used her time in the program to build a project to detect different voices during a recorded conversation. The project, which was done in three weeks, convinced Wu that Insight was the right next step.

“They actually do pretty cool stuff so this must be totally legit,” Wu said of Insight. “So when I was on the cusp of graduating, I was just like, ‘Well, this sounds, you know, easier and more pleasant than finding a job on my own. I guess I’ll sign up.’ and I did, and it turned out to be basically true and so, yadda-yadda-yadda, I’m working for Enigma.”