I joined the UCSD Cognitive Science PhD program with the aim of investigating multi-agent systems. A few years in, I joined a project studying the interactions of bottlenose dolphins. The research group had a massive collection of audio and video recordings, too large to analyze without computational techniques, and my role was to provide the computational support they needed. During this process, I discovered that working with big data is motivating in its own right and that I wanted to pursue the data scientist path instead of academia.

I don’t see this choice as abandoning the traditional academic path, even though PhDs and postdocs who leave academia often harbor feelings of regret and worry that they have failed. The number of open positions available to PhDs and postdocs is slim compared to the number of applicants:

Only 0.45% of STEM graduates in the UK will become professors. [1]

“Across all scientific fields, NSF data suggest that only about 23 percent of Ph.D.s land tenure or tenure-track positions at academic institutions within three to five years of finishing grad school.” [2]

This isn’t the only reason, though. While academia has historically tackled interesting and challenging data problems, some of the most important questions can now only be tackled by companies that have invested billions of dollars to accumulate the necessary data. I can think of no better place to be than at the forefront of these endeavors.

These past few years have been filled with insights. I’ve been using data to investigate the lives of a very social mammal species, the bottlenose dolphin. I’m now setting my sights on humans, hoping to find insights that help us understand our own social world and guide our individual (and corporate) decisions.

Learning to be a Data Scientist

Early on, I applied to software engineering jobs and even reached the last stage of the interview process at Google (prep material here). After a few interviews with other companies, I realized that software engineering was neither the best fit for my background nor in line with my interests. I connected with a friend of a friend who was a data science practitioner to learn more about the role and the hiring process. He had a lot of great advice, and I recommend reading the article. Since then, I have gone through the interview process for several data science positions and have had some successes. Recently, however, I put my job search on hold so that I could attend the Insight Data Science program as a Fellow in the Fall 2015 Silicon Valley session. Adventure awaits!

During this transition phase, I’ve tried to amass a large amount of relevant material for graduate students (particularly my friends in the department) who might want to pursue non-academic routes. While there are a growing number of data science programs across the globe, it is possible to pick up the skills while pursuing other STEM degrees. Most of these items are on my Twitter feed, but here are the highlights:

In addition to the large datasets that you might encounter during your PhD (for example: Neuroscience Data by Ben Cipollini), there are plenty of free large datasets available to those interested:

Data Science Courses at UCSD

The Data Science Student Society at UCSD has put together a great list of courses available at the undergraduate level. Below is a listing of the graduate courses at UCSD that are relevant to Data Science (as of August 2015). I included courses focused on audio and video analysis, as they also teach the skills data scientists need to tackle large amounts of noisy data.

As a graduate student, you sometimes have the option of taking a free course through UCSD Extension. UCSD Extension offers a Data Mining Certificate, and you can take individual courses from its curriculum. The Computer Science & Engineering Department and the San Diego Supercomputer Center now offer a Data Science & Engineering Master’s degree.

If you feel I have missed a course, feel free to e-mail me and I’ll add it to the list.

UCSD Courses I took that were relevant to Data Science:

CSE 250A. Artificial Intelligence: Search & Reason

COGS 202. Foundations: Computational Modeling of Cognition

COGS 225. Visual Computing

COGS 260. Seminar on Special Topics (Sometimes on AI)

COGS 200. Cognitive Science Seminar (I took Cognition under Uncertainty)

COGS 220. Information Visualization

ECE 272A. Stochastic Processes in Dynamic Systems (Dynamical Systems Under Uncertainty)

MATH 285. Stochastic Processes

MATH 280A. Probability Theory

PSYC 232. Probabilistic Models of Cognition

Other UCSD courses relevant to Data Science that I have not taken:

MATH 280BC. Probability Theory

MATH 281ABC. Mathematical Statistics

MATH 282AB. Applied Statistics

MATH 287A. Time Series Analysis

MATH 287B. Multivariate Analysis

MATH 287C. Advanced Time Series Analysis

MATH 289A. Topics in Probability and Statistics

MATH 289B. Further Topics in Probability and Statistics

MATH 289C. Data Analysis and Inference

PSYC 201ABC. Quantitative Methods

PSYC 206. Mathematical Modeling

PSYC 231. Data Analysis in Matlab

CSE 250B. Principles of Artificial Intelligence: Learning Algorithms

CSE 250C. Machine Learning Theory

CSE 252AB. Computer Vision

CSE 253. Neural Networks/Pattern Recognition

CSE 255. Data Mining and Predictive Analytics

CSE 256. Statistical Natural Language Processing

CSE 258A. Cognitive Modeling

CSE 259. Seminar in Artificial Intelligence

CSE 259C. Topics/Seminar in Machine Learning

COGS 219. Programming for Behavioral Sciences

COGS 230. Topics in Human-Computer Interaction

COGS 243. Statistical Inference and Data Analysis

ECE 250. Random Processes

ECE 251AB. Digital Signal Processing

ECE 252B. Speech Recognition

ECE 253. Fundamentals of Digital Image Processing

ECE 271AB. Statistical Learning

References