I’ve just completed my first round of recruitment since joining Royal Mail as their first Head of Data Science, with some successful candidates joining my team. But having been involved in hiring Data Scientists for many years now, at the initial shortlisting stage of the process, I still find myself wishing too often that the information I’m looking for was in the CV in front of me. It just seems like Data Scientists in general don’t know what they should put in their CVs as they don’t know what hiring managers are looking for. And this leaves hiring managers like me with the dilemma of either rejecting most of the CVs (and taking the large risk of dismissing some potentially very good candidates) or having to employ an additional telephone screening stage to find out the information I need from the potentially good candidates.

Neither scenario is ideal and both have their pros and cons. So I thought I would try something different, and instead write down and publish what I would like to see in an ideal CV from someone applying for a Data Scientist position on my team at Royal Mail. My hope is that this will start an interesting discussion, from which I can also learn more and further improve my recruitment process. Ultimately I also hope this results in improving the quality of CVs across the Data Science community, and thereby help me to streamline my recruitment process as a consequence.

Educational Background

The first thing I look for in a Data Scientist’s CV is evidence of a solid educational background in a heavily mathematical subject. Almost anyone can claim to be a Data Scientist these days, just because they know how to use the multitude of machine learning libraries out there to build you a solution to your problem. But to me, a real Data Scientist is someone who understands the technical details behind the algorithms, and knows what assumptions they are making when using one algorithm vs. another. This gives me confidence that they would select the right algorithm for each specific problem and will be able to engineer the most appropriate features for that algorithm. Therefore, I would like to see an externally validated qualification, such as a University degree or equivalent, and I’d like to see this on the first page (at least mentioned in the personal statement at the start).

Independent Research Experience

Data Science is by definition a research activity, where we are always looking to solve a problem where the solution is not obvious and success is not guaranteed. Otherwise we are not doing real Data Science! This is why I’ve taken Eric Ries' very appropriate Lean Startup framework for innovating in the midst of lots of uncertainty, and adapted it to make it work for Data Science. And thus, I’m running my Data Science team at Royal Mail as a lean Start-up. Therefore, next I’m looking for some evidence in the CV to convince me that the candidate is capable of carrying out independent research. The most obvious evidence for this would be that the candidate has successfully completed a PhD, or at least an MSc that included a research project.

However, the biggest mistake candidates make in this area of the CV is to just say they have done an MSc or PhD in some specific subject (e.g. MSc in Computer Science or PhD in Statistics), and possibly mention the university. But what I really want to know is the details of their research activity and how successful it was. Therefore, I’m more interested in the title and summary of their Thesis. If the work was sufficiently novel and completed successfully, it would give me confidence in their ability to carry out independent research. But of course, attaching their Thesis to their CV is not the answer! In fact, their ability to summarise the key aspects (context, approach, outcomes and novelty) of their research activity in one paragraph is a very important indicator of their excellent written communication skills as well.

I would also consider evidence of alternative equivalent research experience (e.g. experience as a research scientist or Data Scientist). But in this case, it should ideally be called out, for example in the personal statement, and the examples of research projects described in the relevant section of the CV (e.g. in the work experience section).

Programming Skills

Next I’m looking for evidence of the candidate’s programming skills. Here, some candidates love to list 101 languages, thinking that it makes them look really attractive. But in reality they would only use 2 or 3 languages on a regular basis. The key for a hiring manager like me is to see that the candidate has experience in at least one language of each of the following types:

1. A high-level rapid prototyping language such as Python or R
2. A low-level deployment language such as Java, C++, C#, etc.
3. A scalable/Big Data language such as Scala/Spark

I would want to see all three for a Senior Data Scientist, the first two for a Data Scientist and just the first (R or Python) for a Junior Data Scientist.

The other mistake I see in CVs is just having a list of programming languages with no indication of proficiency or experience. The really good CVs not only list the languages along with the number of years’ experience in brackets (e.g. Java [6+ years]), but also list the languages used in each Data Science project they mention in the work experience section.

The candidate can get lots of brownie points by also mentioning any open-source code bases they have contributed to, or providing links to their publicly available work (e.g. on GitHub), so that I can actually go online and view their coding ability. This will give me significantly more confidence in their programming skills.

Impact! Impact!! Impact!!!

For the more Senior Data Scientist roles, next I’m looking for the candidate’s real-world Data Science experience, and what really drives me up the wall here is when there is absolutely no mention at all in the CV of any impact they have had in the real world. This leaves the hiring manager wondering why any company would ever consider hiring the candidate, as there doesn’t seem to be any indication of ROI (Return on Investment)!

Some candidates focus on who they reported to, others focus on the accuracy and/or complexity of the models they built, while others just mention the types of projects they worked on. But hardly anyone covers the most important thing I’m looking for: what was the impact of their work? Why would I even consider paying them to come and work for me?

It really doesn’t matter to me if the candidate reported to the CEO or if their models were 99.9% accurate. What I really want to know is what difference they made to the business that hired them. Here it is very important to remember that all models are wrong, but some are useful! So it doesn’t even matter if your model was only 60% accurate, if it improved some aspect of the business (e.g. reduced customer churn) and resulted in tangible business value (e.g. leading to annual incremental revenue of £5 million).

My ethos, which is essential in a commercial environment, is to always start with the simplest possible model and only optimise and/or add complexity if/as required. This is exactly what the Lean Startup framework mandates and is exactly what we do in my Data Science team at Royal Mail. This is because you would usually hit diminishing returns as you continue to optimise and/or add complexity to a model, and the key is to know when your model is good enough to have a tangible business impact, and then deliver it, realise the value and move on to the next most important problem.
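As a rough illustration of this "simplest model first" idea, here is a minimal Python sketch. The function name choose_model, the min_gain threshold and all of the figures are hypothetical, made up purely for this example; it is not the actual process or code my team uses. It assumes you already have an estimate of the business value of each candidate model, ordered from simplest to most complex.

```python
def choose_model(candidates, min_gain):
    """Return the simplest model beyond which extra complexity stops paying off.

    `candidates` is a list of (name, estimated_value) pairs ordered from
    simplest to most complex. The values are hypothetical business-value
    estimates (e.g. annual incremental revenue), not accuracy scores.
    """
    chosen_name, chosen_value = candidates[0]
    for name, value in candidates[1:]:
        if value - chosen_value < min_gain:
            break  # diminishing returns: stop adding complexity
        chosen_name, chosen_value = name, value
    return chosen_name


# Illustrative, made-up figures in £ million per year.
candidates = [
    ("logistic regression", 4.0),
    ("gradient boosting", 4.8),
    ("deep neural network", 4.9),
]
print(choose_model(candidates, min_gain=0.5))
```

With these invented numbers, the step up to the second model is worth more than the threshold, while the step to the third is not, so the sketch stops at the second model. In practice the threshold would depend on the cost of the extra development and maintenance effort.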

So ideally, in the work experience section of the CV, I would like to see multiple impact statements, at least one for each Data Science role the candidate has held. This would give me confidence that the candidate has good commercial acumen and is worth investing in, as I can expect a good ROI. Here again, the candidate’s ability to summarise the key aspects (context, approach, impact) of their Data Science project in one paragraph is a very important indicator of their excellent written communication skills as well.

Coaching, Mentoring and Line Management Experience

For the more Senior Data Scientist roles, I’m then searching the CV for the candidate’s experience in coaching, mentoring and line management. If the candidate is already operating at a Senior level, I would expect to see this mentioned in the work experience section, giving details of how many Data Scientists they have managed and for how long, and/or how many they coached or mentored and in what skills. Here a mention of any formal management, coaching and/or mentoring training courses attended would be a bonus.

What would really impress me would be if the CV gave an example of how the candidate managed/coached/mentored a Data Scientist who was either a high-performer or someone with a development need – giving the context, their approach and the outcome. Again, they should showcase their excellent written communication skills by summarising this in one paragraph, instead of writing a thesis!

For a Data Scientist ready to take on their next role as a Senior Data Scientist, I would expect to see some training courses attended and some experience in supervising and mentoring/coaching at least one student and/or contractor/temp. This will give me confidence that they are ready to take on managing and coaching/mentoring permanent staff as well.

Technical Breadth and Depth

Next I would love to get a feel for the breadth and depth of the candidate’s technical capabilities, especially from the CV of a Senior Data Scientist. Here the breadth can be demonstrated by mentioning a variety of types of problems they’ve worked on (e.g. forecasting, predicting, optimising, simulating, etc.). However, I rarely see evidence of the technical depth in a CV. One good example I’ve seen in CVs is where the candidate mentioned an algorithm/library they had contributed to an open-source package. Coding up such an algorithm/library would not only require good coding skills, but also a very deep level of understanding of how the algorithm/algorithms in the library work. Another good example is where the candidate explains why they chose to use one algorithm over another for a specific project. Here if they articulate the choice based on the assumptions behind each algorithm and properties of the data and/or problem, it shows that they didn’t just use a standard library, but understand why the chosen one is the best algorithm to use for that specific problem.

Highlighting their external accreditations, such as their Chartered status (e.g. Chartered Mathematician, Chartered Scientist, Chartered Statistician, etc.) is also an excellent way for a candidate to demonstrate their technical depth and breadth.

Tools and Processes

Especially for a Data Scientist or Senior Data Scientist role, I would also be looking for some evidence in the CV of the candidate’s Agile experience. Here I’m looking for them to call out when and where they worked according to an Agile framework, and ideally which framework (e.g. Scrum, Kanban, etc.) and tools (e.g. JIRA, Assembla, etc.) they used. From experience I have found that combining the Hypothesis Driven Approach with the Kanban Agile framework supports the Lean Startup framework really well. Therefore, this is what my Data Science team at Royal Mail use, and we are currently in the process of migrating from Assembla to the more fit-for-purpose JIRA cardwall. Therefore, although I’m looking for any Agile experience, it would be an additional bonus to see that a candidate has used Kanban and/or the Hypothesis Driven Approach in at least one Data Science project.

I will also be looking for their experience of using different environments (e.g. Linux, Windows, Hadoop, Cloud, etc.) and of using best practice processes such as version control (e.g. Git) and documentation (e.g. Wiki). Here the best CVs list these aspects for each of the projects mentioned in the work experience section, while also listing them in a separate section with the corresponding years of experience in brackets.

An Open Mind-set

An open mind-set and a commitment to continuous learning are vital, especially for a Data Scientist, as our field is continuously changing at pace. Therefore, I am always encouraging my Data Science team at Royal Mail to spend some time learning something new each week. This does, of course, reduce the team's capacity available for project work. But I consider this an investment rather than a cost, because that is exactly what it is, and I’ve seen the benefits of continuous learning realised time and time again. Therefore, another key aspect I look for in a CV is an open mind-set, and a commitment to continuous learning and continuous professional development. The usual evidence for this would be regular MOOCs or other training courses completed, conferences and workshops attended, etc. And the key is to mention the dates of these learning events, to show that they are a regular commitment.

Softer Skills

Finally, and especially for the Senior Data Scientist roles, I’m looking for evidence of the softer skills such as stakeholder management, influencing senior managers and presenting to business/non-technical audiences. Here again, official training courses and mentoring/coaching received are good evidence to share. The best CVs will also highlight any difficult stakeholders and/or collaborators that the candidate had to deal with and manage in order to deliver the projects they mention in the work experience section. Here the key is to also mention how they dealt with the difficult stakeholder (i.e. the approach they took).

Summary

In summary, an ideal Data Scientist’s CV will contain all of the above information presented in a clear, concise manner, demonstrating the candidate’s excellent written communication skills as well. And candidates should not be afraid of having a CV that is longer than a page or two. I would be happy to read even up to 5 pages of useful and relevant details, especially if it allows me to move faster in the interview process and get the candidate in straight away for a face-to-face interview and offer them a job quickly.

But you should never be put off applying for a job just because you think you don’t fit all of the criteria. Especially for permanent roles, hiring managers like me would welcome candidates who have one or two development areas to work on. I want people who can join my team and contribute, while at the same time continuing to learn and grow themselves. Continuous development is very important to me, as mentioned before. It is something I am always encouraging my Data Science team at Royal Mail to do, and something I will always ensure they have the time and space to do.

As I mentioned in my previous article, a new era of Data Science has dawned at Royal Mail! So if you happen to be a Data Scientist who can pull together a CV that contains all or most of the above information, then I’d love to hear from you, as I’ve still got a few Senior Data Scientist vacancies I am looking to fill. My brilliant internal recruiters Terry & Lucy are also always available to help anyone interested in applying to join my Data Science team at Royal Mail.

PS: If you’ve just applied for one of my vacancies and would like to update your CV based on what I’ve written above – please do! I’d much prefer to have your updated CV.