Julie Gould

Hello, I’m Julie Gould and this is Working Scientist. In this technology series, we’re exploring how technologies are affecting scientists, research and universities, and we’re continuing along the same lines as our second episode, where we looked at the importance of coding skills in a research career. Last time we heard from Jess Hedge, who had to teach herself how to code when she realised that she needed to adapt some software in order to complete her research. During our conversation, she mentioned that although she taught herself most of the skills, she would have liked to attend an intensive workshop where she could focus on her training. Now, someone who does run these workshops is Harriet Alexander. She is a postdoctoral fellow in oceanography bioinformatics at the University of California, Davis, and she’s also an instructor for Software Carpentry. She had a chat with Jeff Perkel, our technology editor here at Nature, about why a course like Software Carpentry can be really useful.

Harriet Alexander

More and more, scientists are writing code and running complex analyses on larger and larger datasets, and traditionally we haven’t had a lot of training in how to do that during our education as graduate students or undergraduates. So the programme takes concepts from software engineering and brings them down to a basic entry level for scientists, teaching them basic coding in either Python or R, Git and version control, database management with SQL – things along those lines. The courses are usually designed to be taught in a compact, intensive, hands-on way over a period of about two days.

Jeff Perkel

What kind of people take the course – undergraduates, graduate students, PIs?

Harriet Alexander

I would say it varies from course to course. Typically, in the courses that I’ve taught, it’s been a mixture of graduate students, postdocs, technical staff and sometimes PIs – basically, people who are actively doing research. And I feel like the people who are drawn to this type of programme are people who are encountering problems in their own research.

Jeff Perkel

What kinds of problems are you talking about? What kinds of problems do people encounter that drive them to want to learn to code?

Harriet Alexander

Increasingly in biology, people are turning to methods that generate large amounts of data. In particular, I’m thinking about sequence data. In my case, I came into a lab that hadn’t done a lot of sequencing, but we wanted to use sequencing to answer some questions about how phytoplankton respond when there’s a pulse of nutrients. So we did something called RNA-seq, where we sequenced the expressed genes, and we generated all these massive files that have to be put through a series of contortions by pumping them through various software programs. And I struggled a lot with figuring out how to manipulate these data. How do I look at them on the command line? I can’t open the file in Excel because it’s, I don’t know, 10 GB in size, and that would break Excel, so how can I actually look at it? I wasn’t terribly familiar with the command line, and I think this type of problem is increasingly common for biologists, and potentially in other disciplines too.

Jeff Perkel

What will this training do for your career, and what would be the next step for you?

Harriet Alexander

What we try to do as instructors is take away the fear element, because a lot of us think, ‘oh, I’m not a computer scientist’ or, ‘oh, I’m not a statistician, I can’t do this’. We try to break down that mental block and really enable people: make them feel comfortable asking questions, make them feel comfortable failing, and make them realise that they can do this and that they can Google error messages. Frankly, that’s how everybody who does computational science or coding works. You go to Google, you read forums, you find the answers.

Jeff Perkel

Software Carpentry has hundreds of instructors all over the world. One of the things that makes you unique is that you are the only one to have given a workshop in Antarctica. How did that go?

Harriet Alexander

So, I went to Antarctica in January 2018 as a participant in the Advanced Training Programme in Antarctica for early career scientists. The reason that I ended up giving a Software Carpentry workshop at McMurdo Station was inclement weather. We were down there for a course, and our time was really tightly scheduled so that we could do all the course requirements, get all of our field samples and run our experiments. However, we had a series of bad weather days around the time that we were supposed to fly out, and we ended up getting stuck at McMurdo Station for, I think, about five days after we were supposed to have left. So we had time to run a Software Carpentry workshop during the last couple of days that we were waiting for our plane to take us home.

Jeff Perkel

How did that course go? How did it differ from the ones you’ve given, you know, not in Antarctica?

Harriet Alexander

Well, it was very different. The primary difference that I discovered was that internet access is significantly slower and more limited down in Antarctica than at a university where I would have taught back in the States. So the main problem I ran into was accessing the course materials and downloading the software programs that people needed to run these materials on their own machines, which is a primary goal of Software Carpentry. The idea is that you come with your own laptop and you’re able to do everything that the instructor is teaching you on that laptop. For Python, for example, we like to install Anaconda, and most people who want to take Software Carpentry won’t necessarily have Anaconda installed on their computer. So I spent a few days trying in vain to download just one copy of Anaconda onto a USB stick. It’s, I think, about 300 MB in size, give or take, and I was unable to download it at all, so that definitely forced me to be more flexible in how I taught the course.

Jeff Perkel

What other technical challenges do they have there? It’s so remote – are there tools that you need that you can’t get, or that you kind of have to jury-rig from what’s available?

Harriet Alexander

Deploying to do science in Antarctica requires very meticulous planning because, as you’re alluding to, once you’re down there, you’re down there, and there is no Target, Walmart or Amazon – you’re not going to be able to get whatever you need. So scientists typically ship in all the equipment and materials that they know they’re going to need. However, McMurdo is kind of a wonderful place. It’s been around for a long time and they have a lot of supplies there. When I was down there, I decided that I really needed a syringe to inject some plastic bags with a chemical that we wanted to use for an experiment, and it just so happened that they had a stock room full of old, random odds and ends, ranging from different types of glassware to leftover scientific equipment and little disposable items, and I was able to find what I needed in that room. But the name of the game when it comes to deploying to remote places for field research is really preparation. Like a Boy Scout, you need to make sure that you have what you need to do your science.

Jeff Perkel

So, if you didn’t think to bring it yourself, it’s got to be in the Room of Requirement or you’re out of luck.

Harriet Alexander

Exactly, and I think most of the time, people make sure to have everything and also have redundancies, because when you get down there and it’s cold and blowing, whatever you’re doing, things sometimes break, so having a backup is always a good idea.

Julie Gould

Now, admittedly, not everybody enjoys working on software development and coding. I totally get that – I’m one of them. But for those who do, there’s actually a career path that might be of interest. The research software engineer, although not a new role, is a newly recognised one, and demand for them is growing rapidly, says Simon Hettrick, who is the founding chair of the UK branch of the Research Software Engineers Association. I asked him what a research software engineer actually was.

Simon Hettrick

It’s a person who combines a deep understanding of research – about 67% of them have PhDs – and of the way that researchers work, which is a very important thing, with an understanding of software engineering. And obviously, these people are now vital because so much research relies on software. Combine that reliance with the fact that not many researchers have training in software engineering, and it’s clear that what you need is a new role in research – somebody who can do that translation between the research and the software engineering.

Julie Gould

There is an argument that scientists should learn to code and develop their own software because they’re the ones who are most familiar with their data. If a scientist can learn to do it themselves, what is the need for a research software engineer?

Simon Hettrick

I would say all researchers should learn to code because it’s a very useful skill to have, not just in your research but in your day-to-day life, and it also means that when you need to work with people like research software engineers, you can speak the same language. Some researchers are facing problems that are quite straightforward. Organising some straightforward cleaning or analysis of a small dataset – that’s well within their remit and they can teach themselves how to do it. But the thing is, researchers are being asked to do more and more as research modernises and improves, so they don’t just have to do their own research and maintain their understanding of their own domain. They have to manage staff, they have to look after their HR and their finances, they have to comply with data regulations, and a whole suite of other things. Adding software engineering to that is really quite taxing, regardless of how intelligent and hard-working you are. And because research software engineers are based within universities and understand researchers, it’s very easy to form a relationship between those two groups.

Julie Gould

So, what are the benefits of working with a research software engineer?

Simon Hettrick

One of the real benefits of working with a research software engineer is that, because they have this specialist knowledge in software engineering, they can listen to the researcher’s problem and make lots of new suggestions about the way things can be done that will really, really benefit the researcher.

Julie Gould

Let’s use an example: we’ve got a virologist, or someone who works in genetics, who has to do a lot of genetic sequencing, and they come out with a huge dataset that they need to manage, maybe merge with another dataset, and then compile and analyse. How would a research software engineer be able to help in this situation?

Simon Hettrick

Generally, it depends on what you’re trying to do with the data in the end, what sort of analysis you’re trying to conduct. A research software engineer would be brought in very early on, you would talk through the problem with them, and then they would make the right kind of suggestions. They would push you towards using software that already exists, or at least towards doing a search to find other people who had done this kind of analysis and the tools they’d used, rather than starting from scratch and writing your own software. They would certainly suggest that you follow good open research practices so that every single step of your research can be repeated and reproduced, which is really important to the open research community. When it came to particular tweaks, specific changes or specific outcomes you wanted from your research, if suitable software didn’t exist, they would work with you to develop something that did that analysis but was also easy for you to use yourself. They would help you with training on that software, they would document it, and they would probably suggest that it would be a good idea to share this new software with other people in your field.

Julie Gould

How are research software engineers employed? Are they like a postdoc where they’re on grants and supported by a PI or are they employed by the university or how does the employment structure work?

Simon Hettrick

What used to happen was that researchers realised they relied on software quite considerably. They knew that they didn’t have much of that skill within the group, so they would look to employ a software developer. But there’s no career path for a software developer in most universities. And if you give a professor a problem, they usually find a postdoc-shaped solution. So what would happen is they would recruit people into a postdoc position. That person would spend more and more of their time writing software and become absolutely vital to the group’s work, but at the same time they wouldn’t be generating papers or bringing in funding, and those are the two things that most postdoctoral positions are judged on. So this person would come in, do incredible work, become vital to the group, but have absolutely zero chance of progressing their career, and eventually, while the really, really dedicated ones would stay, many would just leave. With the RSE – the Research Software Engineering campaign – we tried to change this so that there would be a set career path within academia for research software engineers. That led to the rise of research software engineering groups. This model first started at UCL: they brought together their research software engineers, pooled them into a group and then hired their time out to researchers across the university. That meant researchers got a specialist and only paid for that person when they needed them. They didn’t have to worry about recruitment, or about how to keep hold of that person when the project finished. It also meant that research software engineers got, by pooling demand, much more predictable careers. They knew there would be more work coming in and that they wouldn’t have to jump from one short, fixed-term contract to another.

Julie Gould

It’s a bit like a core laboratory but based on the people and their skills rather than instrumentation.

Simon Hettrick

Yes, many people have said that. ‘I’m basically a telescope nowadays.’

Julie Gould

So, how does one become a research software engineer?

Simon Hettrick

Through a variety of different means, but to take a typical example, you start off as somebody who has an interest in research and you take up a PhD in a field that doesn’t have to be computational, but where the topic you’re working on requires software to be solved. Then you spend your time writing software and you start to love it, and when you get to the end of your PhD, you make a decision about whether you want to be a researcher who does software or a research software engineer who does a bit of research. Generally, people who choose the latter option have decided that they like the software and want to apply it to different domains. They want to work on a range of different projects, doing the software engineering, rather than working within a single research domain.

Julie Gould

So, what advice would you have for anybody who is interested in becoming a research software engineer?

Simon Hettrick

Get in touch with the RSE group if there is one local to your organisation. If not, get in touch with the national association, and they will put you in touch with somebody. The other great thing to do would be to come to the RSE conference, which takes place every year. It’s a really good event to go to because this community has only recently started to be recognised, so people there are very, very enthusiastic and keen.

Julie Gould

Thanks to Harriet Alexander from UC Davis and to Simon Hettrick from the Software Sustainability Institute and the Research Software Engineers Association. Now, a skill that goes hand in hand with compiling large datasets is managing them well, and in the next episode we will look at why it’s so important to manage your data well and why this is a key element in doing reliably reproducible research.

So more and more in data science and computer science, publishing papers with accompanying data and accompanying code is becoming crucial. There are certain venues where that’s mandatory, and more and more venues are encouraging it. So you can say: here’s my paper, the findings in this paper are based on this dataset, and here’s the code for the analysis that I ran on that dataset.

Julie Gould

Thanks for listening. I’m Julie Gould.