In perhaps one of the most cost-effective triumphs of machine learning for medical research to date, a collaboration between Topcoder and Harvard Medical School has produced models showing dramatically superior performance for tumour delineation.

This article aims to help bridge the gap between medical research and general understanding, with the end goal of broadening awareness of the possibilities for transformational studies in machine learning for medical research.

Summary

This past spring, a groundbreaking study in JAMA Oncology revealed that AI and data science can detect cancer tumors faster and more effectively than humans.

Lung cancer is the second most common cancer and the leading cause of cancer mortality in the US.

Specifically, technologists, researchers and doctors from Harvard, Brigham and Women’s Hospital, Dana-Farber Cancer Institute and Topcoder proved that on-demand crowdsourcing methods can be used to rapidly prototype AI algorithms that replicate the results of expert radiation oncologists in targeting lung cancer tumors, while reducing associated time costs by up to almost 97%.

These AI algorithms could improve global cancer care, especially in under-resourced areas of the world.

Tumour Delineation

We find this best explained in the following video:

Machine Learning as a Solution

The challenge – producing an automatic tumour delineation algorithm which is as accurate as a medical expert while accounting for differences in medical opinion between experts.

Courtesy of Jessica Ann Morris (Managing Director, jam:pr), we were able to interview Topcoder CEO Mike Morris and Dr. Raymond H. Mak of Harvard Medical School (Radiation Oncology, Cancer Specialist, Professor and Lead Researcher).

ML for Lung Cancer – The Most Successful Algorithms

Question – You documented a significant increase in performance with the models that were submitted through Topcoder. What exact algorithms [did the winning submission utilize]?

Mike Morris – The winning submission did not utilize a single approach; instead it used something that we call an ensemble approach. This means taking the best of several different approaches. In this particular case, the ensemble approach gave a 12% increase in performance over any individual approach.

The winning approach utilised a combination of –

Convolutional Neural Networks

Cluster Growth

Random Forest

With a weighting (heuristic) assigned to each.
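
As a rough illustration of what a weighted ensemble can look like, the sketch below blends per-voxel tumour probabilities from three models using fixed weights and then thresholds the result. The function name, the weights, the threshold and the toy probabilities are all hypothetical; the interview does not describe how the winning submission actually combined its models.

```python
def weighted_ensemble(predictions, weights, threshold=0.5):
    """Blend per-voxel probabilities from several models into one binary mask.

    predictions: one list of probabilities per model, all of equal length.
    weights: one non-negative weight per model (hypothetical heuristic values).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    mask = []
    # zip(*predictions) yields, for each voxel, the tuple of model outputs.
    for voxel_probs in zip(*predictions):
        blended = sum(w * p for w, p in zip(weights, voxel_probs)) / total
        mask.append(1 if blended >= threshold else 0)
    return mask


# Made-up outputs for four voxels, one list per model.
cnn = [0.9, 0.6, 0.2, 0.1]
cluster_growth = [0.8, 0.4, 0.4, 0.0]
random_forest = [0.7, 0.5, 0.6, 0.2]

mask = weighted_ensemble([cnn, cluster_growth, random_forest],
                         weights=[0.5, 0.2, 0.3])
print(mask)  # [1, 1, 0, 0]
```

The third voxel shows the point of weighting: one model votes "tumour" on its own (0.6), but the blended score stays below the threshold because the more heavily weighted model disagrees.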

ML Based Research Productivity Through Crowdsourcing Methods

Question – You used crowdsourcing to develop an array of models. Was the development of these models completed largely independently of each other? If so, how would you estimate productivity to differ between this approach and ‘grouping’ the development more, i.e. collecting together the approaches by type and using a more open source approach?

Mike Morris – So you’re not coding in a shared development environment. People don’t see each other’s code, but they immediately see the results. So, seemingly in real time, we give feedback that everybody can take. That’s what drives a sort of version of collaboration, right? You know that hey, whatever approach I’m trying isn’t going to work because this person on the scoreboard is getting so much further here and I have to go back to the drawing board. That’s where the iteration comes in.

There’s a message board and forum, people that are in it talk back and forth, it’s very much a place where people are not going to hide their logic or approaches. A lot of people do share just that. They’re there for the discussion and for learning something or challenging other people or explaining something for somebody. A lot of that happens, so although the code is not shared, there are a lot of comments and discussions that get shared during the contest.

Mike Morris – There were so many competitors (564 contestants from 62 countries), with so many solutions being submitted (588 algorithms). This is something we really see a need for when progress plateaus. The neat thing about it is that when you have enough of these brains working on it, they don’t accept that. As far as you can go, they then try to drive further and further and further. That’s where you start to see these different creative approaches that you probably wouldn’t have thought of.

Curing Cancer with Crowd Sourced Machine Learning

Question – To what extent do you think this work could be diversified and made applicable to other forms of cancer, and how rapid could such development be? Could similar timescales be expected, or could other factors such as the availability of data sets be an impediment or an asset? (What would be the long term timescales for this work?)

Dr. Raymond H. Mak – The bigger problem now is getting sufficient training data that is of high enough quality to provide a useful solution. That means getting big enough data sets that are consistent, either all done by one expert or by sets of experts who agree upon a consensus. Those are what is missing to be able to train these algorithms at this point.

The variation between expert doctors that do these tasks is quite substantial. The problem is you could train an algorithm on 500,000 scans from somewhere, but you just don’t really know what the quality is and how it benchmarks against an algorithm from another center that was trained on a different data set. Curation of the data, to be sure that it is high quality data going into the training set, is also going to be an issue.



Question – How are you tackling this?

There are a couple of different ways. Our initial approach here was to basically say that, at this point, it’s intractable to really know who’s correct between doctors that are all experts. Our approach was to create a training set that was done by one expert only. Essentially, the algorithms that are produced replicate the ability of that one expert (Dr Mak). So I went back and segmented every one of those tumours for us to use as one version of truth, or one version of targeting, based on one expert. The question now is how do we generate multiple data sets, when typically we do not have more than one expert making an assessment of a patient.

General ML for Medical Research

Question – Broadly speaking, how does this work fit in with other models that have been developed with similar intention (focusing on treatment as opposed to detection), and what are the future plans for work of this kind?

Dr. Raymond H. Mak – I think most of the press and interest, also in academic circles, is in applications for detecting skin lesions or lung nodules (just the presence of them). They’re reporting performance in the 0.8-0.9 range. We fall in that bracket in terms of performance, though our performance metric is very different to their metric.

Theirs is the accuracy of the diagnosis. Our performance metric tackles the problem that if we don’t align or completely put the radiation into the tumor, the tumor is going to come back and the patient is going to die. So our metric is very different in the sense that what we want is algorithms that completely outline the tumor, without including too much of the normal surrounding tissue and damaging surrounding organs.
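
To make the idea of an overlap-based metric concrete, here is a minimal sketch using the Dice coefficient, a measure commonly used to score segmentations. This is an assumption for illustration only: the interview does not name the metric the study used, and the voxel coordinates below are made up.

```python
def dice_coefficient(predicted, reference):
    """Dice overlap between two sets of voxel coordinates: 2|A∩B| / (|A| + |B|).

    Returns 1.0 for a perfect match and 0.0 when the outlines do not overlap.
    """
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # both outlines empty: treat as perfect agreement
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


# Hypothetical 2D example: the expert outlines four voxels.
expert = {(1, 1), (1, 2), (2, 1), (2, 2)}
too_small = {(1, 1), (1, 2)}  # misses half of the tumour
too_large = expert | {(0, 1), (0, 2), (3, 1), (3, 2)}  # spills into normal tissue

print(dice_coefficient(expert, expert))     # 1.0
print(dice_coefficient(too_small, expert))  # 4/6, roughly 0.67
print(dice_coefficient(too_large, expert))  # 8/12, roughly 0.67
```

Both failure modes are penalised: an outline that misses part of the tumour and one that takes in too much surrounding tissue each score well below 1.0, which matches the two concerns described above.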

ML for Lung Cancer Treatment – The Dataset

Question – The data set used to validate the results of each algorithm stated in the paper consisted of 21 CT scans. Given the variable nature of tumors, this seems like a very small number. Do any standards exist for the size or properties of the dataset that would be required for training in order to begin introducing this into clinical usage?

Dr. Raymond H. Mak – So, segmentation and targeting is kind of like labeling / annotation. There are other examples out there in imaging repositories that allow a crowdsourced approach to labeling. The concept we’re playing around with is whether there is a way to crowdsource more segmentations on top of the existing CT dataset. We figured out a way to safely anonymize the scans and publicly post an app for people to label them, with experts documenting their labelling in the app, such that over time we could create a set of labelled training data and store the consensus. We’re nowhere near that right now though.

I had the same reaction to the size of the data as you. We’re used to petabytes of data coming in. These curated scans are made up of hundreds of images and were selected for their variability, as opposed to all being of one kind and producing an algorithm that is really good at doing just one thing.
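
One simple way to picture "storing the consensus" from several experts is a per-voxel majority vote over their outlines. The sketch below is only an illustration under that assumption: the function name, the agreement threshold and the toy masks are hypothetical, and the interview does not say how a consensus would actually be computed.

```python
def consensus_mask(expert_masks, min_agreement=0.5):
    """Mark a voxel as tumour when more than `min_agreement` of experts did.

    expert_masks: one binary list (1 = tumour, 0 = not tumour) per expert,
    all of the same length.
    """
    n_experts = len(expert_masks)
    if n_experts == 0:
        raise ValueError("need at least one expert mask")
    consensus = []
    # zip(*expert_masks) yields, for each voxel, the tuple of expert votes.
    for votes in zip(*expert_masks):
        agreement = sum(votes) / n_experts
        consensus.append(1 if agreement > min_agreement else 0)
    return consensus


# Made-up labels from three experts over five voxels.
expert_a = [1, 1, 1, 0, 0]
expert_b = [1, 1, 0, 0, 0]
expert_c = [1, 0, 1, 1, 0]

print(consensus_mask([expert_a, expert_b, expert_c]))  # [1, 1, 1, 0, 0]
```

Raising `min_agreement` makes the consensus more conservative; a stricter setting keeps only the voxels nearly every expert agrees on.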

ML for Cancer Treatment in Less Economically Developed Countries

Question – What would the expected administrative and technological barriers to introduction in less economically developed healthcare settings be?

Dr. Raymond H. Mak – Radiation therapy is very high tech and relies on an array of different hardware and software. The main issue for introduction into these different settings will be the mish-mash of different hardware and software that has been implemented at each of the centres. There is no single unifying software system across the world. It would have to sit on top of multiple different vendors in terms of the radiation therapy software.

The easiest thing would be to partner with a software company and create maybe a 40% solution. There’s definitely a debate within our own group of people as to whether these tools are going to reduce disparities or enhance disparities.

Question – At the same time I would argue that shipping in a computer system 99 times out of 100 will be easier than shipping in a human expert such as yourself?

Dr. Raymond H. Mak – Absolutely. There’s some interesting data: I could point you to a paper led by one of my colleagues trying to estimate the workforce gap in radiation oncology in low and middle income countries. It’s staggering, a gap of tens of thousands between the number of doctors expected to be needed to treat cancer patients in those countries and where we are now and what is projected. So clearly we cannot solve the workforce problem by just training more doctors in those countries; we need things like this to enable that. It’s just trying to figure out how to enable it, deploying it in a way that supports the doctors there without causing more problems.

In particular, we’d be interested in understanding whether these algorithms can act as training tools that provide consistency, so that doctors can learn the techniques from the tool itself as opposed to just running it on autopilot.

Improving the Approach to Crowdsourced Research

Question – How would you improve the approach with respect to the crowdsourcing, if at all?

Mike Morris – This approach was a little bit new to us, as we added that third collaborative phase and made it a little bit of a free for all, giving them the ability to work the way they wanted to work. They initially went into a few different working groups that targeted different things, but ultimately ended up collaborating more than we expected. I’d say we learned a lot from the project. Previously, when we had gone about using team approaches at the beginning of a project, the results weren’t as high. A team approach towards the end of the project is something we hadn’t thought of, and it produced much better results. So that’s something that we’ll incorporate more on our projects.