Computer Science, Biology, and the Traveling Salesman by Team AppNeta on November 26, 2013

Where Computer Science Meets Biology

I’m Frances, a co-op student at AppNeta’s Vancouver office. As a student in a combined biology and computer science degree, I see a lot of overlap between the two areas. With millions of species on the planet and millions or billions of A’s, C’s, T’s and G’s in a genome, there is a lot of data to handle and an increasing need for bioinformaticians. However, most of the overlap I see is where computer science can help solve biology problems – it seems that using biology to solve computer science problems is much less common.



A phylogenetic tree showing the evolutionary relationships between bacterial species, created using bioinformatic methods. (Picture from www.nature.com.)

Recently, I was given a couple days off work to travel to Boston and participate in the International Genetically Engineered Machine (iGEM) competition. In iGEM, teams of undergraduate students (and now high school students) from around the world spend the summer engineering a biological system, then travel to competitions in the fall to present their work. Projects this year ranged from using bacteria to extract gold from electronic waste to filtering water with moss. On the University of British Columbia team, we focused on the bacteria used in yogurt production, attempting to “vaccinate” them against viruses that can ruin yogurt batches, while also engineering them to produce vanilla and cinnamon flavours.

The UBC iGEM team presenting at the world championships in Boston.

Computer science is involved in iGEM in a number of ways. Teams can use bioinformatic methods to plan and carry out their project or create software tools as part of the "Software" track. Our team formulated an award-winning model of how our vaccinated bacteria would behave in an actual bioreactor and performed simulations with this model. There is also an "Information Processing" track, where projects focus on creating systems that allow us to "navigate and use lots of information quickly and effectively". This year's University of Alberta team was in this track and decided to tackle the famous Traveling Salesman Problem (TSP).

The Traveling Bacteria Problem

The TSP is to find the shortest path that visits every city in a set and returns to the first city, given defined distances between each pair of cities. The brute force method, which is to try every path and calculate which has the shortest total distance, has O(n!) time complexity, making it impractical to do sequentially for more than a handful of cities. As the U of A iGEM team pointed out, a 15-city problem would take almost two months to perform, assuming you can map 10000 routes per second.

There is no existing algorithm to find an exact solution to this in polynomial time. However, the brute force approach is highly parallel, so if you just had enough processors, it could be quick – if you had more processors than routes, it would be instantaneous. The problem is finding a computer with millions or billions of processors. According to the U of A team, the solution is bacteria.

Cultures containing millions of bacteria are routinely used in microbiology labs. By representing each path between two cities as a gene, and a combination of these genes as a full route, the team managed to solve a four-city TSP using bacterial “processors”. They started with plasmids, circular pieces of DNA that are independent from a bacteria’s chromosomal DNA. They then performed four sequential reactions on the plasmids. For each reaction, they mixed together three genes, representing the traveling salesman’s three path choices, and made each plasmid randomly take up one of the three genes. After performing all four reactions, each plasmid had four genes inserted in sequence, representing a complete route.

Translating a route between cities into a sequence of genes on a plasmid. (Picture from the U of A iGEM wiki.)

Since each gene they inserted corresponds to an observable phenotype (for example, a colour of the gene product), they could insert the plasmids into bacteria and infer which bacteria had which sequence of genes. In the scenario using colours, a bacterium containing a “blue” gene and a “yellow” gene would appear green. Every possible sequence was represented, including some invalid routes that would have to be discounted in the final calculation.

Bacterial colonies displaying different phenotypes. (Picture from the U of A iGEM wiki.)

To determine the shortest route, they modified the proportions of the genes in each reaction step, adding genes representing short distances in higher quantities. The most common sequence of the four genes would therefore represent the shortest possible route.

This was a simple four-city problem, but it was just a proof of concept – would it be feasible to extend the technique to a 15-city problem? Time (and future iGEM teams) will tell.