
Supercomputer answers life science questions

DTU researchers are using the biggest life science supercomputer in Denmark to analyse complex volumes of data and put doctors in a position to improve our health.

The biggest supercomputer in Denmark, specially designed for life science, has made it easier to analyse huge volumes of data about our hereditary material (our genomes) and thus to tailor treatment to the individual patient. Two new projects aim to improve organ transplantation and to develop new tools for diagnosing and treating salmonella, listeria and chlamydia infections.

Both are eScience pilot projects from DeIC (Danish e-Infrastructure Cooperation), which gives young researchers the opportunity to work on the supercomputer. Here, researchers can draw on a huge pool of data, including records of Danes’ genomes, electronic patient records, X-rays and DNA sequencing data, which shows how the hereditary material in organisms is structured.

“Over the past 50 years, we have registered a vast amount of data about Danish citizens through their Civil Registration (CPR) Numbers. This means we have a great deal of information that we can combine with our data on genomes. The supercomputer can easily handle this work, and it can also run up to 1,000 different algorithms,” explains Peter Løngreen, Head of Supercomputing at DTU Bioinformatics.

Seeking out new proteins
As part of one of the projects, DTU student María Luisa Matey Hernández is working to define a special region on chromosome 6 called the HLA (Human Leukocyte Antigen) region. The HLA system helps the body to fight infection and holds the key to successful organ transplantation.

Each individual has a unique combination of HLA molecules, and if the patient’s molecules differ from those of the donor, the patient may reject the donor organ.
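The matching idea can be illustrated with a minimal sketch. It assumes each person's HLA type is recorded as a set of alleles per locus and simply counts the donor alleles the recipient lacks. The function names, the three loci and the example typings are illustrative assumptions, not the project's method or real patient data, and clinical matching is considerably more involved.

```python
from typing import Dict, Set

# Hypothetical HLA typing: locus name -> alleles carried at that locus.
Typing = Dict[str, Set[str]]


def count_mismatches(recipient: Typing, donor: Typing) -> int:
    """Count donor alleles that the recipient does not carry, locus by locus."""
    mismatches = 0
    for locus, donor_alleles in donor.items():
        recipient_alleles = recipient.get(locus, set())
        mismatches += len(donor_alleles - recipient_alleles)
    return mismatches


def best_donor(recipient: Typing, donors: Dict[str, Typing]) -> str:
    """Return the name of the donor with the fewest mismatched alleles."""
    return min(donors, key=lambda name: count_mismatches(recipient, donors[name]))


# Made-up example typings (illustrative only).
recipient = {
    "HLA-A": {"A*02:01", "A*01:01"},
    "HLA-B": {"B*07:02", "B*08:01"},
    "HLA-DRB1": {"DRB1*15:01", "DRB1*03:01"},
}
donors = {
    "donor_a": {
        "HLA-A": {"A*02:01", "A*01:01"},
        "HLA-B": {"B*07:02", "B*08:01"},
        "HLA-DRB1": {"DRB1*15:01", "DRB1*03:01"},
    },
    "donor_b": {
        "HLA-A": {"A*02:01", "A*03:01"},
        "HLA-B": {"B*07:02", "B*44:02"},
        "HLA-DRB1": {"DRB1*15:01", "DRB1*04:01"},
    },
}

for name, typing in donors.items():
    print(name, count_mismatches(recipient, typing))
print("best match:", best_donor(recipient, donors))
```

In this toy example, the donor whose alleles all appear in the recipient's typing is ranked first.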

Through the project, María Luisa Matey Hernández hopes to identify new types of proteins, which are large molecules in the body. She is using the supercomputer, which holds data from the national GenomeDenmark platform, where the hereditary material of 150 Danes has been mapped out in detail.

“The objective is to determine how good the existing methods are at establishing Danes’ HLA types, as well as to supplement the existing HLA databases with HLA data from GenomeDenmark. If the project is successful, doctors will in future be able to select precisely those organs that match the recipient patient exactly,” explains María Luisa Matey Hernández.

Beating chlamydia
In the other project, Peter Bork, an expert in machine learning, is focusing on developing new tools to assist with the diagnosis and treatment of pathogenic bacteria.

He is the first researcher at the Department of Bio and Health Informatics (DTU Bioinformatics) to use two types of neural network on the supercomputer: LSTM (Long Short-Term Memory) networks and CNNs (Convolutional Neural Networks). Both are built on mathematics and algorithms, and they learn to recognize patterns in large volumes of data.

The neural networks help reveal which proteins bacteria such as salmonella, chlamydia and listeria inject into other cells to infect them. The objective is to determine precisely how the bacteria attack the body, and which substances they emit.
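To give a feel for how a convolutional network scans a protein sequence for patterns, here is a minimal sketch in plain Python. It one-hot encodes a sequence of amino acids and slides a single filter along it. In a trained network the filter weights are learned from data, whereas here they are set by hand to a made-up motif. The sequence, the motif and the function names are illustrative assumptions, not the project's model.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(sequence: str) -> List[List[float]]:
    """Encode each residue as a 20-element vector with a single 1.0."""
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]
            for residue in sequence]


def conv1d(encoded: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Slide the kernel along the sequence and return one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            score += sum(x * w for x, w in zip(encoded[start + offset], kernel[offset]))
        scores.append(score)
    return scores


# Made-up sequence and motif (illustrative only).
sequence = "MKTLLEKAG"
kernel = one_hot("LEK")  # hand-set filter; a real CNN would learn its weights

scores = conv1d(one_hot(sequence), kernel)
best_start = max(range(len(scores)), key=scores.__getitem__)

print(scores)
print("strongest response at position", best_start)
```

The filter responds most strongly where the sequence matches the motif exactly. A real network combines many such learned filters, and an LSTM layer can additionally take the order of residues along the whole sequence into account.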

The networks are also used together with the machine learning library TensorFlow, which enables researchers to distribute computations across hundreds of computers at the same time.

Peter Bork hopes that the project will bring us closer to understanding the influence biology has on our well-being:

“If the project succeeds, we will be one step closer to beating chlamydia, listeria and other harmful bacteria. For the general public, this will mean improved medical treatment in future, as well as food containing fewer harmful bacteria.”
