Denmark’s biggest computer specially designed for life science


It took thirteen years—from 1990 to 2003—and cost USD 3 billion to map the human genome. Today, using a small sample of human DNA—from a person living or dead—we can in just a few days describe exactly how the four bases—A, T, C, and G—combine to make up an individual’s genes.
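At its simplest, describing how the bases make up a sequence starts with tallying them. A minimal sketch in Python (the DNA fragment below is invented for illustration):

```python
from collections import Counter

def base_composition(sequence):
    """Tally the four DNA bases in a sequence string."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ATCG"}

# A short invented fragment; real whole-genome data runs to billions of bases.
print(base_composition("ATCGGCTAAT"))  # {'A': 3, 'T': 3, 'C': 2, 'G': 2}
```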

Such whole genome sequencing can be performed for all living things—from plants and microorganisms to animals and humans—currently at a cost of ‘as little as’ USD 2,000 to 3,000.

The technology has thus paved the way for new knowledge about our origin, the occurrence and spread of diseases, the link between specific genes and diseases, and a great deal more. The only prerequisite is that we learn to handle these vast quantities of data and create meaningful connections between them.

Here, an ordinary computer would be quite useless. However, if you program 560 computers—each with two CPUs and 14 cores—to work on the assignment, the picture is suddenly quite different. And that is exactly what has been done in Denmark’s biggest computer to date, which was inaugurated at DTU Risø Campus in December 2014.
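The divide-the-work idea behind those hundreds of machines can be sketched in miniature. Here is a hedged Python sketch that splits a sequence into chunks and analyses them on several cores at once; the GC-content task, chunk size, and worker count are invented for illustration, not Computerome’s actual workload:

```python
from concurrent.futures import ProcessPoolExecutor

def gc_fraction(chunk):
    """Fraction of G and C bases in one chunk of sequence."""
    return sum(1 for base in chunk if base in "GC") / len(chunk)

def parallel_gc(sequence, workers=4, chunk_size=1000):
    """Split a long sequence into chunks and analyse them on several cores."""
    chunks = [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        fractions = list(pool.map(gc_fraction, chunks))
    # Weight each chunk's result by its length to get the overall fraction.
    return sum(f * len(c) for f, c in zip(fractions, chunks)) / len(sequence)

if __name__ == "__main__":
    # Synthetic repeat: half the bases are G or C, so the result is 0.5.
    print(parallel_gc("ATCG" * 2500, workers=2))
```

The same pattern—partition the data, run the same algorithm on every partition, merge the results—is what lets hundreds of nodes attack a single genome together.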

The ‘Computerome’ supercomputer is the brainchild of DTU’s Centre for Biological Sequence Analysis, which has 20 years’ experience with production, analysis, and handling of sensitive life science data. But Computerome is also a part of Denmark’s national e-infrastructure with free access for all of the country’s researchers in the field of life sciences.

 

Associate Professor Bent Petersen and Associate Professor Simon Rasmussen, DTU Systems Biology, ‘inside’ the supercomputer Computerome—so to speak.

Photo: Iben Julie Schmidt

Unique demands
Life science places unique demands on the system, says Peter Løngreen, Head of High-Performance Computing & IT at DTU Systems Biology and the man who has headed the work of designing Computerome.

“Life science involves people from many different specialist fields—e.g. biologists and doctors—many of whom lack a specific computer science background. Their calculation algorithms are neither uniform nor static, and the data are similarly very diverse. They may comprise genomes, proteins, high-resolution X-rays, and even text in the form of patient records. The system must handle all such data while also running up to 1,000 different algorithms. That’s why we have launched a parallel file system and a hybrid cloud that can handle sensitive data—something quite unique in the field of supercomputers.”

Attracting data
Computerome immediately entered the TOP500 list of the world’s most powerful computer systems. It has 16,048 cores and three petabytes—or 3,000,000 gigabytes—of memory. It has a storage capacity of four petabytes, and its HPC performance is 410.8 TeraFLOPS (floating-point operations per second).

Performance still falls short of the world’s currently fastest computer, ‘Tianhe-2’ in Guangzhou, China, which can handle 33,862.7 TeraFLOPS—i.e. 82.4 times more. But Computerome has plenty of advantages to attract users from many parts of the world precisely because it has been especially designed for life science.
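The 82.4 figure quoted above is simply the ratio of the two peak-performance numbers:

```python
# Peak performance figures quoted in the article, in TeraFLOPS.
tianhe_2 = 33_862.7
computerome = 410.8

ratio = tianhe_2 / computerome
print(round(ratio, 1))  # 82.4
```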

“In 2015, we have so far generated more than one and a half petabytes of data. This is, of course, a challenge, which among other things means that we now have to consider expanding storage capacity. But it also means that we have very quickly become an interesting partner for researchers from Denmark and abroad. They can see an advantage in placing their data here—not least because we are recognized for our secure management of the often sensitive data and for our data analyses,” says Peter Løngreen.

Computerome gathers sensitive data, which must only be accessible to a defined group of people, as well as more public data series that must be available to all. It has been designed in such a way that all of the data are in a cloud—a kind of virtual computer which corresponds to each data owner having their own private cluster. In addition to offering the data owners complete protection, the system enables them to integrate all or part of their sensitive data with the public data series and in that way acquire new knowledge on, for example, specific diseases and their correlation with human genes.

Ultrafast data processing
Associate Professor Simon Rasmussen from DTU Systems Biology has witnessed first-hand Computerome’s enormous potential in connection with a project with the University of Copenhagen and several foreign universities. The project, which focuses on understanding how Homo sapiens actually migrated out of Africa, is based on mapping the genomes of many individuals, and here Simon Rasmussen is the expert. However, he had to take a couple of months of paternity leave. The task was therefore outsourced to one of the world’s three largest genome centres.

After two months they still had not finished, and as publishing the findings suddenly became a matter of urgency, the project manager turned once again to Simon Rasmussen, who took a little over a week to do the analyses.

“It was possible because Computerome has the enormous advantage—even over many larger supercomputers—that all the cores gain access to the stored data at the same time. The supercomputer to which we had contracted out the task might have 20,000 cores, but hard drives cannot deliver data to all of them at once, so many of them stand idle for much of the time. We have 16,000 rapid cores which we can use simultaneously, not to mention several years’ experience from similar assignments on our ‘old’ supercomputer,” explains Simon Rasmussen.

Global disease surveillance
A global platform—a kind of ‘Google Microorganisms’—that will make it possible to quickly identify microorganisms and assess whether they have the potential to cause serious epidemics. This is the vision for a major research project which is also based on Computerome’s unique properties.

Because the price of full genome sequencing of microorganisms has fallen drastically, it has become possible—even for small laboratories in developing countries where diseases arise—to perform this task. However, interpreting the many data generated by the sequencing process and assessing the risk of the spread of the disease are two very different things.

The aim of the ‘Compare’ research project headed by DTU, therefore, is to enable small laboratories to transfer their sequencing data to bioinformaticians over the internet for further analysis and assessment in order to determine within the space of a few hours the nature of the disease and how it can be controlled.

“One of the prerequisites for developing these new tools for future disease surveillance is that we have HPC capacity in the supercomputer class. Without access to Computerome, we would not be able to compare gene sequences from new-found microorganisms with those previously found,” says one of the researchers behind Compare, Professor Ole Lund from DTU Systems Biology.
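Comparing a newly sequenced microorganism against known genomes comes down to measuring sequence similarity. A toy sketch using shared k-mers (Jaccard similarity) gives the flavour; the fragments and the k value are invented, and this is an illustrative stand-in, not Compare’s actual method:

```python
def kmers(sequence, k=4):
    """The set of all overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def kmer_similarity(seq_a, seq_b, k=4):
    """Jaccard similarity of two sequences' k-mer sets, from 0.0 to 1.0."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)

# Invented fragments: the second differs from the first by a single base.
known = "ATCGGATCCGATCG"
newly_found = "ATCGGATCCGTTCG"
print(round(kmer_similarity(known, newly_found), 2))
```

At genome scale the same idea is applied to millions of k-mers per organism, which is exactly the kind of all-against-all comparison that demands supercomputer-class hardware.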

Article from DYNAMO no. 42, DTU’s quarterly magazine in Danish.

Read more about Computerome in the DYNAMO article “Supercomputers given optimal conditions”.