PhD Defence by Anders Roy Christiansen

Title of the PhD thesis: Compressed and Practical Data Structures for Strings

Place: Thuesday, 20, February 2018 at 14:00, The Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kgs. Lyngby, Building 101A, S01

Abstract: The amount of digital data we store worldwide increases at an incredible pace. In order to use all this data for something meaningful, we need methods to handle all this data efficiently. Fortunately, hardware improvements over the past decades has made us able to handle much more data than previously possible. Unfortunately, it is very difficult for hardware improvements to keep up with the current growth in data. Designing clever and efficient ways to handle all this data in software instead of relying on hardware improvements is therefore of outmost importance. This is where the topic of algorithms and data structures comes to the rescue. A key component in handling all this data is compression. All data with regularities can be compressed to use less space than the original data. One recent source of this growth in data is DNA sequencing where researchers try to understand the human genome. DNA sequences are long and thus takes up a lot of space, but the DNA of two persons from the same population is approximately 99% similar, ie. only few differences in the DNA make individuals who they are. This means DNA is highly compressible due to the many repetitions in the DNA sequences, and therefore one among other good applications for the methods we develop in this dissertation.
Compression itself is a well-known technique and many very efficient ways to compress data have already been invented and are used on a daily basis in almost everybody's life. Earlier on - and still to a large extent - when working with compressed data, the strategy was to simply compress it and then when the original data is needed decompress all of the data again - even if only small amounts of the data is needed. In this dissertation, we explore and improve state-of-the-art ways to compress data in a way that still allows us to do operations on the data without decompressing it first. We call such representations of data compressed data structures. One out of multiple topics we explore is called text indexing. This is the problem solved when you use the search function on for instance a webpage - you want it to tell you where the pattern you searched for is located on the page promptly. Similarly, when a researcher wants to test if a DNA sequence contains a certain interesting DNA patterns. We design improved methods to solve this and similar problems directly on compressed data.

Read more about this thesis in DTU Orbit.

Principal supervisor: Associate Professor Philip Bille, DTU Compute
Co-supervisor: Inge Li Gørtz, DTU Compute

Examiners:
Associate Professor Carsten Witt, DTU Compute
Senior Lecturer Simon Puglisi, Academy of Finland, Finland
Professor Roberto Grossi, University of Pisa, Italy

Chairperson at defence: Associate Professor Paul Fischer, DTU Compute

Everyone is welcome.

Tidspunkt

tir 20 feb 18
14:00

Arrangør

DTU Compute

Hvor

DTU, Building 101A, S01