
All lectures will be given at the DTU main campus. If need be, the course will run online in 2020.

Computational Data Analysis

 

Example of a hyperspectral cube (from Wikipedia)
 
Course Outline

New and challenging sources of data such as gene microarrays and hyperspectral images have spurred tremendous progress in statistical modelling over the last decades. The aim of this course is to give a theoretical and applied introduction to several methods which make it possible to analyze and understand such data. These methods are also applicable to more traditional data sets, where the resulting models may perform better and/or be easier to interpret.

We will focus on hands-on application of the methods and limit the theory to what is necessary to use the methods properly.

Because of the wide span of contents in this course, the study pace will be relatively high. However, there is plenty of room for exercises and discussion, so students will be able to complete the course despite differences in previous experience.

Exercises will primarily be based on running and altering Matlab, Python or R programs (choose the programming language you prefer). Previous experience with programming is a prerequisite. Please install Matlab, R or Python on your computer prior to course start. DTU offers a free Matlab student license.

 


Register here

Practical information

Date: 24 - 28 August 2020

Duration: 08:00 - 17:00

Location: DTU, Kgs. Lyngby

Language: English

Registration deadline:
17 August 2020

Price:
7,500 DKK excl. VAT

Questions?

Course responsible
Line Katrine Harder Clemmensen
Tel: +45 45 25 37 64
Mail: lkhc@dtu.dk

Detailed Information

Course Form

This course is given as a one-week course in August (24-28 August 2020) at the Technical University of Denmark; if need be, the course will run online this year. Subsequently, the students spend one month applying the methods to their own data. The course is a 5 ECTS course. It is open to all PhD students and to everyone else via Open University/DTU Continuing Education. For information on how to apply via Open University/DTU Continuing Education, see this link.

Microarray example (from Wikipedia)

 
 
 
Course Material

The course material consists of chapters from electronic textbooks and electronic papers. Most lectures will refer to the book "Elements of Statistical Learning" (ESL) by Hastie, Tibshirani and Friedman. This book is freely available from this link. References to other material will be given on CampusNet.

 

 

 

Schedule for the Lectures

Lectures and exercises are organized in half-day modules, one per subject (8-12 o'clock and 13-17 o'clock), and will take place at DTU, Lyngby Campus. We will make arrangements for lunch from 12-13, but students will need to pay for their own lunch. The schedule below is subject to minor changes. The content will be: cross-validation, model selection, bias-variance trade-off, overfitting and underfitting, sparse regression, sparse classification, logistic regression, linear discriminant analysis, clustering, classification and regression trees, multiple hypothesis testing, principal component analysis, sparse principal component analysis, support vector machines, neural networks, self-organizing maps, random forests, boosting, non-negative matrix factorization, independent component analysis, archetypal analysis, and sparse coding.

Module | Date | Subjects | Lecturer | Literature
1 | 24/8 | Introduction to computational data analysis [OLS, Ridge] | Line | ESL Chapters 1, 2, 3.1, 3.2, 3.4.1, 4.1
2 | 24/8 | Model selection [CV, Bootstrap, Cp, AIC, BIC, ROC] | Line | ESL Chapter 7 and 9.2.5. You may safely skip sections 7.8 and 7.9
3 | 25/8 | Sparse regression [Lasso, elastic net] | Line | ESL Chapters 3.3, 3.4, 18.1, and 18.7
4 | 25/8 | Sparse classifiers [LDA, Logistic regression] | Line | ESL Chapters 4.3, 4.4, 18.2, 18.3, 18.4, 5.1, and 5.2
5 | 26/8 | Nonlinear learners [Support vector machines, CART and KNN] | Line | ESL Chapters 4.5, 4.4, 5.1, 5.2, 9.2 and 13.3
6 | 26/8 | Ensemble methods [Bagging, random forest, boosting] | Line | ESL Chapters 8.7, 9.2, 10.1 and 15
7 | 27/8 | Subspace methods [PCA, SPCA, PLS, CCA, PCR] | Line | ESL Chapters 14.5.1, 14.5.5 and 3.5
8 | 27/8 | Unsupervised decompositions [ICA, NMF, AA, Sparse Coding] | Line | ESL Chapters 14.6-14.10, [Sparse Coding, Nature]
9 | 28/8 | Cluster analysis [Hierarchical, K-means, GMM, Gap-Statistic] | Line | ESL Chapter 14.3
10 | 28/8 | Artificial Neural Networks and Self Organizing Maps | Line | ESL Chapters 11.1-11.5 and 14.5

 

Examination

The student should participate in the course and hand in a small report on one or more of the course subjects related to the student's own research. The course is graded pass/fail. The deadline for the report is one month after the last lecture (i.e., end of September).


 
Course responsible

Line H. Clemmensen, Associate Professor, DTU Compute, Statistics and Data Analysis, lkhc@dtu.dk