Multivariate prediction modelling with applications in precision medicine

Doctoral course within the doctoral programme in Epidemiology
Course number: 2990
Credits: 1.5

Course dates: 11-15 December 2017

Application period: 13th April - 15th May



This course aims to provide an introduction to both supervised and unsupervised methodologies for prediction modelling with a focus on biomedical applications, molecular epidemiology and personalised medicine.

Learning outcomes

After successfully completing this course, you are expected to be able to:
- Perform and assess basic quality control and outlier detection
- Apply unsupervised and supervised statistical learning methods to detect patterns in data
- Devise cross-validation strategies for parameter estimation, model selection and prediction performance evaluation
- Make informed judgements about how to apply basic principles of variable selection
- Critically evaluate prediction models in real-world applications

Course content

Personalised medicine is a cornerstone of tomorrow's health care. It is based on the idea of stratifying patients into groups according to, for example, disease risk, prognosis or probability of treatment response, and administering the most suitable therapy to each individual. The capability to generate vast amounts of quantitative molecular data from DNA and RNA sequencing and other molecular profiling methods provides an unprecedented opportunity to implement precision medicine approaches in the health care system. Molecular profiling typically generates data with tens of thousands of variables, of which only a subset is relevant for treatment decisions. The promise of personalised medicine relies on our ability to turn these vast molecular datasets into clinically actionable predictive models of individualised therapy response. Statistical learning methods and prediction modelling are central to developing these models, and to developing the biomarker panels that can be used for molecular subtyping, risk stratification and prediction of treatment response. This course provides an introduction to statistical learning methods and prediction models that are relevant for personalised medicine, with a focus on real-world applications.

This course aims to provide an introduction to methodologies for prediction modelling with a focus on biomedical applications, molecular epidemiology and personalised medicine. The course covers basic theory and an introduction to modern statistical and machine learning methods for prediction modelling in high-dimensional data, together with applied data analysis through computer-based exercises. Lectures and exercises will cover the full process, from the initial data set through data normalisation, quality control, outlier detection, application of unsupervised and supervised learning methods, variable selection, cross-validation and model evaluation. The main objective of the course is to provide the basic theory and practical knowledge that will enable course participants to apply the covered methodologies in their own research.

Topics covered include: data import and basic visualisation, data pre-processing, quality control and outlier detection, unsupervised learning, supervised learning, cross-validation for parameter estimation and estimation of prediction performance, variable selection, and recently developed methods (e.g. deep learning, conformal prediction).
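
As a rough illustration of how several of these topics fit together, the sketch below runs k-fold cross-validation in which scaling and variable selection are refitted on the training part of every fold, so that the held-out samples never influence the model. It is not course material: the simulated data, the nearest-centroid classifier, the mean-difference selection score and all function names are assumptions chosen only to keep the example short and dependency-free.

```python
import random
import statistics


def make_data(n=60, p=50, n_informative=5, shift=2.0, seed=0):
    """Simulate n samples with p variables; only the first n_informative differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate class labels so both classes are balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        if label == 1:
            for j in range(n_informative):
                row[j] += shift
        X.append(row)
        y.append(label)
    return X, y


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit(X, y, n_selected):
    """Estimate scaling, selected variables and class centroids from training data only."""
    p = len(X[0])
    means = [statistics.fmean([row[j] for row in X]) for j in range(p)]
    sds = [statistics.pstdev([row[j] for row in X]) or 1.0 for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]

    def class_mean(label, j):
        return statistics.fmean([z[j] for z, t in zip(Z, y) if t == label])

    # Rank variables by the absolute difference between the two class means.
    score = [abs(class_mean(1, j) - class_mean(0, j)) for j in range(p)]
    selected = sorted(range(p), key=lambda j: score[j], reverse=True)[:n_selected]
    centroids = {c: [class_mean(c, j) for j in selected] for c in (0, 1)}
    return means, sds, selected, centroids


def predict(model, row):
    """Assign a sample to the class with the nearest centroid on the selected variables."""
    means, sds, selected, centroids = model
    z = [(row[j] - means[j]) / sds[j] for j in selected]

    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(z, centroids[c]))

    return min(centroids, key=squared_distance)


def cross_validated_accuracy(X, y, k=5, n_selected=5):
    """Refit the whole pipeline in every fold and score it on the held-out samples."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], n_selected)
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


X, y = make_data()
accuracy = cross_validated_accuracy(X, y)
print(f"5-fold cross-validated accuracy: {accuracy:.2f}")
```

Selecting variables on the full data set before splitting it would let information from the held-out samples leak into the model and make the estimated performance too optimistic, which is why fit is called inside the loop over folds.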

Suggested course literature:

The Elements of Statistical Learning, Hastie, Tibshirani and Friedman (2009). Springer-Verlag.
Freely available at

Course directors

Assistant professor

Mattias Rantalainen

Phone: +46-(0)8-524 824 65

Assistant professor

Martin Eklund

Phone: +46-(0)8-524 823 72

Contact person

Educational administrator

Gunilla Nilsson Roos

Phone: +46-(0)8-524 822 93