Doctoral courses in biostatistics
The purpose of the courses in biostatistics is to expose doctoral students to the three main pillars of biostatistics research: foundational statistical and mathematical theory, development and evaluation of biostatistical methods, and proper application and interpretation of biostatistical methods to answer substantive scientific questions. Theory helps us understand the properties of statistical methods and how they can and should be used.
Target audience
This page describes courses aimed at doctoral students majoring in biostatistics (i.e., students developing new statistical methods as part of their research). An overview of courses in biostatistics and statistical software aimed at students majoring in topics other than biostatistics can be found at https://staff.ki.se/general-science-and-compulsory-doctoral-courses.
During 2022, all of the core courses and series A elective courses will be offered. However, some courses may be offered as short workshops or seminars. If you, your student, or your postdoc is interested in any of these courses, please contact Erin Gabriel (erin.gabriel@ki.se).
Upcoming course offerings
Fall 2021: Advanced Causal Inference; Philosophy and Practice of Biostatistics*; Missing Data*
Spring 2022: Theory of Survival Analysis Using Counting Processes; Computationally Intensive Statistical Methods*; Missing Data*
Fall 2022: Statistical Inference and Large Sample Theory; Philosophy and Practice of Biostatistics; Advanced Causal Inference; Clinical Trials; Epidemiological Theory from a Statistical Perspective*
* = Pilot course or workshop
The core courses cover the foundations of biostatistical theory and applications. Students who take these courses will be prepared to develop and evaluate their own biostatistical methods to answer important health science research questions. Elective courses cover study designs and methodological developments in specific areas of research and applications. The typical biostatistics PhD student will take all of the core courses, one elective course from series A, and two elective courses from series B.
Core Courses in Biostatistics
Theory I - Statistical Inference and Large Sample Theory (7.5 credits)
This course is offered via the national network for PhD level courses in statistics. It covers the fundamental theory of statistical inference, including estimation, asymptotic theory for finite dimensions, testing, relative efficiency, the bootstrap, and Bayesian inference. Calculus and real analysis are prerequisites. The course consists of lectures and homework assignments.
Theory II - Theory of Survival Analysis Using Counting Processes (7.5 credits)
The analysis of survival data is critical in medical research, whether one is studying the lifetimes of cells, tumors, or humans. This course aims to develop an intuitive understanding of the theory of survival analysis methods using counting processes and martingales. This will give participants a deeper understanding of the survival analysis methods used in medical research, enabling them to better apply these methods and interpret their results. The topics include large sample results for the empirical distribution function, the Kaplan-Meier estimator, logrank tests, and regression models in survival analysis, all using the counting process framework and martingale theory. Theory I is a prerequisite. The course consists of lectures and homework assignments.
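As an informal illustration of one of the listed topics (not course material), the Kaplan-Meier estimator can be sketched in a few lines of Python. The function name and the toy data below are hypothetical; the sketch assumes right-censored data and uses the usual product-limit formula, multiplying the survival estimate by (1 - d/n) at each event time, where d is the number of events and n the number at risk.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if right-censored
    """
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    leaving_at = Counter(times)  # each subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t are no longer at risk afterwards
        at_risk -= leaving_at[t]
    return curve


# Hypothetical toy data: five subjects, two of them censored
km = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in km:
    print(f"t = {t}: S(t) = {s:.3f}")
```

By the usual convention, subjects censored at an event time are counted as still at risk at that time, which is what the ordering of the two updates inside the loop implements.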
Advanced Causal Inference (3 credits)
The purpose of this course is to give doctoral students with a previous degree in mathematics, statistics or a related area an introduction to the rigorous foundation of modern causal inference. Emphasis will be put on mathematical concepts, derivations and proofs.
Computationally Intensive Statistical Methods (7.5 credits)
The goal of this course is to familiarise students with the computational underpinnings of classical statistical theory and specific implementations of modern computational methods. The topics include computational methods in linear algebra and calculus, implementations of maximum likelihood estimation, pseudo-random number generation, simulation from distributions, and parallel processing. The course includes a mix of lectures, programming assignments, and student-run seminars.
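To give a flavour of "simulation from distributions" (an informal sketch, not taken from the course), inverse transform sampling draws from an exponential distribution by applying the inverse of its distribution function to uniform random numbers. The function name, rate, sample size, and seed below are illustrative assumptions.

```python
import math
import random


def sample_exponential(rate, n, seed=0):
    """Draw n values from Exponential(rate) by inverse transform sampling."""
    rng = random.Random(seed)  # seeded pseudo-random generator for reproducibility
    # If U ~ Uniform[0, 1), then -log(1 - U) / rate ~ Exponential(rate).
    # Using 1 - U keeps the argument of log strictly positive.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


draws = sample_exponential(rate=2.0, n=100_000)
sample_mean = sum(draws) / len(draws)
print(f"sample mean = {sample_mean:.4f} (theoretical mean = {1 / 2.0})")
```

The sample mean should be close to the theoretical mean 1/rate; the exact value depends on the seed.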
The Practice of Biostatistics (a.k.a. Statistical Consulting) (3 credits)
Students with undergraduate or master's degrees in statistics or mathematical statistics typically receive limited or no training in the application of statistics in real-world collaborations with subject-matter experts. Most courses focus on mathematical theory: research problems are typically neatly translated into theoretical questions, data are usually simplified or simulated, and results are interpreted and communicated from a statistical perspective. Furthermore, statistical modelling and analysis are presented as abstract activities, with little or no regard for the practical, legal, or ethical context in which they take place.
The purpose of this course is to fill this gap in the education of biostatisticians by introducing them to the challenges they will face when working as collaborative or consulting biostatisticians in biomedical research. These include efficient communication with collaborators in order to understand and formulate research questions, thorough problem formulation prior to statistical analysis, organization and management of data flows, interpretation and communication of results to non-statisticians, and the legal frameworks and ethical dilemmas with which they will be confronted. The course will provide opportunities for high-level discussion and reflection as well as practical skill training, exposing students to a set of conceptual and technical tools for meeting these challenges.
Many courses on "statistical consulting" in biostatistics programs include a component of "real-life consulting" where students work on real problems in medical science with a medical researcher. We do not currently plan to include such activities, but will consider including them if there is sufficient demand.
We plan to offer this course for the first time in the fall of 2022. The primary teachers will be Paul Dickman, Alexander Ploner, and Sandra Eloranta.
Proposed syllabus (pdf file).
Elective Courses series A
Clinical Trials (3 credits)
This is a multi-disciplinary course focused on the teamwork that is needed in real protocol development, with emphasis on the design of randomized, controlled clinical trials. Topics include ethics, selection of a comparison group, eliminating bias, reducing variability, selecting eligibility criteria, choice of endpoints, determining sample size, compliance issues, monitoring of trials, interpreting results, and protocol definition. There will be one major course project. Students will be grouped into research teams of up to five students that are heterogeneous with respect to area of specialization (e.g., clinical, epidemiologic, biostatistical). Each research team will respond to a Request for Proposal (RFP) by writing a study protocol that describes and justifies the rationale for a study, its design, a monitoring plan, and a plan for the analysis of the data.
Epidemiological Theory from a Statistical Perspective (3 credits)
This course covers the mathematical and statistical development of epidemiological study designs. Topics include cross-sectional sampling, cohort designs, case-control studies, case-cohort studies, and how study design relates to causal inference. The topics will be covered in depth and with a high degree of statistical rigor. The format will be intensive modules covering each topic with lectures and assignments.
Elective Courses series B
Missing Data (1.5 credits)
The aim of this course is to introduce methods for dealing with missing data in analysis. The course is targeted at second-year PhD students in biostatistics, but advanced and methods-oriented epidemiologists and other researchers are welcome provided they meet the entry requirements. Topics include definitions of missing data mechanisms, how missing data relates to causal inference, and methods for dealing with missing data, including inverse probability weighting, multiple imputation, likelihood-based methods, and the expectation-maximization algorithm. The format will be lectures combined with short assignments focusing on analysis of real data.
Longitudinal Data Analysis (1.5 credits)
The aim of the course is to introduce modern methods for the analysis of longitudinal and repeated measures studies, which are commonly used in epidemiology and in clinical trials. Topics include an introduction to the analysis of longitudinal data, the analysis of response profiles, fitting parametric curves, covariance pattern models, random effects and growth curve models, and generalized linear models for longitudinal data, including generalized estimating equations (GEE) and generalized linear mixed models (GLMMs). The course comprises lectures, computer labs with exercises focusing on analysis of real data sets, group discussions, and literature review.
Bayesian Data Analysis (1.5 credits)
Bayesian data analysis is a very general approach to statistical modeling that is becoming increasingly popular and accessible. It provides a uniform framework for building problem-specific models that can be used for both statistical inference and prediction. This course will give you an introduction to the principles of Bayesian inference and Bayesian data analysis. We will outline the relevant concepts and basic theory, but the focus will be on learning how to do Bayesian data analysis in practice. Practical examples will be used to illustrate the methods described.
Advanced Machine Learning Methods (1.5 credits)
This course covers machine learning methods for prediction from a statistical perspective. It focuses on the following broad classes of classical machine learning procedures: stacking, bagging, boosting, and regularization. For each of these topics, the course covers the theoretical justifications for the methods, computational issues and implementations, applications of the methods to real data, and interpretation of the results. In addition, the course covers the role of cross-validation and the bootstrap in developing and evaluating prediction models. There will be lectures with written and computer-based assignments using real data examples.
Overview of current KI courses
KI has an overview of courses in biostatistics and statistical software. Information about courses can also be found on the webpages of the departments responsible for the courses.
Courses at LIME
Courses at IMM
Courses at GPH