# Statistical Methods Week June 7th-10th 2022

The strategic research area in epidemiology and biostatistics and the statistical methods node of the national network for register-based research invite everyone interested in statistical methods for data analysis to attend the seminars.

The goal is for researchers in all disciplines to interact, discuss new statistical methods, and discover the wealth of methods being developed. The tutorials are intended for biomedical researchers and practitioners whose work involves or draws on mathematics, biostatistics, and computational statistics.

## Locations

The talks will be held in Franklinsalen, Tomtebodavägen 6, Karolinska Institutet Solna

The locations of the tutorials will be announced with acceptance into the session. Application and acceptance are required for the tutorials; please see each tutorial title for the sign-up link. Deadline: May 27. Sign-up is now closed.

## Registration

No registration is necessary for the seminars; seating is first come, first served.

Please note that you need to apply for the tutorials (links to sign up are next to each tutorial)!

## Sponsors

We gratefully acknowledge our sponsors, SFO Epi and Biostatistics and the VR Environment Grant in Statistical Methods for Register-Based Research.

## Schedule for the Statistical Methods Week

## Tuesday June 7

9:30-10:30 Dr. Jason Liang - "Prognostic accuracy measures for survival models"

10:30-10:45 Break for switching speakers

10:45-11:45 Dr. Brice Ozenne - "Toward a unified framework for analyzing repeated measurements with a continuous outcome"

11:45-13:30 LUNCH

13:30-16:30 Tutorial: "Prognostic accuracy measures for survival with Covid-19 example" in R with Dr. Liang.

13:30-16:30 Tutorial: "Analysis of repeated measurements with mixed models using the R package LMMstar" with Dr. Ozenne and Dr. Forman

## Wednesday June 8

9:30-10:30 Professor Hongwei Zhao - "An improved survival estimator for censored medical costs using kernel methods"

10:30-10:45 Break for switching speakers

10:45-11:45 Professor Ken Rice - "Fixing fixed-effects meta-analysis: some theoretical and practical advances"

11:45-13:30 LUNCH

13:30-15:30 Tutorial: "Analysis of medical costs with censored data" in Stata with Professor Zhao.

13:30-16:30 Tutorial: "Introduction to meta-analysis" in R with Professor Rice.

## Thursday June 9

9:30-10:30 Professor Andrea Rotnitzky - "Towards deriving graphical rules for efficient estimation in causal graphical models"

10:30-10:45 Break for switching speakers

10:45-11:45 Dr. Michael Sachs - "Recent advances in regression modeling of censored time-to-event outcomes using pseudo-observations"

11:45-13:30 LUNCH

13:30-16:30 Tutorial: "Using the eventglm R package for regression modeling of censored time-to-event outcomes" with Dr. Sachs.

16:00-20:00 All are also welcome to join the Stockholm biostatistics network event in Aula Medica

## Friday June 10

9:30-10:30 Dr. Tim Morris - "Nonparametric bootstrapping for standard errors and confidence intervals: silver bullet or fool’s gold?"

10:30-10:45 Break for switching speakers

10:45-11:45 Professor Karla Diaz-Ordaz - "Causal machine learning for heterogeneous treatment effects"

11:45-13:30 LUNCH

13:30-16:30 Tutorial: "Nonparametric bootstrap" in Stata with Dr. Morris.

13:30-16:30 Tutorial: "Introduction to causal machine learning" in R with Professor Diaz-Ordaz.

## Abstracts

#### Dr. Jason Liang - "Prognostic accuracy measures for survival models"

Many prognostic models are created using survival data. In practice, the temporal aspect of survival data is often underused. I will outline a number of existing methods for evaluating prognostic survival models. In particular, the emphasis will be on tools that can quantify how prognostic performance varies with time. I will also present a complementary new tool we have developed, the hazard discrimination summary (HDS). HDS is an interpretable, risk-based measure of how a model’s discrimination varies with time. I will also describe a connection between HDS and the Cox model partial likelihood.

#### Dr. Brice Ozenne - "Toward a unified framework for analyzing repeated measurements with a continuous outcome"

The paired t-test, ANCOVA, linear mixed models (LMM), and latent variable models (LVM) are some of the many statistical tools used to analyze repeated measurements of a continuous outcome. These tools typically lead to different results, which can be confusing to the practitioner. The discrepancy may originate from a different underlying statistical model, estimation procedure, or uncertainty quantification, or a combination of these.

In this talk, we recast the previously mentioned tools as a parametrization of the mean and residual variance-covariance structure, with a degree of flexibility that is tool dependent. We therefore view tools as a way to communicate a parametrization to an estimation procedure. Equivalence between tools (e.g. ANCOVA as an LMM) and consequences of the parametrization (e.g. random intercept model) are illustrated in real data examples. We also describe how uncertainty quantification can be performed based purely on the first, second, and third derivatives of the log-likelihood, and can match exact methods (e.g. the paired t-test) nearly exactly.

We finish the talk by discussing how to handle missing outcome values, which frequently arise in longitudinal studies. We show the connection between using an LMM to handle missing data and a t-test where missing data have been imputed based on the conditional distributions implied by the model. The consequences are that missing data due to terminal events should not be treated in an LMM/LVM framework, and that correct specification not only of the mean but also of the covariance structure is typically needed to ensure valid statistical inference.

#### Professor Hongwei Zhao - "An improved survival estimator for censored medical costs using kernel methods"

Cost assessment and cost-effectiveness analysis are an essential part of the economic evaluation of medical interventions. In clinical trials and many observational studies, costs as well as survival data are frequently censored. Standard techniques for survival-type data are often invalid for analyzing censored cost data, due to the induced dependent censoring problem (Lin et al., 1997). In this talk, we will first examine the equivalence between a redistribute-to-the-right (RR) algorithm and the popular Kaplan-Meier method for estimating the survival function of time (Efron, 1967). Next, we will extend the RR algorithm to the problem of estimating the survival function of medical costs and discuss RR-based estimators. Finally, we will propose a kernel-based estimator for the survival function of costs, which is shown to be monotone, asymptotically unbiased, and more efficient than some existing survival estimators. We will conduct simulation experiments to compare these survival estimators for costs and apply them to data from a randomized cardiovascular clinical trial.
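
The RR/Kaplan-Meier equivalence for the survival function of time can be illustrated numerically. Below is a minimal Python sketch (the tutorial itself uses Stata): each censored observation's probability mass is redistributed equally to the observations to its right, and the resulting weighted survival curve matches the Kaplan-Meier product-limit estimator. Function names are illustrative, not from the talk.

```python
def rr_weights(times, events):
    """Redistribute-to-the-right: start with mass 1/n per observation;
    when a censored observation (event == 0) is reached in time order,
    spread its current mass equally over all observations to its right.
    At ties, events are processed before censorings."""
    n = len(times)
    order = sorted(range(n), key=lambda i: (times[i], -events[i]))
    w = [1.0 / n] * n
    for k, i in enumerate(order):
        if events[i] == 0 and k < n - 1:
            share = w[i] / (n - 1 - k)
            w[i] = 0.0
            for j in order[k + 1:]:
                w[j] += share
        # if the last observation is censored, its mass stays put,
        # so the curve plateaus, as the Kaplan-Meier curve does
    return w

def survival_rr(times, events, t):
    """S(t) = total redistributed mass on observations with time > t."""
    w = rr_weights(times, events)
    return sum(wi for wi, ti in zip(w, times) if ti > t)

def survival_km(times, events, t):
    """Kaplan-Meier product-limit estimator of S(t)."""
    s = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        at_risk = sum(ti >= u for ti in times)
        d = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        s *= 1 - d / at_risk
    return s
```

For example, with `times = [1, 2, 3, 4]` and `events = [1, 0, 1, 1]`, the two estimators agree at every time point.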

#### Professor Ken Rice - "Fixing fixed-effects meta-analysis: some theoretical and practical advances"

Meta-analysis is a common tool for synthesizing results of multiple studies, for example combining clinical trial or genetic association signals. Despite being well-established, some of its best-known methods are routinely misunderstood. Specifically, “fixed effects” (in the plural) methods do not require that exact homogeneity is assumed, and “random effects” methods are not the only way to address heterogeneity. This talk aims to fix this mess, providing a precise definition of what fixed-effects methods estimate – and why it is useful – before showing how this helps in practice. With motivating examples, we will give novel methods for better small-sample meta-analysis of quantitative outcomes, proportions, and odds ratios from 2x2 tables. It is hoped that these contributions will foster more direct connection of the questions that meta-analysts wish to answer with the statistical methods they choose.
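
As background for the talk, the standard fixed-effects pooled estimate is an inverse-variance-weighted average of the study estimates; as the abstract emphasizes, this estimand is a weighted average of study-specific effects and does not require assuming exact homogeneity. A minimal Python sketch, with illustrative names:

```python
import math

def fixed_effects_meta(estimates, ses):
    """Inverse-variance-weighted ('fixed effects', plural) meta-analysis:
    each study is weighted by the inverse of its squared standard error,
    i.e. by its precision."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se
```

For two studies with estimates 0.5 and 0.3 and standard errors 0.1 and 0.2, the pooled estimate is pulled toward the more precise study: 0.46 with standard error 1/√125 ≈ 0.089.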

#### Professor Andrea Rotnitzky - "Towards deriving graphical rules for efficient estimation in causal graphical models"

Causal graphs are responsible in great part for the explosive growth and adoption of causal inference in modern epidemiology and medical research. This is because graphical models facilitate encoding and communicating causal assumptions and reasoning in an intuitive fashion, requiring minimal, if any, mathematical dexterity. Applying simple graphical rules, it is possible to easily communicate biases due to confounding and selection, determine which data are needed to correct for these biases, and derive which statistical target parameter quantifies a causal effect of interest. Yet causal graphical models encode a well-defined semiparametric statistical model, and little, if any, work has been done to investigate and derive simple graphical rules encoding efficiency in estimation. In this talk, I will present work towards bridging this gap. Given a causal graphical model, I will derive a set of graphical rules for determining the optimal adjustment set in a point exposure problem. This is the subset of variables in the graph that both suffices to control for confounding under the model and yields a non-parametric estimator of the population average treatment effect (ATE) of a dynamic (i.e. personalized) or static point exposure on an outcome with the smallest asymptotic variance under any law in the model. I will then discuss the conditions for the existence of a universally optimal adjustment set when the search is restricted to the realistic scenario in which only a subset of the graph variables is observable. For the problem of estimating the effect of a time-dependent treatment, I will discuss an impossibility result. Finally, I will describe graphical rules for constructing a reduced graph whose nodes represent only those variables that are informative for the ATE, and such that efficient estimation of the ATE under the reduced graph and under the original graph agree.

#### Professor Karla Diaz-Ordaz - "Causal machine learning for heterogeneous treatment effects"

Treatment effect heterogeneity is often studied to aid treatment or policy decisions. In this context, methods for flexible estimation of conditional average treatment effect (CATE) of a binary exposure on an outcome have gained popularity recently. The CATE, however, may be a complicated function of the covariates. Here we present some strategies for CATE estimation which use data-adaptive methods, and provide novel “treatment effect variable importance measures”. Such variable importance measures may guide decision makers as to which variables are most important to consider when making treatment / policy decisions, and can help stratify populations for further study of subgroup effects. This is joint work with Oliver Hines and Stijn Vansteelandt.
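
The abstract does not detail the specific estimators; as one common illustration of the general idea, the Python sketch below implements a simple "T-learner" strategy for the CATE: fit separate outcome regressions in the treated and control groups, then take the difference of their predictions at a covariate value of interest. Function names are illustrative, and plain least squares stands in for the data-adaptive learners discussed in the talk.

```python
def linfit(xs, ys):
    """Simple least-squares fit y = a + b*x (closed form, one covariate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def t_learner_cate(x, a, y, x0):
    """T-learner estimate of the conditional average treatment effect
    at covariate value x0: difference of the group-specific outcome
    regression predictions E[Y | X=x0, A=1] - E[Y | X=x0, A=0]."""
    a1, b1 = linfit([xi for xi, ai in zip(x, a) if ai == 1],
                    [yi for yi, ai in zip(y, a) if ai == 1])
    a0, b0 = linfit([xi for xi, ai in zip(x, a) if ai == 0],
                    [yi for yi, ai in zip(y, a) if ai == 0])
    return (a1 + b1 * x0) - (a0 + b0 * x0)
```

On noise-free data generated as y = 1 + x + 2*a*x, the true CATE is 2x, and the sketch recovers it exactly.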

#### Dr. Tim Morris - "Nonparametric bootstrapping for standard errors and confidence intervals: silver bullet or fool’s gold?"

Nonparametric bootstrapping is a general technique for estimating standard errors and/or constructing confidence intervals around parameter estimates. It is particularly useful when no closed-form formula is available, or when a formula exists but is known to be problematic. The bootstrap has proved so popular and versatile that it earned its inventor, Bradley Efron, the International Prize in Statistics. In this seminar, I will give a non-technical description of the idea of nonparametric bootstrapping and some examples of where it can be useful. I will then describe how it can ‘fail’ in the sense that confidence intervals can be narrower or wider than they should be. This usually happens when the bootstrap is used carelessly, which is often the case. Some problems can be easily identified using simulation studies, and I will advocate that researchers consider running such studies before using the bootstrap.
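
The basic recipe is short enough to sketch. The Python fragment below (the tutorial itself uses Stata; function names are illustrative) resamples the data with replacement, recomputes the statistic on each resample, and uses the spread of the replicates as a standard error, plus a percentile interval; as the talk cautions, such intervals can under- or over-cover when used carelessly, for example for statistics like the sample maximum.

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=1):
    """Nonparametric bootstrap standard error: resample with replacement,
    recompute the statistic, take the standard deviation of replicates."""
    rng = random.Random(seed)
    n = len(data)
    reps = [stat([rng.choice(data) for _ in range(n)])
            for _ in range(n_boot)]
    return statistics.stdev(reps)

def bootstrap_percentile_ci(data, stat, level=0.95, n_boot=2000, seed=1):
    """Percentile confidence interval from the bootstrap replicates."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat([rng.choice(data) for _ in range(n)])
                  for _ in range(n_boot))
    lo = reps[int((1 - level) / 2 * n_boot)]
    hi = reps[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

For the mean of the integers 1 to 100, the bootstrap standard error lands close to the analytic value s/√n ≈ 2.9.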

#### Dr. Michael Sachs - "Recent advances in regression modeling of censored time-to-event outcomes using pseudo-observations"

Pseudo-observations, as introduced by Andersen, Klein, and Rosthøj (2003), can be useful for modeling survival quantities as functions of covariates in a user-friendly and easy-to-interpret way. While they were first introduced in the standard right-censored time-to-event setting, recent methodological and theoretical advances have paved the way for applications in more complex multi-state and event history models. In this talk, I will review the pseudo-observation approach and its theoretical justification. Then I will discuss some recent advances in the methodology, including methods for dealing with left-truncation, joint models for the illness-death model, and novel computational approaches, with a focus on potential implementations in the eventglm R package.
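
The basic construction is easy to sketch: the jackknife pseudo-observation of the survival probability S(t) for subject i is n·Ŝ(t) − (n−1)·Ŝ₋ᵢ(t), where Ŝ is the Kaplan-Meier estimator and Ŝ₋ᵢ leaves subject i out; the pseudo-values can then be used as outcomes in an ordinary regression, which is the approach the eventglm R package takes. A minimal Python illustration, with illustrative function names:

```python
def km_surv(times, events, t):
    """Kaplan-Meier estimate of S(t) for right-censored data."""
    s = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        at_risk = sum(ti >= u for ti in times)
        d = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        if at_risk:
            s *= 1 - d / at_risk
    return s

def pseudo_obs(times, events, t):
    """Jackknife pseudo-observations of S(t):
    pseudo_i = n * S_hat(t) - (n - 1) * S_hat_{-i}(t).
    With no censoring before t, pseudo_i reduces to the
    indicator 1{T_i > t}."""
    n = len(times)
    full = km_surv(times, events, t)
    out = []
    for i in range(n):
        loo_t = times[:i] + times[i + 1:]
        loo_e = events[:i] + events[i + 1:]
        out.append(n * full - (n - 1) * km_surv(loo_t, loo_e, t))
    return out
```

With fully observed data (no censoring), the pseudo-values are exactly the survival indicators, which is why regressing them on covariates recovers a familiar binary-outcome model.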