Adapted from Mark van der Laan and Sherri Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer. 2011.

Targeted Maximum Likelihood Estimation: A Tutorial for Applied Researchers

When estimating the average treatment effect for a binary or continuous outcome, methods that incorporate propensity scores, the G-formula, or Targeted Maximum Likelihood Estimation (TMLE) are preferred over naïve regression approaches, which often lead to misspecified models. Some methods require correct specification of the outcome model, whereas others require correct specification of the exposure model. Doubly-robust methods require correct specification of only one of these models. TMLE is a semiparametric, doubly-robust method that reduces the risk of model misspecification by allowing flexible estimation with non-parametric machine-learning methods, and it requires weaker assumptions than its competitors. We provide a step-by-step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology where the assumptions of correct model specification and positivity (i.e., that every study participant has a nonzero probability of receiving each level of the treatment) are nearly violated. This tutorial provides a concise and reproducible educational introduction to TMLE for a binary outcome and exposure. The reader should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. Extensive R code is provided in easy-to-read boxes throughout the article for replicability. Stata users will find a testing implementation of TMLE and additional material in the appendix and at the following GitHub repository:
Alternatively, readers can visit the following link, where I provide a somewhat more informal but interactive and reproducible TMLE tutorial written in R Markdown. Furthermore, readers will gain more insight into the derivation of the Wald-type confidence intervals based on the functional delta method:
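
For orientation, the quantities named in this introduction can be written out as follows. This is a minimal sketch of the standard TMLE steps for a binary exposure A, a binary outcome Y, and covariates W. The notation (\bar{Q}, g, H, \epsilon, IC) is assumed here for illustration and is not taken from the text above.

```latex
% Target parameter: the average treatment effect (ATE)
\[
  \psi = \mathbb{E}_W\big[\, \mathbb{E}(Y \mid A = 1, W) - \mathbb{E}(Y \mid A = 0, W) \,\big]
\]

% Step 1: initial estimates of the outcome model and the exposure model
% (propensity score); either may be fitted with flexible machine learning
\[
  \bar{Q}^0(A, W) \approx \mathbb{E}(Y \mid A, W), \qquad
  g(W) \approx P(A = 1 \mid W)
\]

% Step 2: "clever covariate" built from the propensity score
\[
  H(A, W) = \frac{A}{g(W)} - \frac{1 - A}{1 - g(W)}
\]

% Step 3: fluctuate the initial outcome estimate; \hat{\epsilon} is the
% maximum likelihood estimate from a logistic regression of Y on H with
% offset logit \bar{Q}^0 and no intercept
\[
  \operatorname{logit} \bar{Q}^1(A, W)
    = \operatorname{logit} \bar{Q}^0(A, W) + \hat{\epsilon}\, H(A, W)
\]

% Step 4: plug the updated predictions into the target parameter
\[
  \hat{\psi} = \frac{1}{n} \sum_{i=1}^{n}
    \big[\, \bar{Q}^1(1, W_i) - \bar{Q}^1(0, W_i) \,\big]
\]

% Step 5: Wald-type 95% confidence interval from the estimated
% efficient influence curve
\[
  \widehat{IC}_i = H(A_i, W_i)\,\big(Y_i - \bar{Q}^1(A_i, W_i)\big)
    + \bar{Q}^1(1, W_i) - \bar{Q}^1(0, W_i) - \hat{\psi},
  \qquad
  \hat{\psi} \pm 1.96 \sqrt{\frac{\widehat{\mathrm{Var}}\big(\widehat{IC}\big)}{n}}
\]
```

The role of positivity is visible in Step 2: g(W) and 1 - g(W) appear in the denominators of H, so estimated propensity scores close to 0 or 1 inflate the clever covariate and make the estimate unstable.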