Tag: causality

The Causal Inference Book: DAGS and more

Harvard (bio)statisticians Miguel Hernán and Jamie Robins just released their new book, online and accessible for free!

The Causal Inference book provides a cohesive presentation of causal inference, its concepts, and its methods. The book is divided into three parts of increasing difficulty: causal inference without models, causal inference with models, and causal inference from complex longitudinal data. Here’s the official Harvard page for the book release.

Some of the book’s (NHEFS) data is accessible too:

In SAS, Stata, MS Excel, and CSV formats
Codebook

As is the associated computer code for the analyses, in multiple languages:

R by Joy Shi and Sean McGrath. Rendered version by Tom Palmer.
Python by James Fiedler
SAS by Roger Logan
Stata by Eleanor Murray and Roger Logan

This is definitely an interesting read for epidemiologists, statisticians, psychologists, economists, sociologists, political scientists, data scientists, computer scientists, and any other person with a love for proper data analysis!

Our revised #CausalInferenceBook is now freely available.

The book is organized in 3 parts of increasing difficulty: From counterfactuals and causal diagrams to treatment-confounder feedback and g-methods.

Thanks to everyone who sent us comments/typos. https://t.co/bRPFYazK2D
— Miguel Hernán (@_MiguelHernan) January 1, 2019

Sam Finlayson visualized some of the Directed Acyclic Graphs (DAGs) covered in the book, and these also look quite nice. You can find the visuals, along with other notes and glossary items, here.

Last week my wife was out of town, research was slow, and summer self-study aspirations were high, so I sat down and organized:

ALL THE 50+ DAGS from @_MiguelHernan and Robins Causal Inference book (P1): https://t.co/vKcdclTQbe

Some notes on key concepts https://t.co/IcLZCLlrZW pic.twitter.com/UZWSauvQxT
— Sam Finlayson (@IAmSamFin) June 20, 2019

Cover image via blytheadamson.com

Causal Random Forests, by Mark White

I stumbled across this incredibly interesting read by Mark White, who discusses the (academic) theory behind, the inner workings of, and example (R) applications of causal random forests:

EXPLICITLY OPTIMIZING ON CAUSAL EFFECTS VIA THE CAUSAL RANDOM FOREST: A PRACTICAL INTRODUCTION AND TUTORIAL (By Mark White)

These so-called “honest” forests seem like a great technique to identify opportunities for personalized actions: think of marketing, HR, medicine, healthcare, and other personalized recommendations. Note that an experimental setup for data collection is still necessary to gather the right data for these techniques.

https://www.markhw.com/blog/causalforestintro
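
To give a feel for what “honest” means here, below is a minimal Python sketch of my own; it is not the code or algorithm from Mark's tutorial. The idea, as I understand it, is that one part of the sample decides where to split and a separate part estimates the treatment effect in each leaf. A real causal forest averages many such trees grown on subsamples; this sketch only fits a single one-split "stump". The simulated data, the function names (`simulate`, `choose_threshold`, `honest_stump`), and the splitting score are all assumptions made for illustration.

```python
import random


def simulate(n=8000, seed=2):
    """Hypothetical randomized experiment: treatment only helps when x > 0.5."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        treated = rng.random() < 0.5          # coin-flip assignment
        true_effect = 2.0 if x > 0.5 else 0.0
        y = x + true_effect * treated + rng.gauss(0, 1)
        rows.append((x, treated, y))
    return rows


def effect(rows):
    """Difference in mean outcome between treated and control rows."""
    treated = [y for _, w, y in rows if w]
    control = [y for _, w, y in rows if not w]
    return sum(treated) / len(treated) - sum(control) / len(control)


def choose_threshold(rows, candidates):
    """Pick the split on x whose two leaves differ most in estimated effect."""
    def score(thr):
        left = [r for r in rows if r[0] <= thr]
        right = [r for r in rows if r[0] > thr]
        gap = effect(left) - effect(right)
        # Squared gap, weighted so that tiny leaves are not favoured.
        return gap ** 2 * len(left) * len(right)
    return max(candidates, key=score)


def honest_stump(rows, candidates=(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)):
    """Choose the split on one half, estimate leaf effects on the other half."""
    half = len(rows) // 2
    split_half, estimate_half = rows[:half], rows[half:]
    thr = choose_threshold(split_half, candidates)
    left = [r for r in estimate_half if r[0] <= thr]
    right = [r for r in estimate_half if r[0] > thr]
    return thr, effect(left), effect(right)


if __name__ == "__main__":
    thr, left_effect, right_effect = honest_stump(simulate())
    print(f"split at x <= {thr}: effect {left_effect:.2f} vs {right_effect:.2f}")
```

Because the rows used to pick the split never contribute to the leaf estimates, the estimated effects are not inflated by the search over thresholds. Note that the simulated treatment is randomized, in line with the point above about needing an experimental setup.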

Propensity Score Matching Explained Visually

Propensity score matching (wiki) is a statistical matching technique that attempts to estimate the effect of a treatment (e.g., intervention) by accounting for the factors that predict whether an individual would be eligible for receiving the treatment. The Wikipedia page provides a good example setting:

Say we are interested in the effects of smoking on health. Here, smoking would be considered the treatment, and the ‘treated’ are simply those who smoke. In order to find a cause-effect relationship, we would need to run an experiment and randomly assign people to smoking and non-smoking conditions. Of course such experiments would be unfeasible and/or unethical, as we can’t ask/force people to smoke when we suspect it may do harm.
We will need to work with observational data instead. Here, we estimate the treatment effect by simply comparing health outcomes (e.g., rate of cancer) between those who smoked and those who did not. However, this estimate would be biased by any factors that predict smoking (e.g., socioeconomic status). Propensity score matching attempts to control for these differences (i.e., biases) by making the comparison groups (i.e., smoking and non-smoking) more comparable.
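
The bias described above is easy to reproduce with a small simulation. The Python sketch below is my own illustration, not taken from the Wikipedia page: all numbers (smoking rates, illness risks, a true effect of +0.10) are made up, and socioeconomic status is reduced to a single yes/no variable. With only one binary confounder, comparing smokers and non-smokers within each SES group amounts to comparing people with the same propensity score, so it stands in for matching here.

```python
import random


def simulate(n=50000, seed=0):
    """Made-up population in which low SES raises both smoking and illness."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        low_ses = rng.random() < 0.5
        smokes = rng.random() < (0.6 if low_ses else 0.2)
        # Assumed risks: smoking adds 0.10, low SES adds 0.20.
        p_ill = 0.1 + 0.1 * smokes + 0.2 * low_ses
        ill = rng.random() < p_ill
        rows.append((low_ses, smokes, ill))
    return rows


def risk(rows):
    """Share of people in rows who are ill."""
    return sum(ill for _, _, ill in rows) / len(rows)


def naive_difference(rows):
    """Risk among smokers minus risk among non-smokers, ignoring SES."""
    smokers = [r for r in rows if r[1]]
    non_smokers = [r for r in rows if not r[1]]
    return risk(smokers) - risk(non_smokers)


def stratified_difference(rows):
    """Compare within each SES group, then average by group size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        total += naive_difference(stratum) * len(stratum) / len(rows)
    return total


if __name__ == "__main__":
    data = simulate()
    print("naive:     ", round(naive_difference(data), 3))
    print("stratified:", round(stratified_difference(data), 3))
```

The naive comparison overstates the effect because smokers in this toy population are more often low SES. The within-group comparison should land close to the assumed +0.10.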

Lucy D’Agostino McGowan is a post-doc at Johns Hopkins Bloomberg School of Public Health and co-founder of R-Ladies Nashville. She wrote a very nice blog post explaining what propensity score matching is and showing how to apply it to your dataset in R. Lucy demonstrates how you can use propensity scores to weight your observations in a way that accounts for the factors that correlate with receiving a treatment. Moreover, her explanations are strengthened by nice visuals that intuitively demonstrate what the weighting does to the “pseudo-populations” used to estimate the treatment effect.

Have a look yourself: https://livefreeordichotomize.com/2019/01/17/understanding-propensity-score-weighting/
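
For those who prefer code to pictures, here is a minimal Python sketch of the weighting idea. It is my own toy example, not Lucy's R code. It uses inverse probability weights for the average treatment effect, which is one of several possible weighting schemes, and it estimates the propensity score as the share of treated units within each level of a single binary covariate rather than with a logistic regression. The data, the true effect of 1.0, and all function names are assumptions.

```python
import random
from collections import defaultdict


def simulate(n=50000, seed=1):
    """Toy data: covariate x raises both treatment uptake and the outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random() < 0.5
        treated = rng.random() < (0.6 if x else 0.2)
        outcome = 1.0 * treated + 2.0 * x + rng.gauss(0, 1)  # true effect: 1.0
        data.append((x, treated, outcome))
    return data


def propensity_by_level(data):
    """Estimate P(treated | x) as the treated share within each level of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, treated, _ in data:
        counts[x][0] += treated
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}


def ate_weights(data, ps):
    """Inverse probability weights: 1/ps if treated, 1/(1 - ps) if not."""
    return [1 / ps[x] if t else 1 / (1 - ps[x]) for x, t, _ in data]


def weighted_effect(data, weights):
    """Weighted mean outcome of the treated minus that of the controls."""
    def wmean(flag):
        pairs = [(y, w) for (_, t, y), w in zip(data, weights) if t == flag]
        return sum(y * w for y, w in pairs) / sum(w for _, w in pairs)
    return wmean(True) - wmean(False)


def weighted_share_of_x(data, weights, flag):
    """Weighted share of x == True within one treatment group."""
    num = sum(w for (x, t, _), w in zip(data, weights) if t == flag and x)
    den = sum(w for (_, t, _), w in zip(data, weights) if t == flag)
    return num / den


if __name__ == "__main__":
    data = simulate()
    w = ate_weights(data, propensity_by_level(data))
    print("unweighted:", round(weighted_effect(data, [1.0] * len(data)), 3))
    print("weighted:  ", round(weighted_effect(data, w), 3))
    print("share of x, treated:", round(weighted_share_of_x(data, w, True), 3))
    print("share of x, control:", round(weighted_share_of_x(data, w, False), 3))
```

After weighting, the covariate has the same distribution in the treated and control groups. That balanced, reweighted sample is the “pseudo-population”, and the weighted difference in outcomes should sit close to the assumed effect of 1.0.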