Category: statistics

A Visual Introduction to Hierarchical Models, by Michael Freeman

I have covered hierarchical models before on this blog. These models are super relevant in practice. For instance, in HR, employee data is always nested within teams, which are in turn nested within organizational units. Likewise, in my current field of insurance, claims are always nested within policies, which can in turn be nested within product categories. Data is hierarchical, and we need to take that into account when we model it.

Hierarchical models do just that. Interested in how they do this? Have a look at this amazing browser application made in React.js!

This project was built by Michael Freeman, a faculty member at the University of Washington Information School.
All code for this project is on GitHub, including the script to create the data and run regressions (done in R). Feel free to issue a pull request for improvements, and if you like it, share it on Twitter. Layout inspired by Tony Chu.
About this project
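To make the idea of nested data a bit more concrete, here is a minimal sketch of partial pooling, the core trick behind a random-intercept model: each group's mean is pulled toward the overall mean, and small groups are pulled harder. This is not the code from Michael Freeman's project. The function name, the claim amounts, and the two variance values are assumptions for illustration; a real mixed model would estimate the variances from the data rather than take them as given.

```python
from statistics import mean


def partial_pooling(groups, sigma2, tau2):
    """Shrink each group's mean toward the grand mean.

    groups: dict mapping group name -> list of observations
    sigma2: within-group variance (assumed known here)
    tau2:   between-group variance (assumed known here)
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    estimates = {}
    for name, ys in groups.items():
        n = len(ys)
        # Weight on the group's own mean: larger groups are trusted more.
        weight = tau2 / (tau2 + sigma2 / n)
        estimates[name] = weight * mean(ys) + (1 - weight) * grand_mean
    return estimates


# Hypothetical claim amounts nested within three policies.
claims = {
    "policy_a": [10.0, 12.0, 11.0, 13.0],
    "policy_b": [20.0],
    "policy_c": [14.0, 16.0],
}
pooled = partial_pooling(claims, sigma2=4.0, tau2=4.0)
for name, estimate in pooled.items():
    print(name, round(estimate, 2))
```

The single claim under policy_b gets a weight of only 0.5 on its own value, while the four claims under policy_a get a weight of 0.8, so the lone observation moves much further toward the overall mean. Ignoring the nesting would mean either trusting that one claim completely or throwing the policy structure away.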

The Causal Inference Book: DAGS and more

Harvard (bio)statisticians Miguel Hernán and Jamie Robins just released their new book, online and accessible for free!

The Causal Inference book provides a cohesive presentation of causal inference, its concepts and its methods. The book is divided into three parts of increasing difficulty: causal inference without models, causal inference with models, and causal inference from complex longitudinal data. Here’s the official Harvard page for the book release.

Some of the book’s (NHEFS) data is accessible too:

In SAS, Stata, MS Excel, and CSV formats
Codebook

As is the associated computer code for the analyses, in multiple languages:

R by Joy Shi and Sean McGrath. Rendered version by Tom Palmer.
Python by James Fiedler
SAS by Roger Logan
Stata by Eleanor Murray and Roger Logan

This is definitely an interesting read for epidemiologists, statisticians, psychologists, economists, sociologists, political scientists, data scientists, computer scientists, and any other person with a love for proper data analysis!

Our revised #CausalInferenceBook is now freely available.

The book is organized in 3 parts of increasing difficulty: From counterfactuals and causal diagrams to treatment-confounder feedback and g-methods.

Thanks to everyone who sent us comments/typos. https://t.co/bRPFYazK2D
— Miguel Hernán (@_MiguelHernan) January 1, 2019

Sam Finlayson visualized some of the Directed Acyclic Graphs (DAGs) covered in the book, and these also look quite nice. The visuals, along with other notes and glossary items, can be found here.

Last week my wife was out of town, research was slow, and summer self-study aspirations were high, so I sat down and organized:

ALL THE 50+ DAGS from @_MiguelHernan and Robins Causal Inference book (P1): https://t.co/vKcdclTQbe

Some notes on key concepts https://t.co/IcLZCLlrZW pic.twitter.com/UZWSauvQxT
— Sam Finlayson (@IAmSamFin) June 20, 2019

Cover image via blytheadamson.com

Podcasts for Data Science Start-Ups

Christopher of Neurotroph.de compiled this short list of data science podcasts worth listening to. See Chris’ original article for more details on the podcasts, but the links below take you to them directly:

Overviews of Graph Classification and Network Clustering methods

Thanks to Sebastian Raschka I am able to share this great GitHub overview page of relevant graph classification techniques, and the scientific papers behind them. The overview divides the algorithms into four groups: embedding, deep learning, graph kernel, and factorization methods.

Moreover, the overview contains links to similar collections on community detection, classification/regression trees, and gradient boosting papers with implementations, as well as a link to relevant graph classification benchmark datasets.

"Awesome Graph Classification" — A collection of graph classification methods, covering embedding, deep learning, graph kernel, and factorization papers with reference implementations https://t.co/ugpL3xSvf1
— Sebastian Raschka (@rasbt) July 16, 2019

Causal Random Forests, by Mark White

I stumbled across this incredibly interesting read by Mark White, who discusses the (academic) theory behind, inner workings, and example (R) applications of causal random forests:

EXPLICITLY OPTIMIZING ON CAUSAL EFFECTS VIA THE CAUSAL RANDOM FOREST: A PRACTICAL INTRODUCTION AND TUTORIAL (By Mark White)

These so-called “honest” forests seem to be a great technique for identifying opportunities for personalized actions: think of marketing, HR, medicine, healthcare, and other personalized recommendations. Note that an experimental setup for data collection is still necessary to gather the right data for these techniques.

https://www.markhw.com/blog/causalforestintro
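For readers wondering what “honest” means here, below is a toy sketch of the idea with a single split instead of a whole forest: one half of the sample decides where to split, and the other half estimates the treatment effect on each side, so no observation is used for both jobs. This is not Mark White's code and not a causal forest implementation. The function names, the candidate thresholds, and the noise-free data are all assumptions for illustration, and treatment is assumed to come from an experiment.

```python
from statistics import mean


def effect(rows):
    """Difference in mean outcome between treated and control rows."""
    treated = [r["y"] for r in rows if r["t"] == 1]
    control = [r["y"] for r in rows if r["t"] == 0]
    if not treated or not control:
        return None  # effect is undefined without both arms
    return mean(treated) - mean(control)


def best_split(rows, thresholds):
    """Pick the threshold on x whose two sides differ most in estimated effect."""
    best, best_gap = None, -1.0
    for c in thresholds:
        left = effect([r for r in rows if r["x"] <= c])
        right = effect([r for r in rows if r["x"] > c])
        if left is None or right is None:
            continue
        gap = abs(left - right)
        if gap > best_gap:
            best, best_gap = c, gap
    return best


def honest_effects(rows, thresholds):
    """Choose the split on one half, estimate the effects on the other half."""
    split_half = rows[0::2]
    estimate_half = rows[1::2]
    c = best_split(split_half, thresholds)
    left = effect([r for r in estimate_half if r["x"] <= c])
    right = effect([r for r in estimate_half if r["x"] > c])
    return c, left, right


# Hypothetical experiment: the true effect is 1 for x <= 4 and 3 for x > 4.
rows = []
for i in range(20):
    x = i % 10
    t = (i // 2) % 2  # gives every x value one treated and one control row
    y = x + t * (1.0 if x <= 4 else 3.0)
    rows.append({"x": x, "t": t, "y": y})

threshold, effect_low, effect_high = honest_effects(rows, thresholds=[2, 4, 6])
print(threshold, effect_low, effect_high)
```

A real causal forest repeats this over many trees, subsamples, and covariates, and averages the results. The sketch only shows why the sample is split in two: the half that picked the split cannot flatter its own estimates.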

2019 Shortlist for the Royal Society Prize for Science Books

Since 1988, the Royal Society has celebrated outstanding popular science writing and authors.

Each year, a panel of expert judges chooses the book that they believe makes popular science writing compelling and accessible to the public.

Over the decades, the Prize has celebrated some notable winners including Bill Bryson and Stephen Hawking.

The author of the winning book receives £25,000 and £2,500 is awarded to each of the five shortlisted books. And this year’s shortlist includes some definite must-reads on data and statistics!

Infinite Powers – by Steven Strogatz

The captivating story of mathematics’ greatest ever idea: calculus. Without it, there would be no computers, no microwave ovens, no GPS, and no space travel. But before it gave modern man almost infinite powers, calculus was behind centuries of controversy, competition, and even death.
Taking us on a thrilling journey through three millennia, Professor Steven Strogatz charts the development of this seminal achievement, from the days of Archimedes to today’s breakthroughs in chaos theory and artificial intelligence. Filled with idiosyncratic characters from Pythagoras to Fourier, Infinite Powers is a compelling human drama that reveals the legacy of calculus in nearly every aspect of modern civilisation, including science, politics, medicine, philosophy, and more.
https://royalsociety.org/grants-schemes-awards/book-prizes/science-book-prize/2019/infinite-powers/

Invisible Women – by Caroline Criado Perez

Imagine a world where your phone is too big for your hand, where your doctor prescribes a drug that is wrong for your body, where in a car accident you are 47% more likely to be seriously injured, where every week the countless hours of work you do are not recognised or valued. If any of this sounds familiar, chances are that you’re a woman.
Invisible Women shows us how, in a world largely built for and by men, we are systematically ignoring half the population. It exposes the gender data gap–a gap in our knowledge that is at the root of perpetual, systemic discrimination against women, and that has created a pervasive but invisible bias with a profound effect on women’s lives. From government policy and medical research, to technology, workplaces, urban planning and the media, Invisible Women reveals the biased data that excludes women.
https://royalsociety.org/grants-schemes-awards/book-prizes/science-book-prize/2019/invisible-women/

Six Impossible Things – by John Gribbin

This book does not deal with data or statistics specifically, but might be even more interesting, as it covers the topic of quantum physics:

Quantum physics is strange. It tells us that a particle can be in two places at once. That particle is also a wave, and everything in the quantum world can be described entirely in terms of waves, or entirely in terms of particles, whichever you prefer.
All of this was clear by the end of the 1920s, but to the great distress of many physicists, let alone ordinary mortals, nobody has ever been able to come up with a common sense explanation of what is going on. Physicists have sought ‘quanta of solace’ in a variety of more or less convincing interpretations.
This short guide presents us with the six theories that try to explain the wild wonders of quantum. All of them are crazy, and some are crazier than others, but in this world crazy does not necessarily mean wrong, and being crazier does not necessarily mean more wrong.
https://royalsociety.org/grants-schemes-awards/book-prizes/science-book-prize/2019/six-impossible-things/