Tag: Bayes

Bayesian data analysis for newcomers

Professor John Kruschke and Torrin Liddell – one of his Ph.D. students at Indiana University – wrote a fantastically useful scientific paper introducing Bayesian data analysis to the masses. Kruschke and Liddell explain the main ideas behind Bayesian statistics, how Bayesians deal with continuous and binary variables, how to use and set meaningful priors, the differences between confidence and credibility intervals, how to perform model comparison tests, and many more. The paper is published open access so you can read it here.

I found it incredibly useful, providing me with a better understanding of how Bayesian analysis works, what kind of questions you can answer with it, and what the resulting insights would comprise of. After reading it, I was honestly asking myself why I don’t use Bayesian methods more often… So what’s next, how to learn more?

If you are equally convinced and want to really learn Bayesian statistics, you might want to have a look at Kruschke’s book Doing Bayesian Data Analysis: A tutorial with R, JAGS, and Stan.
If you’re up for the challenge, Kruschke and Liddel also published a more technical paper on Bayesian New Statistics at the same time as this introductory paper, also open-source.
You can also start doing some simple analysis:
- In R, theBayesFactor package and brms will get you started (suggested by u/data_for_everyone).
- In Python, pystan and pymc3 are helpful (suggested by u/joefromlondon).
If you prefer a more visual explanation of the fundamentals of Bayesian statistics, have a look at this YouTube video by Veritasium.

Advanced GIFs in R

Rafa Irizarry is a biostatistics professor and one of the three people behind SimplyStatistics.org (the others are Jeff Leek, Roger Peng). They post ideas that they find interesting and their blog contributes greatly to discussion of science/popular writing.

Rafa is the creator of many data visualization GIFs that have recently trended on the web, and in a recent post he provides all the source code behind the beautiful imagery. I sincerely recommend you check out the orginal blog if you want to find out more, but here are the GIFS:

Simpson’s paradox is a statistical phenomenon where an observed relationship within a population reverses within all subgroups that make up that population. Rafa visualized it wonderfully in a GIF that took only twenty-some lines of R code:

A different statistical phenomenon is discussed at the end of the original blog: namely the ecological fallacy. It occurs when correlations that occur on the group-level are erroneously extrapolated to the individual-level. Rafa used the gapminder data included in the dslabs package to illustrate the fallacy: there is a very high correlation at the region level and a lower correlation at the individual country level:

The gapminder data is also used in the next GIF. This mimics Hans Rosling’s famous animation during his talk on New Insights on Poverty, but then made with R and gganimate by Rafa:

A next visualization demonstrates how the UN voting data (of Erik Voeten and Anton Strezhnev) can be used to examine different voting behaviors. It seems to reduce the voting data to a two-dimensional factor structure, and seemingly there are three distinct groups of voters these days, with particularly the USA and Israel far removed from other members:

The next GIFs are more statistical. The one below demonstrates how the local regression (LOESS) works. Simply speaking, LOESS determines the relationship for a local subset of the population and when you iteratively repeat this for all local subsets in a population you get a nicely fitting LOESS curve, the red line in Rafa’s GIF:

Not quite sure how to interpret the next one, but Rafa explains it visualized a random forest’s predictions using only one predictor variable. I think that different trees would then provide different predictions because they leverage different training samples, and an ensemble of those trees would then improve predictive accuracy?

The next one is my favorite I think. This animation illustrates how a highly accurate test would function in a population with low prevalence of true values (e.g., disease, applicant success). More details are in the original blog or here.

The blog ends with a rather funny animation of the only good use of pie charts, according to Rafa:

Data Science, Machine Learning, & Statistics resources (free courses, books, tutorials, & cheat sheets)

Welcome to my repository of data science, machine learning, and statistics resources. Software-specific material has to a large extent been listed under their respective overviews: R Resources & Python Resources. I also host a list of SQL Resources and datasets to practice programming. If you have any additions, please comment or contact me!

LAST UPDATED: 21-05-2018

Courses:

Video:

Books:

Sentiment Lexicons:

Cheatsheets:

Other:

Google Fonts – huge collection of text fonts
Checkmycolours.com – check whether your colours have enough contrast
Vischeck.com – check whether your images are colorblind-friendly
Coblis – Color Blind Simulation
Color Oracle – color blind simulation
Chrome color enhancer – customizable color filter for website browsing

Must read: Computer Age Statistical Inference (Efron & Hastie, 2016)

Statistics, and statistical inference in specific, are becoming an ever greater part of our daily lives. Models are trying to estimate anything from (future) consumer behaviour to optimal steering behaviours and we need these models to be as accurate as possible. Trevor Hastie is a great contributor to the development of the field, and I highly recommend the machine learning books and courses that he developed, together with Robert Tibshirani. These you may find in my list of R Resources (Cheatsheets, Tutorials, & Books).

Today I wanted to share another book Hastie wrote, together with Bradley Efron, another colleague of his at Stanford University. It is called Computer Age Statistical Inference (Efron & Hastie, 2016) and is a definite must read for every aspiring data scientist because it illustrates most algorithms commonly used in modern-day statistical inference. Many of these algorithms Hastie and his colleagues at Stanford developed themselves and the book handles among others:

Regression:
- Logistic regression
- Poisson regression
- Ridge regression
- Jackknife regression
- Least angle regression
- Lasso regression
- Regression trees
Bootstrapping
Boosting
Cross-validation
Random forests
Survival analysis
Support vector machines
Kernel smoothing
Neural networks
Deep learning
Bayesian statistics

Veritasium: Bayes’ Theorem explained

Veritasium makes educational video’s, mostly about science, and recently they recorded one offering an intuitive explanation of Bayes’ Theorem. They guide the viewer through Bayes’ thought process coming up with the theory, explain its workings, but also acknowledge some of the issues when applying Bayesian statistics in society.

“The thing we forget in Bayes’ Theorem is that our actions play a role in determining outcomes, in determining how true things actually are.” 8.23

“A really good understanding of Bayes’ Theorem implies that experimentation is essential: if you’ve been doing the same thing for a long time and getting the same result – that you’re not necessarily happy with – maybe it’s time to change.” 8.48

The video, see below, lasts around 9 minutes.