Sean Owen created this handy cheat sheet that shows the most common probability distributions mapped by their underlying relationships.
Probability distributions are fundamental to statistics, just like data structures are to computer science. They’re the place to start studying if you mean to talk like a data scientist.Sean Owen (via)
Owen argues that the probability distributions relate to each other in intuitive and interesting ways that makes it easier for you to recall them. For instance, several follow naturally from the Bernoulli distribution. Having this map by hand should thus help you really understand what these distributions imply.
On top of that, it’s just a nice geeky network poster!
Now, Sean didn’t just make a fancy map. In the original blog he also explains each of the distributions and how it relates to the others. Having this knowledge is vital to being a good data scientist / analyst.
You can sometimes get away with simple analysis using R or scikit-learn without quite understanding distributions, just like you can manage a Java program without understanding hash functions. But it would soon end in tears, bugs, bogus results, or worse: sighs and eye-rolling from stats majors.Sean Owen (via)
For instance, here’s Sean explaining the Binomial distribution:
The binomial distribution may be thought of as the sum of outcomes of things that follow a Bernoulli distribution. Toss a fair coin 20 times; how many times does it come up heads? This count is an outcome that follows the binomial distribution. Its parameters are n, the number of trials, and p, the probability of a “success” (here: heads, or 1). Each flip is a Bernoulli-distributed outcome, or trial. Reach for the binomial distribution when counting the number of successes in things that act like a coin flip, where each flip is independent and has the same probability of success.Sean Owen (via)
Header image via Alison-Static