This blog post explains t-SNE (t-Distributed Stochastic Neighbor Embedding) through the story of programmers joining forces with musicians to create the ultimate drum machine (if you are here just for the fun, you may start playing right away). Kyle McDonald, Manny Tan, and Yotam Mann had difficulty pinpointing how similar two sounds are (ding, dong) …

## Google Facets: Interactive Visualization for Everybody

Last week, Google released Facets, its new open-source visualization tool. Facets consists of two interfaces that allow users to investigate their data at different levels. Facets Overview gives users a quick understanding of the distribution of values across the variables in their dataset. Overview is especially helpful in detecting unexpected values, missing values, unbalanced …

## Computing and visualizing PCA in R

Following my introduction to PCA, I will demonstrate how to apply and visualize PCA in R. Many packages and functions can apply PCA in R. In this post I will use the function `prcomp` from the `stats` package. I will also show how to visualize PCA in R using base R graphics. However, my favorite visualization function for PCA is `ggbiplot`, which is implemented by Vince Q. Vu and available on GitHub. Please let me know if you have better ways to visualize PCA in R.

**Computing the Principal Components (PC)**

I will use the classic `iris` dataset for the demonstration. The data contain four continuous variables, which correspond to physical measurements of the flowers, and a categorical variable describing the flowers’ species.

We will apply PCA to the four continuous variables and use the categorical variable to visualize the PCs later. Notice that in…

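The post does this in R with `prcomp`. For readers who want to see what happens under the hood, here is a minimal sketch in Python with NumPy, assuming we mirror `prcomp`'s defaults (center each column, no scaling). The function name `pca` and the toy data are hypothetical; this is not the `iris` analysis from the post. Like `prcomp`, it takes the singular value decomposition of the centered data matrix: the right singular vectors are the loadings (what `prcomp` calls `rotation`), and the singular values divided by sqrt(n - 1) are the component standard deviations (`sdev`).

```python
import numpy as np

def pca(X, scale=False):
    """PCA via SVD of the centered (and optionally scaled) data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each column (prcomp's default)
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)    # unit variance per column, like scale. = TRUE
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    sdev = d / np.sqrt(max(1, X.shape[0] - 1))  # standard deviation of each PC
    rotation = Vt.T                         # loadings, one column per PC
    scores = Xc @ rotation                  # observations projected onto the PCs
    return sdev, rotation, scores

# Toy two-variable data (hypothetical, not the iris data used in the post)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
sdev, rotation, scores = pca(X)
explained = sdev**2 / np.sum(sdev**2)   # proportion of variance per PC
print(explained)
```

The squared `sdev` values divided by their sum give the proportion of variance explained by each PC, the same quantity `summary()` reports for a `prcomp` object.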

## Gradient Descent 101

Gradient Descent is, in essence, a simple optimization algorithm. In the linear case, it repeatedly adjusts the slope (and intercept) of a line in the direction that most reduces the error, until it finds the line that best fits the observed data, i.e., the one with the smallest error. It can be used to fit the linear models we get taught in university statistics courses; however, many of us …
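To make the idea concrete, here is a minimal sketch of batch gradient descent fitting a straight line by minimizing the mean squared error (MSE). The learning rate, iteration count, and toy data (points lying exactly on y = 2x + 1) are illustrative assumptions, not values from the original post.

```python
def gradient_descent(xs, ys, lr=0.01, n_iter=5000):
    """Fit y ~ slope * x + intercept by minimizing the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Residuals of the current line
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2)
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 2x + 1
slope, intercept = gradient_descent(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

The learning rate is the main design choice: too large and the updates overshoot and diverge, too small and convergence crawls.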

## Outliers 101

Data preparation forms a large part of every data science project. Some claims go to extremes, stating that data preparation makes up 80-95% of a data scientist's workload. Outlier detection is one of the tasks in this preparation phase. It is the process by which the analyst takes a closer look at the …
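The excerpt does not say which detection method the post uses. One common rule of thumb, swapped in here as an assumption, is Tukey's fences: flag any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where IQR = Q3 - Q1. A minimal standard-library sketch (the function name `iqr_outliers` and the sample data are hypothetical):

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

measurements = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(measurements))  # [95]
```

A flagged value is a candidate for that closer look, not something to delete automatically; it may be a data-entry error or a genuine extreme case.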

## Statistics Visually Explained

Statistical literacy is essential to our data-driven society. Analytics has been and continues to be a game changer in many business fields, including Human Resources. Yet, for all the increased importance of and demand for statistical competence, the pedagogical approaches in statistics have barely changed. Seeing Theory is a project designed and created by Daniel …

## Veritasium: Bayes’ Theorem explained

Veritasium makes educational videos, mostly about science, and recently they recorded one offering an intuitive explanation of Bayes' Theorem. They guide the viewer through Bayes' thought process in arriving at the theorem and explain how it works, but also acknowledge some of the issues with applying Bayesian statistics in society. "The thing we forget in Bayes' Theorem is …
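Bayes' Theorem states that P(H | E) = P(E | H) · P(H) / P(E), where the evidence probability P(E) sums over the hypothesis being true or false. A minimal sketch of that arithmetic follows; the screening-test numbers (1% prevalence, 90% sensitivity, 9% false-positive rate) are hypothetical and not taken from the video.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) via Bayes' theorem: P(E | H) * P(H) / P(E)."""
    # Total probability of the evidence, P(E)
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence

# Hypothetical test: 1% prevalence, 90% sensitivity, 9% false-positive rate
p1 = posterior(prior=0.01, p_evidence_given_h=0.90, p_evidence_given_not_h=0.09)
# A second positive result: yesterday's posterior becomes today's prior
p2 = posterior(prior=p1, p_evidence_given_h=0.90, p_evidence_given_not_h=0.09)
print(round(p1, 3), round(p2, 3))
```

Even after a positive result the posterior is only about 9%, because the condition is rare; using that posterior as the prior for a second positive test raises it to roughly 50%.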

## Multi-Armed Bandits: The Smart Alternative for A/B Testing

Just like humans, computers learn by experience. The purpose of A/B testing is often to collect data to decide whether intervention A or B is better. To that end, one group receives intervention A while another group receives intervention B. With the data of these two groups coming in, the computer can statistically estimate which …
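The excerpt stops before naming a specific bandit algorithm. As an assumption, here is a sketch of one of the simplest, epsilon-greedy: with probability epsilon it shows a random variant (explore), otherwise the variant with the best observed conversion rate so far (exploit). The conversion rates, round count, seed, and function name are hypothetical.

```python
import random

def epsilon_greedy(true_rates, n_rounds=10_000, epsilon=0.1, seed=42):
    """Simulate an epsilon-greedy bandit over variants with the given success rates."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms      # times each variant was shown
    successes = [0] * n_arms  # observed conversions per variant
    for _ in range(n_rounds):
        if 0 in pulls:
            arm = pulls.index(0)              # show every variant once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)       # explore: random variant
        else:
            # exploit: variant with the best observed conversion rate so far
            arm = max(range(n_arms), key=lambda a: successes[a] / pulls[a])
        pulls[arm] += 1
        successes[arm] += int(rng.random() < true_rates[arm])  # simulated conversion
    return pulls, successes

pulls, successes = epsilon_greedy([0.05, 0.12])  # variant A, variant B
print(pulls, [s / p for s, p in zip(successes, pulls)])
```

Unlike a fixed 50/50 A/B split, the allocation shifts toward the better-performing variant as evidence accumulates, while the epsilon share of traffic keeps checking the other one.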