Visualizing the inner workings of the k-means clustering algorithm

Originally, I wrote this blog to share this interactive visualization of the k-means algorithm (wiki) which I was all enthusiastic about. However, then I imagined that not everybody may be familiar with k-means, hence, I wrote the whole blog below.  Next thing I know, u/dashee87 on r/datascience points me to these two other blogs that had already…

Summarizing our Daily News: Clustering 100.000+ Articles in Python

Andrew Thompson was interested in what 10 topics a computer would identify in our daily news. He gathered over 140.000 new articles from the archives of 10 different sources, as you can see in the figure below. In Python, Andrew converted the text of all these articles into a manageable form (tf-idf document term matrix…

Beer-in-hand Data Science

Obviously, analysing beer data in high on everybody’s list of favourite things to do in your weekend. Amanda Dobbyn wanted to examine whether she could provide us with an informative categorization the 45.000+ beers in her data set, without having to taste them all herself. You can find the full report here but you may also…