Data Engineering Reading List, by Mapflat

Lars Albertsson, a former software engineer at Spotify and Google and currently a freelance data engineer via mapflat, maintains this list of data engineering resources. It includes many links to videos and courses about data pipelines, batch processing, Kafka, NoSQL, Clojure, Scala, Parquet, Luigi, Storm, Spark, Hadoop, Cassandra, and other tools I am not too familiar with…

Neural Networks 101

Over the past month, a video by 3Blue1Brown has been trending on YouTube, already accumulating over a quarter of a million views. It lasts only 10 minutes but provides a very good and intuitive explanation of the inner workings of Neural Networks (NN). The Machine Learning & Deep Learning book I wrote about recently provides a more substantial explanation of the…

Where to look for your next job? An Interactive Map of the US Job Market

The people at Predictive Talent, Inc. took a sample of 23.4 million job postings from 5,200+ job boards and 1,800+ cities around the US. They classified these jobs using the BLS Standard Occupational Classification tree and identified their primary work locations, primary job roles, estimated salaries, and 17 other job search-related characteristics. Next, they calculated five metrics for each role and city…

Data Science, Machine Learning, & Statistics resources (free courses, books, tutorials, & cheat sheets)

Welcome to my repository of data science, machine learning, and statistics resources. Software-specific material has to a large extent been listed under their respective overviews: R Resources & Python Resources. I also host a list of SQL Resources and datasets to practice programming. If you have any additions, please comment or contact me! LAST UPDATED: 21-05-2018 Courses: Udacity: Introduction to Descriptive Statistics…

t-SNE, the Ultimate Drum Machine and more

This blog explains t-Distributed Stochastic Neighbor Embedding (t-SNE) through a story of programmers joining forces with musicians to create the ultimate drum machine (if you are here just for the fun, you may start playing right away). Kyle McDonald, Manny Tan, and Yotam Mann experienced difficulties in pinpointing to what extent sounds are similar (ding, dong)…

Geographical maps using Shazam Recognitions

Shazam is a mobile app that can be asked to identify a song by making it “listen” to a piece of music. Due to its immense popularity, the organization’s name quickly turned into a verb used in regular conversation (“Do you know this song? Let’s Shazam it.”). A successful identification is referred to as a Shazam recognition. Shazam users can opt in…

Digitizing the Tour de France 2017 – II

A few weeks back, I gave some examples of how data, predictive analytics, and visualization are changing the Tour de France experience. Today, I came across another wonderful example visualizing the sequences of geospatial data (i.e., the movement) of the cyclists during the 11th stage of the Tour de France (blue dots). Moreover, the locations of…

Digitizing the Tour de France 2017

Combining two of my favorite things, Dimension Data elaborates on how they are using data, machine learning, and predictive modeling to take the Tour de France experience to the next level in 2017. Eurosport already jumped on the bandwagon in 2016 with some amazing visualizations of common Tour scenarios. Here is one on how to win…

R learning: Neural Networks

Artificial neural networks (ANNs) are computing systems inspired by the human brain. They can teach themselves to do tasks, simply by considering examples of the tasks’ outcome. For example, they can learn to identify images that contain cats by analyzing example images that have been tagged “cat” or “no cat”. When given enough examples, the…