Learn Programming Project-Based: Build-Your-Own-X

Last week, this interesting reddit thread was filled with overviews for cool projects that may help you learn a programming language. The top entries are:

Build Your Own X, by Dani Stefanovic
Project-based Learning, by Tu Tran
Projects from Scratch, by Algory L.
Project-based Tutorials in C, by Robby
Awesome DIY Software, by Cameron Eagans

There’s a wide range of projects you can get started on building:

If you want to focus on building stuff in a specific programming language, you can follow these links:

If you’re really into C, then follow these links to build your own:

Data Visualization Style Guide Repositories

Amy Cesal put together (1) this great overview of style guides for data visualization practice. Moreover, in the original tweet, Amy refers to other great repositories such as (2) this PolicyViz one and (3) this humongous one by Adele.

Spreadsheet of #dataviz style guides: https://t.co/lLQUT5Qwi0

And a form for adding more: https://t.co/i14hb0fZOO pic.twitter.com/qJ2vhcl7QV
— Amy Cesal (@AmyCesal) June 21, 2019

Amy’s list includes many references to the best practices used by some of the leading data journalism companies, such as the BBC, or professional data companies like Salesforce and IBM.

As I’m worried that this great repository may not stand the test of time on the current Google Docs location, here are the base URLs once more:

URL of guidelines	Company name
https://sunlightfoundation.com/2014/03/12/datavizguide	Sunlight Foundation
https://cfpb.github.io/design-manual/data-visualization/data-visualization.html	Consumer Financial Protection Bureau
https://knightcenter.utexas.edu/mooc/file/tdmn_graphics.pdf	Dallas Morning News
https://urbaninstitute.github.io/graphics-styleguide/	The Urban Institute
http://code.minnpost.com/minnpost-styles/	MinnPost
https://public.tableau.com/profile/bbc.audiences#!/vizhome/BBCAudiencesTableauStyleGuide/Hello	BBC Audiences
https://www.ibm.com/design/v1/language/experience/data-visualization/	IBM
https://style.ons.gov.uk/category/data-visualisation/	Office for National Statistics
https://www.ibcs.com/standards	International Business Communication Standards (IBCS®)
https://data.london.gov.uk/blog/city-intelligence-data-design-guidelines/	London City Intelligence
https://www.bbc.co.uk/gel/guidelines/how-to-design-infographics	BBC
https://polaris.shopify.com/design/data-visualizationst	Shopify
https://ux.opower.com/opattern/how-to-charts.html	Opower
https://www.consults-iot.com	Consults-IoT.Com LLP
https://ux.mailchimp.com/patterns/data	MailChimp
https://material.io/design/communication/data-visualization.html	Google- Material Design
https://lightningdesignsystem.com/guidelines/charts/	Salesforce
https://github.com/glosophy/CatoDataVizGuidelines/blob/master/PocketStyleBook.pdf	Cato Institute
https://bbc.github.io/rcookbook/	BBC
https://docs.microsoft.com/en-us/office/dev/add-ins/design/data-visualization-guidelines	Microsoft
https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perception	ACI

If you have any resources or style guides to contribute to Amy’s list, you can do so via this link.

Causal Random Forests, by Mark White

I stumbled accros this incredibly interesting read by Mark White, who discusses the (academic) theory behind, inner workings, and example (R) applications of causal random forests:

EXPLICITLY OPTIMIZING ON CAUSAL EFFECTS VIA THE CAUSAL RANDOM FOREST: A PRACTICAL INTRODUCTION AND TUTORIAL (By Mark White)

These so-called “honest” forests seem a great technique to identify opportunities for personalized actions: think of marketing, HR, medicine, healthcare, and other personalized recommendations. Note that an experimental setup for data collection is still necessary to gather the right data for these techniques.

https://www.markhw.com/blog/causalforestintro

2019 Shortlist for the Royal Society Prize for Science Books

Since 1988, the Royal Society has celebrated outstanding popular science writing and authors.

Each year, a panel of expert judges choose the book that they believe makes popular science writing compelling and accessible to the public.

Over the decades, the Prize has celebrated some notable winners including Bill Bryson and Stephen Hawking.

The author of the winning book receives £25,000 and £2,500 is awarded to each of the five shortlisted books. And this year’s shortlist includes some definite must-reads on data and statistics!

Infinite Powers – by Steven Strogatz

The captivating story of mathematics’ greatest ever idea: calculus. Without it, there would be no computers, no microwave ovens, no GPS, and no space travel. But before it gave modern man almost infinite powers, calculus was behind centuries of controversy, competition, and even death.
Taking us on a thrilling journey through three millennia, Professor Steven Strogatz charts the development of this seminal achievement, from the days of Archimedes to today’s breakthroughs in chaos theory and artificial intelligence. Filled with idiosyncratic characters from Pythagoras to Fourier, Infinite Powers is a compelling human drama that reveals the legacy of calculus in nearly every aspect of modern civilisation, including science, politics, medicine, philosophy, and more.
https://royalsociety.org/grants-schemes-awards/book-prizes/science-book-prize/2019/infinite-powers/

Invisible Women – by Caroline Criado Perez

Imagine a world where your phone is too big for your hand, where your doctor prescribes a drug that is wrong for your body, where in a car accident you are 47% more likely to be seriously injured, where every week the countless hours of work you do are not recognised or valued. If any of this sounds familiar, chances are that you’re a woman.
Invisible Women shows us how, in a world largely built for and by men, we are systematically ignoring half the population. It exposes the gender data gap–a gap in our knowledge that is at the root of perpetual, systemic discrimination against women, and that has created a pervasive but invisible bias with a profound effect on women’s lives. From government policy and medical research, to technology, workplaces, urban planning and the media, Invisible Women reveals the biased data that excludes women.
https://royalsociety.org/grants-schemes-awards/book-prizes/science-book-prize/2019/invisible-women/

Six Impossible Things – by John Gribbin

This book does not deal with data or statistics specifically, but might even be more interesting, as it covers the topic of quantum physics:

Quantum physics is strange. It tells us that a particle can be in two places at once. That particle is also a wave, and everything in the quantum world can be described entirely in terms of waves, or entirely in terms of particles, whichever you prefer.
All of this was clear by the end of the 1920s, but to the great distress of many physicists, let alone ordinary mortals, nobody has ever been able to come up with a common sense explanation of what is going on. Physicists have sought ‘quanta of solace’ in a variety of more or less convincing interpretations.
This short guide presents us with the six theories that try to explain the wild wonders of quantum. All of them are crazy, and some are crazier than others, but in this world crazy does not necessarily mean wrong, and being crazier does not necessarily mean more wrong.
https://royalsociety.org/grants-schemes-awards/book-prizes/science-book-prize/2019/six-impossible-things/

The other shortlisted books

Data Engineering Reading List, by Mapflat

Lars Albertsson, former software engineer at Spotify and Google and currently freelance data engineer via mapflat, maintains this list of data engineering resources. It includes many links to videos and courses about data pipelines, batch processing, Kafka, NoSQL, Clojure, Scala, Parquet, Luigi, Storm, Spark, Hadoop, Cassandra, and other tools I am not too familiar with. Looks like it could function as a great curated overview for starters.

Cover image via Lynda.com

Tidy Machine Learning with R’s purrr and tidyr

Jared Wilber posted this great walkthrough where he codes a simple R data pipeline using purrr and tidyr to train a large variety of models and methods on the same base data, all in a non-repetitive, reproducible, clean, and thus tidy fashion. Really impressive workflow!