Grant Sanderson is the owner of the YouTube channel 3Blue1Brown, which aims to explain math and statistics concepts in an entertaining way. Using animations, Grant breaks down difficult problems and explains them in understandable language. I was already familiar with his great explanatory videos on Linear Algebra and Neural Networks, but this new video on cryptocurrencies and blockchain (below) is definitely one of the best explanations of Bitcoin I’ve seen so far:
Hierarchical Linear Models 101
Multilevel models (also known as hierarchical linear models, nested data models, mixed models, random coefficient, random-effects models, random parameter models, or split-plot designs) are statistical models of parameters that vary at more than one level (Wikipedia). They are very useful in the Social Sciences, where we are often interested in individuals who reside in nations, organizations, teams, or other higher-level units. In addition to the individuals’ own characteristics, the characteristics of the units they belong to may also have effects. To take into account effects from variables residing at multiple levels, we can use multilevel or hierarchical models.
Michael Freeman, a faculty member at the University of Washington Information School, made this amazing visual introduction to hierarchical modeling:
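To make this concrete, here is a minimal sketch of a random-intercept model in R. It assumes the nlme package (bundled with standard R installations) and its built-in Orthodont dataset, both chosen purely for illustration; treat it as one possible starting point rather than a recommended analysis.

```r
library(nlme)  # assumed for illustration; a recommended package bundled with R

# Orthodont: repeated measurements (level 1) nested within children (level 2)
data(Orthodont)

# Random-intercept model: one overall effect of age,
# plus a separate baseline (intercept) for every Subject
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

fixef(fit)        # fixed effects shared by all subjects
head(ranef(fit))  # subject-specific deviations from the overall intercept
```

The `random = ~ 1 | Subject` part is what makes the model multilevel: it lets the intercept vary across the higher-level units (here, children).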

If you want to practice hierarchical modeling in R, I recommend the lesson by Page Paccini (first video) or the more elaborate video series by Statistics of DOOM (second):
Geographical Maps in ggplot2: Rectangle World Map
Maarten Lambrechts posted a tutorial in which he demonstrates the steps he took to create a Eurovision Song Contest map in R.


Inspired by his tutorial, I decided to create a world map of my own, the R code for which you may find below.
options(stringsAsFactors = FALSE) # options
library(tidyverse) # packages
# retrieve data file
link <- "https://gist.githubusercontent.com/maartenzam/787498bbc07ae06b637447dbd430ea0a/raw/9a9dafafb44d8990f85243a9c7ca349acd3a0d07/worldtilegrid.csv"
geodata <- read.csv(link) %>% as_tibble() # load in geodata (as_tibble() replaces the deprecated as.tibble())
str(geodata) # examine geodata
## Classes 'tbl_df', 'tbl' and 'data.frame': 192 obs. of 11 variables:
## $ name : chr "Afghanistan" "Albania" "Algeria" "Angola" ...
## $ alpha.2 : chr "AF" "AL" "DZ" "AO" ...
## $ alpha.3 : chr "AFG" "ALB" "DZA" "AGO" ...
## $ country.code : int 4 8 12 24 10 28 32 51 36 40 ...
## $ iso_3166.2 : chr "ISO 3166-2:AF" "ISO 3166-2:AL" "ISO 3166-2:DZ" "ISO 3166-2:AO" ...
## $ region : chr "Asia" "Europe" "Africa" "Africa" ...
## $ sub.region : chr "Southern Asia" "Southern Europe" "Northern Africa" "Middle Africa" ...
## $ region.code : int 142 150 2 2 NA 19 19 142 9 150 ...
## $ sub.region.code: int 34 39 15 17 NA 29 5 145 53 155 ...
## $ x : int 22 15 13 13 15 7 6 20 24 15 ...
## $ y : int 8 9 11 17 23 4 14 6 19 6 ...
# create worldmap
worldmap <- ggplot(geodata)
# add rectangle grid + labels
worldmap +
  geom_rect(aes(xmin = x, ymin = y,
                xmax = x + 1, ymax = y + 1)) +
  geom_text(aes(x = x, y = y,
                label = alpha.3))

# improve geoms
worldmap +
  geom_rect(aes(xmin = x, ymin = y,
                xmax = x + 1, ymax = y + 1,
                fill = region)) +
  geom_text(aes(x = x, y = y,
                label = alpha.3),
            size = 2,
            nudge_x = 0.5, nudge_y = -0.5,
            vjust = 0.5, hjust = 0.5) +
  scale_y_reverse()

# finalize plot look
colors <- c('yellow', 'red', 'white', 'pink', 'green', 'orange')
worldmap +
  geom_rect(aes(xmin = x, ymin = y,
                xmax = x + 1, ymax = y + 1,
                fill = region)) +
  geom_text(aes(x = x, y = y,
                label = alpha.3),
            size = 3,
            nudge_x = 0.5, nudge_y = -0.5,
            vjust = 0.5, hjust = 0.5) +
  scale_y_reverse() +
  scale_fill_manual(values = colors) +
  # "none" replaces the deprecated FALSE for switching off a guide
  guides(fill = guide_legend(ncol = 2), col = "none") +
  theme(plot.background = element_rect(fill = "blue"),
        panel.grid = element_blank(),
        panel.background = element_blank(),
        legend.background = element_blank(),
        legend.position = c(0, 0),
        legend.justification = c(0, 0),
        legend.title = element_text(colour = "white"),
        legend.text = element_text(colour = "white"),
        legend.key = element_blank(),
        legend.key.size = unit(0.06, "npc"),
        axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        text = element_text(colour = "white", size = 16)
  ) +
  labs(title = "ggplot2: Worldmap",
       fill = "Region",
       caption = "paulvanderlaken.com")

What would you add to your world map? If you end up making one, please send me a copy at paulvanderlaken@gmail.com!
Regular Expression Crosswords
A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is .*\.txt$.
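As a small illustration of the wildcard-versus-regex comparison above, the sketch below applies the pattern in base R. The file names are made up for the example; `grepl()` and `glob2rx()` are standard base R / utils functions.

```r
# Hypothetical file names, invented for this example
files <- c("notes.txt", "report.pdf", "data.txt.bak", "todo.txt")

# In an R string the backslash itself must be escaped, hence "\\."
pattern <- ".*\\.txt$"

# grepl() returns TRUE for every element that matches the pattern
is_txt <- grepl(pattern, files)
files[is_txt]

# glob2rx() translates a wildcard pattern into its regex equivalent
from_glob <- glob2rx("*.txt")
from_glob
```

Note that "data.txt.bak" is not matched, because the `$` anchors `.txt` to the end of the string.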
Last week I posted a first tutorial on Regular Expressions in R and I am working on its sequels. You may find additional resources on Regular Expressions in the learning overviews (R, Python, Data Science).
Today I came across this website of Regular Expression Crosswords, which proves a great resource to playfully master regular expressions. All puzzles are validated live using the JavaScript regex engine. The figure below explains how it works:

Via the links below you can jump to puzzles that match your expertise level:
New to R? Kickstart your learning and career with these 6 steps!
For newcomers, R code can look like ancient Egyptian hieroglyphs with its weird operators (%in%, <-, ||, or %/%). The R language has been said to have a steep learning curve and although there are many introductory courses and books (see R Resources), it’s hard to decide where to start.
Fortunately, I am here to help! Below is a six-step guide to learning R, using only open access (i.e., free!) materials.
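For the curious, those four operators are less mysterious than they look. The snippet below is a minimal sketch with made-up values, showing what each one does:

```r
x <- c(2, 5, 9)           # `<-` assigns a value to a name

five_in_x  <- 5 %in% x    # %in% tests membership: is 5 an element of x?
three_in_x <- 3 %in% x

either <- TRUE || FALSE   # `||` is a logical OR for single TRUE/FALSE values

whole <- 17 %/% 5         # %/% is integer division: how often does 5 fit in 17?

five_in_x; three_in_x; either; whole
```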
Although oriented at complete newcomers, it will have you writing your own practical scripts and programs in no time: just start at #1 and work your way to coding mastery!
If you already feel comfortable with the basics of R — or don’t like basics — you can start at #5 and jump into practical learning via the tidyverse.
Good luck!!!
Step 1: An R Folder (15 min)
Create a directory for your R learning stuff somewhere on your computer. Download this (very) short introduction to R by Paul Torfs and Claudia Bauer and store it in that folder. Now read the introduction and follow the steps. It will help you install all R software on your own computer and familiarize you with the standard data types.
Step 2: Handy Cheat Sheets (15 min)
Many standard functions exist in R and after a while you will know them by heart. For now, it’s good to have a dictionary or reference close at hand. Download and read the cheat sheets for base R (Mhairi McNeill) and R base functions (Tom Short). Because you’ll be writing most of your R scripts in RStudio, it’s also recommended to have an RStudio cheat sheet as well as an RStudio keyboard shortcuts cheat sheet at hand.
Step 3: swirl Away in RStudio (8h)
Now you’re ready to really start learning and we’re going to accelerate via swirl. Open up RStudio and enter the lines of code below in your console window.
install.packages('swirl') #download swirl package
library(swirl) #load in swirl package
swirl() #start swirl
swirl (webpage) will start and after a couple of prompts you will be able to choose the learning course called 1: R Programming: The basics of programming in R (see below). This course consists of 15 modules through which you will master the basics of R in the environment itself. Start with module 1 and complete one to three modules per day, so that you finish the swirl course in a week.


Step 4: A Pirate’s Guide to R (10h)
OK, you should now be familiar with the basics of R. However, knowledge is crystallized via repetition. I therefore suggest you walk through the book YaRrr! The Pirate’s Guide to R (Phillips, 2017), starting in chapter 3. It’s a fun book and will provide you with more knowledge on how to program custom functions, loops, and some basic statistical modelling techniques – the thing R was actually designed for.
Step 5: R for Data Science (16h)
By now, you might say you are an adept R programmer with statistical modelling experience. However, you have been working mostly with base R functions, knowledge of which is a must-have to really understand the language. In practice, though, R programmers rely heavily on contributed packages. A very useful group of packages is commonly referred to as the tidyverse. You will be amazed at how much this set of packages simplifies working in R. The next step, therefore, is to work through the book R for Data Science (Grolemund & Wickham, 2017) (hardcopy here).
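To give a flavor of the difference, the sketch below performs the same small task twice: once in base R and once with dplyr, one of the tidyverse packages. The task itself (average fuel efficiency of the heavier cars in the built-in mtcars data, per number of cylinders) is just an assumption picked for illustration.

```r
library(dplyr)  # part of the tidyverse; assumed to be installed

# Base R: subset with logical indexing, then aggregate with a formula
heavy <- mtcars[mtcars$wt > 3, ]
base_result <- aggregate(mpg ~ cyl, data = heavy, FUN = mean)

# dplyr: the same steps, chained with the pipe and read top to bottom
tidy_result <- mtcars %>%
  filter(wt > 3) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))

base_result
tidy_result
```

Both approaches give the same numbers; the tidyverse version mainly differs in that each step is a named verb, which many people find easier to read and extend.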
Step 6: Specialize (∞)
You are now several steps and a couple of weeks further along. You possess basic knowledge of the R language, know how to write scripts in RStudio, are capable of programming in base R as well as using the advanced functionality of the tidyverse, and you have even made a start with some basic statistical modelling.
It’s time to set you loose in the wonderful world of the R community. If you have not done so already, you should get accounts on Stack Overflow and Cross Validated. You might also want to subscribe to the R Help Mailing List, R Bloggers, and to my website obviously.
On Twitter, have a look at #rstats and, on reddit, subscribe to the rstats, rstudio, and statistics subreddits. At this point, I can only advise you to return to the R Resources Overview and to continue broadening your R programming skills. Pick materials in the area that interests you:
- If you want to become a hardcore programmer, this R programming course may better suit you and you will want to work your way through the books Advanced R (Wickham, 2014) and Efficient R Programming (Gillespie & Lovelace, 2017).
- If you want to become a program developer, building functions and packages, you also want to consider mastering Software Development in R (Peng, Kross, & Anderson, 2017).
- If you like visualization, look into the R Graph Gallery with code examples and read this practical introduction to ggplot2 (Healy, 2017) and the Hitchhiker’s Guide to ggplot2 in R (Burchell & Vargas, 2016).
- If you like interactive visualizations, you will want to look at the above as well as R Shiny, the dashboarding resources, and the HTML Widgets that R offers.
- If you want to become a data scientist, focus on machine learning via this course on statistical learning (Hastie & Tibshirani, 2014). If you prefer a shorter, practical introduction, try this Kaggle Competition Titanic walkthrough on YouTube.
- If you like automation and reporting, start with the basics of markdown and regular expressions. Also consider reading the R Markdown Definitive Guide (Xie, Allaire, & Grolemund, 2018).
- If you’re more interested in text analysis and text mining, knowledge of regular expressions is a must-have and a good additional start would be the book on Tidy Text Mining (Silge & Robinson, 2017).
Neural Networks 101
Last month, a video by 3Blue1Brown was trending on YouTube, and it has already accumulated over a quarter of a million views. It only lasts 10 minutes but provides a very good and intuitive explanation of the inner workings of Neural Networks (NN):
The Machine Learning & Deep Learning book I wrote about recently provides a more substantial explanation of the different NNs and their inner workings. Neural nets come in various flavors and my list of Data Science, Machine Learning, & Statistics Resources includes useful cheatsheets and other information, such as the architecture map below.

If you still haven’t had enough, Daniel Shiffman demonstrates how to code Neural Networks in Processing (Java), and the video displays precisely what happens behind the scenes. Finally, MIT has made their AI course material open-source, and it includes two 45-minute lectures on NNs. The lecturing professor – Patrick Winston – isn’t much of a fan of these “bulldozer” algorithms. He has a stronger preference for “more sophisticated” mathematical learning through, for instance, Support Vector Machines.





