Tag: Tutorial

dygraphs

Today I learned about dygraphs, a fast, flexible open source JavaScript charting library. Like everything in JavaScript, the charts produced by dygraphs integrate completely in the web browser and are thus very functional and interactive. See, for instance, the example below, where the graph highlights the y-axis values of both time series based on the x-axis position of my mouse (January 24, 2009). Very cool!

While I am no JS hero, the webpage includes a dygraphs tutorial, as well as a playground environment.

Fortunately, I do know my way around R, and of course someone had already integrated dygraphs in R in the form of the dygraphs R package. It works like a charm!

install.packages("dygraphs")
library("dygraphs")

dygraph(AirPassengers)

Also in R, your dygraphs are fully interactive, with my mouse hovering over June 1951 in the example below.

And you can add all kinds of cool elements and modifications to the graphs, such as a range selector:

dygraph(AirPassengers) %>% dyRangeSelector()
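To give a rough idea of what stacking several modifications can look like, here is a minimal sketch that plots two time series at once and adds a few options on top. It assumes the dygraphs package is installed (the chart is simply skipped otherwise) and uses the mdeaths and fdeaths data sets that ship with R. The object names lung_deaths and chart are my own picks for illustration.

```r
# Two monthly time series that ship with R (package 'datasets')
lung_deaths <- cbind(male = mdeaths, female = fdeaths)

# Only build the chart when the dygraphs package is available
if (requireNamespace("dygraphs", quietly = TRUE)) {
  library(dygraphs)

  chart <- dygraph(lung_deaths, main = "Monthly deaths from lung diseases") %>%
    dyOptions(fillGraph = TRUE) %>%               # shade the area under each line
    dyHighlight(highlightCircleSize = 5) %>%      # bigger marker under the mouse
    dyRangeSelector(height = 30)                  # zoom slider below the chart
}
```

Type chart in the console afterwards to display the result in the RStudio viewer.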

For the full range of visualization options dygraphs offers in R, please do have a look at the official RStudio page.

Interactive Explanation of Network and Graph Principles

Why do groups of people act smart, dumb, kind, or cruel? People behave in strange ways, particularly when they are able to influence one another. Both good and bad things can happen when people interact and behave in network structures. On the bright side, you may be familiar with the wisdom of the crowd, where the aggregated knowledge of a group is more valuable than the knowledge of any of its individual members. Ensemble algorithms, like random forest analysis, rely on this positive principle.

On the dark side, are you familiar with the phenomenon called the tragedy of the commons, where shared resource-systems collapse because individuals behave in their self-interest? Or psychological phenomena such as groupthink, where groups of people make irrational decisions due to social issues? The recent spread of fake news and misinformation is also stimulated by network interactions. In these cases, we could speak of the madness of the crowd.

Nicky Case made a great interactive walkthrough explaining why and when networks of people become wise or mad. You are tasked with changing and simulating network interactions while Nicky explains concepts such as (complex) contagion, the majority illusion paradox, bonding and bridging, and small world networks. In the references, Nicky provides links to scientific papers explaining these concepts in more detail. I highly recommend you check out the website here.

Screenshot of one of the explanations/simulations Nicky offers.

Hierarchical Linear Models 101

Multilevel models (also known as hierarchical linear models, nested data models, mixed models, random coefficient models, random-effects models, random parameter models, or split-plot designs) are statistical models of parameters that vary at more than one level (Wikipedia). They are very useful in the social sciences, where we are often interested in individuals that reside in nations, organizations, teams, or other higher-level units. Next to their individual characteristics, the characteristics of the units they belong to may also have effects. To take into account effects from variables residing at multiple levels, we can use multilevel or hierarchical models.
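To make the idea of "more than one level" a bit more concrete, here is a minimal sketch of a random-intercept model in R. It is only an illustration under my own assumptions: it uses the nlme package (which comes with standard R installations) and its bundled Orthodont data, where repeated measurements (level 1) are nested within children (level 2), much like employees are nested within teams.

```r
library(nlme)  # recommended package that ships with standard R installations

# distance is measured repeatedly per child; Subject identifies the child
# Fixed effect: age. Random effect: a separate intercept for every child.
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

fixef(fit)        # average intercept and age slope across all children
head(ranef(fit))  # how far each child's intercept deviates from that average
VarCorr(fit)      # variance between children versus residual variance
```

The same structure carries over to social science data by swapping in your own outcome, predictors, and grouping variable.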

Michael Freeman, a faculty member at the University of Washington Information School, made this amazing visual introduction to hierarchical modeling:

If you want to practice hierarchical modeling in R, I recommend the lesson by Page Paccini (first video) or the more elaborate video series by Statistics of DOOM (second):

Regular Expression Crosswords

A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is .*\.txt$.
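As a small illustration in R, the sketch below applies that pattern to a handful of made-up file names (the files vector is purely hypothetical). Note that the backslash itself needs escaping inside an R string, hence the double backslash.

```r
# Hypothetical file names to search through
files <- c("notes.txt", "report.txt.bak", "data.csv", "README.TXT")

# The regex from the text; "\\." is a literal dot, "$" anchors the end
pattern <- ".*\\.txt$"

grepl(pattern, files)                           # TRUE FALSE FALSE FALSE
files[grepl(pattern, files)]                    # "notes.txt"
files[grepl(pattern, files, ignore.case = TRUE)] # also picks up "README.TXT"

# Base R can translate a wildcard into a regex for you
glob2rx("*.txt")
```

The last line uses glob2rx() from the utils package, which is handy when you already know the wildcard but not yet the regex.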

Last week I posted a first tutorial on Regular Expressions in R and I am working on its sequels. You may find additional resources on Regular Expressions in the learning overviews (R, Python, Data Science).

Today I came across this website of Regular Expression Crosswords, which proves a great resource to playfully master regular expressions. All puzzles are validated live using the JavaScript regex engine. The figure below explains how it works.

Via the links below you can jump to puzzles that match your expertise level:

New to R? Kickstart your learning and career with these 6 steps!

For newcomers, R code can look like ancient Egyptian hieroglyphs with its weird operators (%in%, <-, ||, or %/%). The R language has been said to have a steep learning curve and, although there are many introductory courses and books (see R Resources), it’s hard to decide where to start.

Fortunately, I am here to help! The below is a six-step guide to learning R, using only open access (i.e., free!) materials.

Although aimed at complete newcomers, it will have you writing your own practical scripts and programs in no time: just start at #1 and work your way to coding mastery!
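In case you are curious, those hieroglyphs are less scary than they look. The short sketch below shows what each of the operators mentioned above does; the variable x is just a throwaway example of mine.

```r
x <- c(2, 7, 10)   # <-  assigns the vector to the name x

7 %in% x           # %in% asks "is 7 one of the values in x?"   TRUE
3 %in% x           #                                            FALSE

TRUE || FALSE      # ||  is a logical OR for single values      TRUE

17 %/% 5           # %/% is integer division                    3
17 %% 5            # %%  is its companion, the remainder        2
```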

If you already feel comfortable with the basics of R — or don’t like basics — you can start at #5 and jump into practical learning via the tidyverse.

Good luck!!!

Step 1: An R Folder (15 min)

Create a directory for your R learning stuff somewhere on your computer. Download this (very) short introduction to R by Paul Torfs and Claudia Bauer and store it in that folder. Now read the introduction and follow the steps. It will help you install all R software on your own computer and familiarize you with the standard data types.

Step 2: Handy Cheat Sheets (15 min)

Many standard functions exist in R and after a while you will know them by heart. For now, it’s good to have a dictionary or reference close at hand. Download and read the cheat sheets for base R (Mhairi McNeill) and R base functions (Tom Short). Because you’ll be writing most of your R scripts in RStudio, it’s also recommended to have an RStudio cheat sheet as well as an RStudio keyboard shortcuts cheat sheet at hand.

Step 3: `swirl` Away in RStudio (8h)

Now you’re ready to really start learning and we’re going to accelerate via swirl. Open up your RStudio and enter the three lines of code below in your console window.

install.packages('swirl') # download the swirl package
library(swirl) # load the swirl package
swirl() # start the interactive course menu

swirl (webpage) will start and after a couple of prompts you will be able to choose the learning course called 1: R Programming: The basics of programming in R (see below). This course consists of 15 modules through which you will master the basics of R in the environment itself. Start with module 1 and complete one to three modules per day, so that you finish the swirl course in a week.

swirl’s 4 R learning courses and the 15 modules belonging to the basics of R programming course

Step 4: A Pirate’s Guide to R (10h)

OK, you should now be familiar with the basics of R. However, knowledge is crystallized via repetition. I therefore suggest you walk through the book YaRrr! The Pirate’s Guide to R (Phillips, 2017), starting at chapter 3. It’s a fun book and will provide you with more knowledge on how to program custom functions, loops, and some basic statistical modelling techniques, the thing R was actually designed for.
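As a small taste of those three topics, here is a sketch of my own (not taken from the book) that uses the mtcars data set bundled with R. The function name coef_var and the choice of columns are just illustrative assumptions.

```r
# A custom function: the coefficient of variation (spread relative to the mean)
coef_var <- function(x) {
  sd(x) / mean(x)
}

# A loop: apply the function to a few columns and collect the results
results <- numeric(0)
for (col in c("mpg", "hp", "wt")) {
  results[col] <- coef_var(mtcars[[col]])
}
round(results, 2)

# A basic statistical model: does a heavier car drive fewer miles per gallon?
model <- lm(mpg ~ wt, data = mtcars)
coef(model)
```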

Step 5: R for Data Science (16h)

By now, you might say you are an adept R programmer with statistical modelling experience. So far, you have been working mostly with base R functions, knowledge of which is a must-have to really understand the language. In practice, however, R programmers rely heavily on contributed packages. A very useful group of packages is commonly referred to as the tidyverse. You will be amazed at how much this set of packages simplifies working in R. The next step, therefore, is to work through the book R for Data Science (Grolemund & Wickham, 2017) (hardcopy here).
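To illustrate the difference in style, the sketch below computes the same summary twice: once in base R and once with dplyr, one of the tidyverse packages. It assumes dplyr is installed (the second half is skipped otherwise), and the object names base_result and tidy_result are mine.

```r
# Base R: average miles per gallon for each number of cylinders
base_result <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
base_result

# Tidyverse: the same summary, written as a pipeline that reads top to bottom
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  tidy_result <- mtcars %>%
    group_by(cyl) %>%
    summarise(mpg = mean(mpg))
}
```

Both approaches should give the same three averages; which one reads more naturally is largely a matter of taste and habit.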

Step 6: Specialize (∞)

You are now several steps and a couple of weeks further. You possess basic knowledge of the R language, know how to write scripts in RStudio, are capable of programming in base R as well as using the advanced functionality of the tidyverse, and you have even made a start with some basic statistical modelling.

It’s time to set you loose in the wonderful world of the R community. If you have not done so already, you should get accounts on Stack Overflow and Cross Validated. You might also want to subscribe to the R Help Mailing List, R Bloggers, and to my website obviously.

On Twitter, have a look at #rstats and, on reddit, subscribe to the rstats, rstudio, and statistics threads. At this point, I can only advise you to return to the R Resources Overview and to continue broadening your R programming skills. Pick materials in the area that interests you:

If you want to become a hardcore programmer, this R programming course may better suit you and you will want to work your way through the books Advanced R (Wickham, 2014) and Efficient R Programming (Gillespie & Lovelace, 2017).

If you want to become a program developer, building functions and packages, you also want to consider mastering Software Development in R (Peng, Kross, & Anderson, 2017).

If you like visualization, look into the R Graph Gallery with code examples and read this practical introduction to ggplot2 (Healy, 2017) and the Hitchhiker’s Guide to ggplot2 in R (Burchell & Vargas, 2016).

If you like interactive visualizations, you will want to look at the above as well as R Shiny, the dashboarding resources, and the HTML Widgets that R offers.

If you want to become a data scientist, focus on machine learning via this course on statistical learning (Hastie & Tibshirani, 2014). If you prefer a shorter, practical introduction, try this Kaggle Competition Titanic walkthrough on YouTube.

If you like automation and reporting, start with the basics of markdown and regular expressions. Also consider reading the R Markdown Definitive Guide (Xie, Allaire, & Grolemund, 2018).

If you’re more interested in text analysis and text mining, knowledge of regular expressions is a must-have and a good additional start would be the book on Tidy Text Mining (Silge & Robinson, 2017).

SQL resources (free courses, books, & cheat sheets)

My list of SQL resources is still quite short so if you have additions, please comment below or contact me! There are separate overviews for Data Science, Machine Learning, & Statistics resources and for R resources and Python resources.