Category: learning

What to consider when choosing colors for data visualization, by DataWrapper.de

Lisa Charlotte Rost of DataWrapper often writes about data visualization and lately she has focused on the (im)proper use of color in visualization. In this recent blog, she gives a bunch of great tips and best practices, some of which I copied below:

color in data vis advice — Gradient colors can be great to show a pattern but, for categorical data, it is often easier to highlight the most important values with colored bars, positions (like in a dot plot) or even areas. [https://blog.datawrapper.de/colors/]

You can find additional useful tips in the original DataWrapper blog.

Transitioning from Excel to R: Dictionary of common functions

Alyssa Columbus maintains this GitHub repository with a great tutorial on how to move from Excel to R. Very useful for beginning useRs, the repository’s tutorial includes a translation table between Excel and R functions:

Excel Formula	R Function	Type
ABS	`abs`	Arithmetic
ADDRESS	`assign`	Essentials
AND	`&`,`&&`,`all`	Boolean
AVERAGE, AVG, AVERAGEIF	`mean`	Arithmetic
BETADIST	`pbeta`	Statistics
BETAINV	`qbeta`	Statistics
BINOMDIST	`pbinom` when cumulative,`dbinom` when not	Statistics
CEILING	`ceiling`	Arithmetic
CELL	`str` has the same idea	Essentials
CHIDIST, CHISQDIST	`pchisq`	Statistics
CHIINV, CHISQINV	`qchisq`	Statistics
CHITEST	`chisq.test`	Statistics
CHOOSE	`switch`	Essentials
CLEAN	`gsub`	Text
COLS, COLUMNS	`ncol`	Essentials
COLUMN	`col`,`:`,`seq`	Essentials
COMBIN	`choose`	Essentials
CONCATENATE	`paste`	Text
CONFIDENCE	`-qnorm(alpha/2)*std/sqrt(n)`	Statistics
CORREL	`cor`	Statistics
COUNT, COUNTIF	`length`	Arithmetic
COVAR	`cov`	Statistics
CRITBINOM	`qbinom`	Statistics
DELTA	`identical`	Boolean
EXACT	`==`	Boolean
EXP	`exp`	Arithmetic
EXPONDIST	`pexp` when cumulative,`dexp` when not	Statistics
FACT	`factorial`	Arithmetic
FACTDOUBLE	`dfactorial` in the `phangorn` package	Arithmetic
FDIST	`pf`	Statistics
FIND	`regexpr`,`grepl`,`grep`	Text
FINV	`qf`	Statistics
FISHER	`atanh`	Arithmetic
FISHERINV	`tanh`	Arithmetic
FIXED	`format`,`sprintf`,`formatC`	Essentials
FLOOR	`floor`	Arithmetic
FORECAST	`predict` on an `lm` object	Statistics
FREQUENCY	`cut`,`table`	Arithmetic
FTEST	`var.test`	Statistics
GAMMADIST	`pgamma` if last argument T,`dgamma` if last arg. F	Statistics
GAMMAINV	`qgamma`	Statistics
GAMMALN	`lgamma`	Statistics
GAUSS	`pnorm(x) - 0.5`	Statistics
GCD	`gcd`	Arithmetic
GEOMEAN	`exp(mean(log(x)))`	Arithmetic
GESTEP	`>=`	Boolean
HARMEAN	`harmonic.mean` in the `psych` package	Arithmetic
HLOOKUP	`match`,`merge`	Essentials
HYPGEOMDIST	`dhyper`	Statistics
IF	`if`,`ifelse`	Essentials
IFERROR	`try`,`tryCatch`	Essentials
INDEX	`x[y,z]`	Essentials
INDIRECT	`get`	Essentials
INT	`as.integer`(not for negative numbers),`floor`	Arithmetic
INTERCEPT	first element of `coef` of an `lm` object	Statistics
ISLOGICAL	`is.logical`	Boolean
ISNA	`is.na`	Boolean
ISNUMBER	`is.numeric`	Boolean
ISTEXT	`is.character`	Boolean
KURT	`kurtosis` in the `moments` package	Statistics
LARGE	`sort`	Statistics
LCM	`scm` in the `schoolmath` package	Arithmetic
LEFT	`substr`	Text
LEN, LENGTH	`nchar`	Text
LINEST	`lm`	Statistics
LN, LOG	`log`	Arithmetic
LOG10	`log10`	Arithmetic
LOGINV	`qlnorm`	Statistics
LOGNORMDIST	`plnorm`	Statistics
LOWER	`tolower`	Text
MATCH	`match`,`which`	Essentials
MAX	`max` (sometimes `pmax`)	Arithmetic
MDETERM	`det`	Arithmetic
MEDIAN	`median`	Arithmetic
MID	`substr`	Text
MIN	`min` (sometimes `pmin`)	Arithmetic
MINVERSE	`solve`	Arithmetic
MMULT	`%*%`	Arithmetic
MOD	`%%`	Arithmetic
MODE	`as.numeric(names(which.max(table(x))))`	Arithmetic
MUNIT	`diag`	Arithmetic
N	`as.numeric`	Arithmetic
NEGBINOMDIST	`dnbinom`	Statistics
NORMDIST, NORMSDIST	`pnorm` when cumulative,`dnorm` when not	Statistics
NORMINV, NORMSINV	`qnorm`	Statistics
NOT	`!`	Boolean
NOW	`date`,`Sys.time`	Essentials
OR	`|`,`||`,`any`	Boolean
PEARSON	`cor`	Statistics
PERCENTILE	`quantile`	Statistics
PERCENTRANK	`ecdf`	Statistics
PERMUT	`function(n,k) {choose(n,k)*factorial(k)}`	Arithmetic
PERMUTATIONA	`n^k`	Arithmetic
PHI	`dnorm`	Statistics
POISSON	`ppois` when cumulative,`dpois` when not	Statistics
POWER	`^`	Arithmetic
PROB	`ecdf`	Statistics
PRODUCT	`prod`	Arithmetic
PROPER	`tools::toTitleCase` (approximately; `toupper` capitalizes every letter)	Text
QUARTILE	`quantile`	Arithmetic
QUOTIENT	`%/%`	Arithmetic
RAND	`runif`	Arithmetic
RANDBETWEEN	`sample`	Arithmetic
RANK	`rank`	Essentials
REPLACE	`sub`,`gsub`	Text
REPT	`rep` and `paste` or `paste0`	Text
RIGHT	`substring`	Text
ROUND	`round`	Arithmetic
ROUNDDOWN	`floor`	Arithmetic
ROUNDUP	`ceiling`	Arithmetic
ROW	`row`,`:`,`seq`	Essentials
ROWS	`nrow`	Essentials
RSQ	`summary` of `lm` object	Statistics
SEARCH	`regexpr`,`grep`	Text
SIGN	`sign`	Arithmetic
SKEW	`skewness` in the `moments` package	Statistics
SLOPE	in `coef` of `lm` object	Statistics
SMALL	`sort`	Arithmetic
SQRT	`sqrt`	Arithmetic
STANDARDIZE	`scale`	Statistics
STD, STDEV	`sd`	Arithmetic
STEYX	`predict` on an `lm` object	Statistics
STRING	`format`,`sprintf`,`formatC`	Text
SUBSTITUTE	`sub`,`gsub`,`paste`	Essentials
SUM, SUMIF	`sum`	Arithmetic
SUMPRODUCT	`crossprod`	Arithmetic
TDIST	`pt`	Statistics
TEXT	`format`,`sprintf`,`formatC`	Text
TINV	`abs(qt(x/2,data))`	Statistics
TODAY	`Sys.Date`	Essentials
TRANSPOSE	`t`	Arithmetic
TREND	`fitted` of an `lm` object	Statistics
TRIM	`sub`	Essentials
TRIMMEAN	`mean(x,trim=tr/2)`	Arithmetic
TRUNC	`trunc`	Essentials
TTEST	`t.test`	Statistics
TYPE	`typeof`,`mode`,`class`	Essentials
UPPER	`toupper`	Text
VALUE	`as.numeric`	Arithmetic
VAR	`var`	Essentials
VLOOKUP	`match`,`merge`	Essentials
WEEKDAY	`weekdays`	Essentials
WEIBULL	`pweibull` when cumulative,`dweibull` when not	Statistics
ZTEST	`pnorm`	Statistics
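
To make a few rows of the table concrete, here is a minimal sketch in base R. The `sales` vector and the `prices` lookup table are made-up illustration data, not part of Alyssa's tutorial, and the Excel formulas in the comments are only rough equivalents.

```r
# Hypothetical data, for illustration only
sales <- c(4, 8, 8, 15, 16, 23, 42)

# AVERAGE and MEDIAN
avg_sales <- mean(sales)
med_sales <- median(sales)

# GEOMEAN: exponentiate the mean of the logs
geo_mean <- exp(mean(log(sales)))

# MODE: the most frequent value
mode_value <- as.numeric(names(which.max(table(sales))))

# COUNTIF(range, ">10") and SUMIF(range, ">10"): subset first, then summarize
n_large   <- length(sales[sales > 10])
sum_large <- sum(sales[sales > 10])

# VLOOKUP: match() finds the row positions, which then index the price column
prices <- data.frame(
  product = c("apple", "banana", "cherry"),
  price   = c(0.5, 0.25, 3),
  stringsAsFactors = FALSE
)
basket        <- c("cherry", "apple", "cherry")
basket_prices <- prices$price[match(basket, prices$product)]

# CONCATENATE
label <- paste("Total:", sum(basket_prices))
```

The main shift coming from Excel is that R functions work on whole vectors at once, so a condition like `sales > 10` replaces the criteria string you would pass to COUNTIF or SUMIF.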

(Time Series) Forecasting: Principles & Practice (in R)

I stumbled across this open access book by Rob Hyndman, the god of time series, and George Athanasopoulos, a colleague statistician / econometrician at Monash University in Melbourne, Australia.

Hyndman and Athanasopoulos provide a comprehensive introduction to forecasting methods, accessible and relevant among others for business professionals without any formal training in the area. All R examples in the book build on the fpp2 R package. fpp2 includes all datasets referred to in the book and depends on other R packages including forecast and ggplot2.

Below are some examples of the analyses you can expect to recreate. Ignore the agricultural topic for now ; )

Monthly milk production per cow: one of the example analyses you will recreate by following the book (Figure 3.3)

Forecasts of egg prices using a random walk with drift applied to the logged data. — You will be forecasting price data using different analyses and adjustments (Figure 3.4)
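
To give a feel for what "a random walk with drift applied to the logged data" means, here is a rough base R sketch of the point forecasts only. The `prices` vector and the helper `drift_forecast_log()` are hypothetical names and numbers of my own, not the egg price data or the code from the book, which relies on the forecast package and also produces prediction intervals.

```r
# Hypothetical yearly prices; NOT the egg price series used in the book
prices <- c(100, 96, 98, 91, 88, 90, 84, 80)

# Drift method on the log scale (hypothetical helper, point forecasts only)
drift_forecast_log <- function(y, h) {
  log_y <- log(y)
  n <- length(log_y)
  # The drift is the average change per period between first and last value
  drift <- (log_y[n] - log_y[1]) / (n - 1)
  # Extrapolate from the last observation, then back-transform with exp()
  exp(log_y[n] + seq_len(h) * drift)
}

fc <- drift_forecast_log(prices, h = 3)
round(fc, 1)
```

Working on the log scale keeps the forecasts positive, which is a sensible property for prices. Note that a plain `exp()` back-transform gives a median-type forecast rather than a mean.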

I highly recommend this book to any professionals or students looking to learn more about forecasting and time series modelling. There is also a DataCamp course based on this book. If you got value out of this free book, be sure to buy a hardcopy as well.

R tips and tricks

Below are dozens of very specific R tips and tricks. Some are valuable, useful, or boost your productivity. Others are just geeky funny.

More general helpful R packages and resources can be found in this list.

If you have additions, please comment below or contact me!

Completely new to R? → Start here!

RStudio tricks
General tips
Base R tricks
R Markdown tricks
Data manipulation tricks
Data visualization tricks

Funny tricks
Easter eggs


RStudio

RStudio Addins
RStudio Keyboard Shortcuts
RStudio easy tricks: tearable panes, command history, renaming in scope, outlining, snippets, and more
Working with R projects and here
Working with code snippets
Working with code snippets (video)
Stop RStudio from asking to save workspace
Automatically save workspace in case of a crash / errors
Edit several lines of code at once
Press ALT + left mousebutton to select and write on multiple lines simultaneously.
Press ALT + - to insert a <- operator
Press CTRL + SHIFT + M to insert a %>% operator
Press CTRL + SHIFT + F to search all files in the directory or project
Press CTRL + UP to navigate your console history
Rename all variables with same name (rename in scope)
Press CMD + ALT + SHIFT + M to rename variable within scope: to rename all/multiple occurrences of a variable in a script
Press TAB inside “” (quotation marks / an empty string) to select a filename from your current directory, or to autocomplete a filename you started typing

Many more keyboard shortcuts are available here online, and in your RStudio under Tools → Keyboard Shortcuts Help.

General

Disclaimer: This page contains one or more links to Amazon.
Any purchases made through those links provide us with a small commission that helps to host this blog.

Useful base functions

str() – explore structure of R object
trimws() – trim trailing and/or leading whitespaces
dput() – dump an R object in form of R code
cut() – categorize values into intervals
intersect() – returns the elements that two vectors have in common
union() – returns the unique elements found in either of two vectors
setdiff() – returns the elements of the first vector that are not in the second
interaction() – computes a factor which represents the interaction of the given factors
formatC() can be used to round numbers and force trailing zeros
formatC() and sprintf() can be used to add leading/trailing characters
expand.grid() – create a data frame from all combinations of the supplied vectors or factors
seq_along(myvec) – generates a vector of 1:length(myvec)
Initiate an empty dataframe with header names
Functional programming tricks:
- switch() can replace elaborate ifelse statements (see also)
- match.arg() can check for arguments and values
- The null-default operator (%||%) returns the first value that is not NULL
Convert a vector of strings to title case
Quickly map a new set of values to an existing vector
Calculate the derivative of a function expression
Specify options() in your script:
- Prevent scientific notation using options(scipen = 999)
- Prevent automatic factor columns using options(stringsAsFactors = FALSE)
- Use options(width = 60) to change the default width of console output
- Use options(max.print = 100) to change the default number of values printed in the console
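
A quick sketch of several of these base functions in action. All vectors and the helper `summarise_vec()` are made up for illustration.

```r
a <- c("apple", "banana", "cherry")
b <- c("banana", "cherry", "date")

in_both   <- intersect(a, b)  # elements present in both vectors
in_either <- union(a, b)      # unique elements across both vectors
only_in_a <- setdiff(a, b)    # elements of a that are not in b

# trimws() strips leading and trailing whitespace
clean <- trimws("  padded text  ")

# cut() bins numeric values into labelled intervals
ages <- c(5, 17, 30, 64, 80)
age_group <- cut(ages,
                 breaks = c(0, 18, 65, Inf),
                 labels = c("child", "adult", "senior"))

# formatC() forces trailing zeros or adds leading zeros
two_decimals <- formatC(3.1, format = "f", digits = 2)
padded_id    <- formatC(7, width = 3, flag = "0")

# expand.grid() builds every combination of the supplied vectors
grid <- expand.grid(x = 1:2, y = c("a", "b", "c"))

# seq_along() is safer than 1:length() when a vector may be empty
empty <- character(0)
safe_index  <- seq_along(empty)   # zero-length, so a for loop runs 0 times
risky_index <- 1:length(empty)    # counts down from 1 to 0

# switch() plus match.arg() replaces a chain of if/else statements
summarise_vec <- function(x, type = c("mean", "median", "range")) {
  type <- match.arg(type)  # validates the argument, defaults to "mean"
  switch(type,
         mean   = mean(x),
         median = median(x),
         range  = diff(range(x)))
}
```

Calling `summarise_vec(x, "medain")` with a typo stops with an informative error, which is the main reason to prefer `match.arg()` over comparing strings by hand.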


R Markdown

Pimp my RMD: Overview of many R markdown tricks by Yan Holtz
Save compiled images in folder with markdown
Add caption to compiled tables with markdown
Tabsets in markdown
Foldable html content in markdown
Reuse code chunks in markdown
Generate Word documents with markdown
Open URLs in a new window with [text](url){target="_blank"} in markdown
Use #<< to highlight code
Move to next xaringan slide upon click (or Enter)
Convert an R Markdown file (.Rmd) into an R script (.R) with
knitr::purl(input, output, documentation = 2)
Use CTRL + SHIFT + 1:4 to zoom in on any single of your RStudio panels. Use ALT + CTRL + SHIFT + 0 to zoom back out.
knitr::read_chunk("your_script_name.R") can be used to source in scripts that reside outside your current markdown file
Use animations in your markdown files with the gganimate package and header-includes: - \usepackage{animate} in your YAML preamble
Create a searchable, sortable HTML table in 1 line of code with DT::datatable(mydf, filter = 'top')

Data manipulation

readr::parse_number extracts the numbers from raw / scraped text
stringr::str_pad can be used to add leading or trailing characters (like zeros)
dplyr tricks
dplyr::case_when replaces elaborate ifelse statements (Video)
dplyr::everything in combination with dplyr::select to reorder columns
Quickly count / tally observations within groups with dplyr::count, dplyr::tally, dplyr::add_count, and dplyr::add_tally
Quickly filter the top categories / groups based on a variable with forcats::fct_lump
Apply the same filter to multiple columns with dplyr::filter_all or dplyr::filter_if in combination with dplyr::all_vars and dplyr::any_vars
dplyr::group_by_if performs quick conditional grouping
Perform rowwise mutations / calculations using dplyr::rowwise
purrr tricks
purrr::map_df to read in and merge all data files in a folder
Combine purrr::map_df and fs::dir_ls to read in and merge all data files following a specific pattern in a folder
Combine list.files and purrr::map_df to read in and merge all data files in a folder
broom::tidy puts your model results in a tidy data frame
Simpler correlation analysis with corrr
df %>% .$column_name or df %$% column_name can retrieve a column from a tibble
dplyr::coalesce finds the one value contained in many columns with missing values
Display a fraction between 0 and 1 as a percentage with scales::percent(myfraction)
Convert numbers that came in as strings with commas to R numbers with readr::parse_number(mydf$mycol)
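
Here is a small sketch that combines a few of the dplyr tricks above. It assumes the dplyr package is installed, and the `employees` data frame is invented purely for illustration.

```r
library(dplyr)

# Hypothetical data, for illustration only
employees <- data.frame(
  name       = c("Ann", "Bob", "Cas", "Dee", "Eli"),
  dept       = c("HR", "IT", "IT", "HR", "IT"),
  tenure     = c(1, 4, 12, 7, 2),
  phone_work = c(NA, "555-01", NA, "555-04", NA),
  phone_home = c("555-10", "555-11", NA, "555-14", NA),
  stringsAsFactors = FALSE
)

result <- employees %>%
  mutate(
    # case_when() replaces nested ifelse() calls; conditions are tried in order
    seniority = case_when(
      tenure < 3  ~ "junior",
      tenure < 10 ~ "medior",
      TRUE        ~ "senior"
    ),
    # coalesce() takes the first non-missing value across the columns
    phone = coalesce(phone_work, phone_home)
  ) %>%
  # everything() keeps the remaining columns in their original order
  select(name, seniority, everything())

# count() is shorthand for group_by() followed by a tally of rows
dept_counts <- count(employees, dept)
```

Because `case_when()` evaluates its conditions top to bottom, the order matters: put the most specific condition first and the catch-all `TRUE` last.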

Data visualization

colors() to see the names of all built-in colors
GGally::ggpairs for beautiful pair-wise correlation plots
tidyr::complete to get barplot spacing right
Quickly visualize your whole dataset
Create custom, corporate, reproducible color palettes and custom discrete color scales
Standardize the colors of groups in your visualizations using named vectors
theme_set to set a default ggplot2 theme
Create your own ggplot2 theme:
Rearranging values and axes within ggplot2 facets
Add line labels at the end of geom_lines by Simon Jackson
Add + NULL to the end of your ggplot2 chain during development
Add clip = "off" to draw outside the plot panel
Remove point borders with stroke = 0
Multicolored annotated text in ggplot2 by Andrew Whitby & Visuelle Data
Combine plots using patchwork or cowplot
Add a (corporate) logo to your plot using magick
Use animations in your markdown files with the gganimate package and header-includes: - \usepackage{animate} in your YAML preamble
If you pass a function to the data-argument in a geom_*, then it applies that function to the data!
Generate distributions in ggplot2 using the stat_function function. Normal distributions, student t-distributions, beta distributions, anything. See also here.
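
As a rough sketch of three of the tricks above (named color vectors, the trailing `+ NULL`, and `stat_function`), assuming the ggplot2 package is installed. The `scores` data and the color choices are made up for illustration.

```r
library(ggplot2)

# A named vector ties each group to one color, whatever the order in the data
group_colors <- c(control = "grey50", treatment = "firebrick")

# Hypothetical data, for illustration only
scores <- data.frame(
  group = rep(c("control", "treatment"), each = 3),
  week  = rep(1:3, times = 2),
  score = c(10, 11, 12, 10, 14, 19)
)

p_lines <- ggplot(scores, aes(week, score, color = group)) +
  geom_line() +
  scale_color_manual(values = group_colors) +
  # Ending on NULL lets you comment out any layer above without a dangling "+"
  NULL

# stat_function() draws a function over the x range, here two densities
p_density <- ggplot(data.frame(x = c(-4, 4)), aes(x)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1)) +
  stat_function(fun = dt, args = list(df = 3), linetype = "dashed") +
  theme_minimal()
```

Printing `p_lines` or `p_density` in the console renders the plots. Reusing `group_colors` across all charts in a report keeps the groups visually consistent.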


Fun

Easter eggs

Run ????"", via Reddit
Run example(readline), via DecisionStats
Run ?.Internal, via DecisionStats


Interactive Explanation of Network and Graph Principles

Why do groups of people act smart, dumb, kind, or cruel? People behave in strange ways, particularly when they are able to influence one another. Both good and bad things can happen when people interact and behave in network structures. On the bright side, you may be familiar with the wisdom of the crowd, where the aggregated knowledge of a group is more valuable than the knowledge of any individual member. Ensemble algorithms – like random forest analysis – rely on this positive principle.

On the dark side, are you familiar with the phenomenon called the tragedy of the commons, where shared resource-systems collapse because individuals behave in their self-interest? Or psychological phenomena such as groupthink, where groups of people make irrational decisions due to social issues? The recent spread of fake news and misinformation is also stimulated by network interactions. In these cases, we could speak of the madness of the crowd.

Nicky Case made a great interactive walkthrough explaining why and when networks of people become wise or mad. You are tasked to change and simulate network interactions while Nicky explains concepts such as (complex) contagion, the majority illusion paradox, bonding and bridging, and small world networks. In the references, Nicky provides links to scientific papers explaining these concepts in more detail. I highly suggest you check out her website here.

Screenshot of one of the explanations/simulations Nicky offers.

Predictive HR Analytics

Tilburg University has set up a masterclass Predictive HR Analytics. In 3 days, the Professional Learning program will teach you all you need to know to implement predictive analytics and take HR to the next level. More information can be found here.

What makes this program unique?

The masterclass Predictive HR Analytics goes beyond HR analytics and focuses on transformational people predictions. You learn how to embed predictive HR analytics into your HR Strategy and how to use your findings to convince others.
The masterclass is developed at the prestigious Human Resources department at Tilburg University, which has obtained international recognition with its high-quality academic research in the HRM field.
The mix of professors in conjunction with leading HR professionals leads to a strong academic program with a practical approach.
Your peer participants will make sure that the class opens up a high-quality network of HR specialists. The diversity of leading companies from different sectors in the classroom creates new insights for all the participants.
The program is like a 3-day pressure cooker. By combining online and offline components, we can create more in-depth discussions in the classroom.
You will experience a high impact on your daily practice, since the program is focused on direct implementation.

Your profile

This course is ideal for anyone in HR seeking to become more adept in using quantitative data for decision making. Typical participants are (future) HR analysts, HR managers, HR business partners, HR consultants and (financial) business analysts with a strong link to people resources. Participants are from various sectors, such as financial services, healthcare institutions, government agencies and business services.