Tag: learning

TensorFlow for R Gallery

TensorFlow is an open-source machine learning (ML) framework. It’s primarily used to build neural networks, and is thus very often used for so-called deep learning with multi-layered neural nets.

Although there are other ML frameworks, such as Caffe or Torch, TensorFlow is particularly famous because it was developed by researchers at Google’s Brain team. There are widespread debates on which framework is best. Nonetheless, TensorFlow does a pretty good job of marketing itself.

Google search engine searches on TensorFlow in comparison to searches on Machine learning and Deep learning

I primarily work in the programming language R, and have written before about how to start with deep learning in R using Keras, a user-friendly API built on top of, among others, TensorFlow. Now, it has become even easier to learn how to harness the power of TensorFlow in R, for RStudio has compiled a gallery of featured posts on TensorFlow implementations in R. It features a variety of applications related to collaborative filtering, image recognition, audio classification, time series forecasting, and fraud detection, all using Keras and TensorFlow. I highly recommend you check it out if you want to learn more about deep learning in R.

Checklist to Optimize Training Transfer in Organizations

Ashley Hughes, Stephanie Zajac, Jacqueline Spencer, and Eduardo Salas wrote a recent research note for the International Journal of Training and Development. The research note is built around an evidence-based checklist of actionable insights for practitioners that will help to enhance the effectiveness of training interventions. These actionable insights would help to prevent the ‘transfer problem’, meaning that trained skills are not being used on the job.


Screenshot of the first page of the published research note, containing the abstract

Unfortunately, these published academic papers are often behind a paywall, but you may request a PDF from the authors here on ResearchGate.

Screenshot of the appendix of the research note containing the checklist for practitioners.

For the full details and scientific evidence behind each suggested action, I suggest you access the research note. Nevertheless, here’s my summary of their main advice on improving training transfer before, during, and after training implementation:

Before training

  • Conduct a training needs analysis to align the training’s content and participants with the organizational objectives
  • Involved stakeholders should be aware of training, understand its importance, and — obviously — be prepared for the training program. The scholars provide seven specific actions here, including the setting of personal training goals, and aligning resources and rewards with the training.
  • Training attendance should be framed as an opportunity, and the training’s anticipated benefits could be emphasized (e.g. improvement of work processes or on-the-job performance).
  • A climate which encourages learning should be created, with dedicated time (and opportunities) for post‐training learning and a sense of accountability for using trained knowledge, skills, and abilities.

During training

  • Piloting the training with a single department or subset of trainees is highly encouraged, as it helps to assess whether the training design is appropriate in terms of content and delivery.
  • Error‐encouragement framing can influence a trainee’s learning orientation and thus errors made during training should be framed as growth opportunities.

After training

  • Use of the trained skills should be supported and planned. For instance, participants could be given a small workload reduction to provide opportunities to apply the learned knowledge and skills once they return to their position. 
  • Management and training participants should be held accountable for their use of skills on the job.
  • Think about using just‐in‐time or refresher training and coaching, if needed.
  • Assess training effectiveness criteria including training transfer using metrics and analytics. Specifically, the scholars propose that the criteria measured in the training evaluation should correspond to the training needs identified through the training needs analysis that was conducted before the training. 
  • Training evaluation criteria should consider the scope and timeframe of the training. Take into account that distal outcomes such as ROI may take longer to realize. 

7 tips for writing cleaner JavaScript code, translated to 3.5 tips for R programming

I recently came across this lovely article where Ali Spittel provides 7 tips for writing cleaner JavaScript code. Enthusiastic about her guidelines, I wanted to translate them to the R programming environment. However, since R is primarily a functional rather than an object-oriented programming language, not all tips were equally relevant in my opinion. Here’s what really stood out for me.

Ali Spittel’s JavaScript tips, via https://dev.to/aspittel/extreme-makeover-code-edition-k5k

1. Use clear variable and function names

Suppose we want to create our own custom function to derive the average value of a vector v (please note that there is a base::mean function to do this much more efficiently). We could use the R code below to compute that the average of vector 1 through 10 is 5.5.

avg <- function(v){
    s = 0
    for(i in seq_along(v)) {
        s = s + v[i]
    }
    return(s / length(v))
}

avg(1:10) # 5.5

However, Ali rightfully argues that this code can be improved by making the variable and function names much more explicit. For instance, the refactored code below makes much more sense at first glance, while doing exactly the same.

averageVector <- function(vector){
    sum = 0
    for(i in seq_along(vector)){
        sum = sum + vector[i]
    }
    return(sum / length(vector))
}

averageVector(1:10) #5.5

Of course, you don’t want to make variable and function names unnecessarily long (e.g., average would have been a great alternative function name, whereas computeAverageOfThisVector is probably too long). I like Ali’s principle:

Don’t minify your own code; use full variable names that the next developer can understand.

2. Write short functions that only do one thing

Ali argues “Functions are more understandable, readable, and maintainable if they do one thing only. If we have a bug when we write short functions, it is usually easier to find the source of that bug. Also, our code will be more reusable.” It thus helps to break up your code into custom functions that all do one thing and do that thing well!

For instance, our earlier function averageVector actually did two things. It first summed the vector, and then took the average. We can split this into two separate functions in order to standardize our operations.

sumVector <- function(vector){
    sum = 0
    for(i in seq_along(vector)){
        sum = sum + vector[i]
    }
    return(sum)
}

averageVector <- function(vector){
    sum = sumVector(vector)
    average = sum / length(vector)
    return(average)
}

sumVector(1:10) # 55
averageVector(1:10) # 5.5

If you are writing a function that could be named with an “and” in it — it really should be two functions.

3. Documentation

Personally, I am terrible at commenting and documenting my work. I am always in too much of a hurry, I tell myself. However, no more excuses! Everybody should make sure to write good documentation for their code so that future developers, including future you, understand what your code is doing and why!

Ali uses the following great example, of a piece of code with magic numbers in it.

areaOfCircle <- function(radius) {
  return(3.14 * radius ** 2)
}

Now, you might immediately recognize the number Pi in this return statement, but others may not. And maybe you will need the value Pi somewhere else in your script as well, but you accidentally use three decimals the next time. Best to standardize and comment!

PI <- 3.14 # PI rounded to two decimal places

areaOfCircle <- function(radius) {
  # Implements the mathematical equation for the area of a circle:
  # Pi times the radius of the circle squared.
  return(PI * radius ** 2)
}

The above is much clearer. And by making PI a variable, you make sure that you use the same value in other places in your script! Unfortunately, R has no dedicated syntax for declaring constants (unchangeable variables), so I denote my constants by using ALL CAPITAL variable names such as PI, MAX_GROUP_SIZE, or COLOR_EXPERIMENTAL_GROUP.

Do note that R has a built-in constant pi for purposes such as the above.
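As a side note, base R can approximate a true constant with lockBinding(), which makes an existing binding read-only. The snippet below is only a minimal sketch: the CONSTANTS environment and the value 5 are my own illustrative assumptions, not something taken from Ali's article.

```r
# Hypothetical constants, kept in their own environment for illustration
CONSTANTS <- new.env()
assign("MAX_GROUP_SIZE", 5, envir = CONSTANTS)

# lockBinding() is a base R function that makes an existing binding read-only
lockBinding("MAX_GROUP_SIZE", CONSTANTS)

# Any attempt to overwrite the locked binding now raises an error
overwrite_result <- tryCatch(
  {
    assign("MAX_GROUP_SIZE", 10, envir = CONSTANTS)
    "overwritten"
  },
  error = function(e) "blocked"
)

overwrite_result          # "blocked"
CONSTANTS$MAX_GROUP_SIZE  # still 5

# With the built-in constant pi, the hand-rolled PI is no longer needed
areaOfCircle <- function(radius) {
  return(pi * radius^2)
}
```

The ALL CAPITAL naming convention remains useful either way, because it tells the reader at a glance that a value is not supposed to change.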

I love Ali’s general rule that:

Your comments should describe the “why” of your code.

However, more elaborate R programming commenting guidelines are given in the Google R coding guide, stating that:

Functions should contain a comments section immediately below the function definition line. These comments should consist of a one-sentence description of the function; a list of the function’s arguments, denoted by Args:, with a description of each (including the data type); and a description of the return value, denoted by Returns:. The comments should be descriptive enough that a caller can use the function without reading any of the function’s code.
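To make that guideline concrete, here is a rough sketch of how the areaOfCircle function could be documented in this style. The input check with stopifnot() is my own addition for illustration and is not part of the quoted guide.

```r
areaOfCircle <- function(radius) {
  # Computes the area of a circle from its radius.
  #
  # Args:
  #   radius: Numeric vector of radii; all values must be non-negative.
  #
  # Returns:
  #   Numeric vector with the area for each radius.
  stopifnot(is.numeric(radius), all(radius >= 0))  # illustrative input check
  return(pi * radius^2)
}

areaOfCircle(2)  # 12.56637
```

A caller can now read the header to learn what goes in and what comes out, without studying the function body.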

Either way, avoid comments that only describe “what” your code does:

# EXAMPLE OF BAD COMMENTING ####

PI <- 3.14 # PI

areaOfCircle <- function(radius) {
    # custom function for area of circle
    return(PI * radius ** 2) # radius squared times PI
}

5. Be Consistent

I do not have as strong a sentiment about consistency as Ali does in her article, but I do agree that it’s nice if code is at least somewhat in line with the common style guides. For R, I like to refer to my R resources list which includes several common style guides, such as Google’s or Hadley Wickham’s Advanced R style guide.

 

How to Design Your First Programs

Last week, I started this great C++ tutorial: learncpp.com. It has been an amazing learning experience so far, mostly because the tutorial is very hands-on, allowing you to immediately program all of the code examples yourself.

Several hours in now, section 1.10b explains how to design your own first programs. The advice in this section seemed pretty universal, and thus valuable regardless of the programming language you normally work in. At least, I found it to resonate with my personal experiences, so I highly recommend that you take 10 minutes to read it yourself: www.learncpp.com/cpp-tutorial/1-10b-how-to-design-your-first-programs. For those who dislike detailed insights, here are the main pointers:

A little up-front planning saves time and frustration in the long run. Generally speaking, work through these eight steps when starting a new program or project:

  1. Define the problem
  2. Collect the program’s basic requirements (e.g., functionality, constraints)
  3. Define your tools, targets, and backup plan
  4. Break hard problems down into easy problems
  5. Figure out (and list) the sequence of events
  6. Figure out the data inputs and outputs for each task
  7. Write the task details
  8. Connect the data inputs and outputs
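Since I mostly work in R, here is a small sketch of how steps 4 through 8 might look in practice. The problem (reporting the average of a comma-separated string of numbers) and all three function names are made up for illustration; they do not come from the tutorial.

```r
# Task 1: parse raw text into numbers (input: character, output: numeric)
parseNumbers <- function(text) {
  as.numeric(strsplit(text, ",")[[1]])
}

# Task 2: summarise the numbers (input: numeric, output: named list)
summariseNumbers <- function(numbers) {
  list(count = length(numbers), average = mean(numbers))
}

# Task 3: format the summary for display (input: list, output: character)
formatReport <- function(summary) {
  sprintf("%d values, average %.1f", summary$count, summary$average)
}

# Connect the output of each task to the input of the next
report <- formatReport(summariseNumbers(parseNumbers("4, 8, 15")))
report  # "3 values, average 9.0"
```

Writing down the input and output of each task first makes the final step, connecting them, almost mechanical.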

Some general words of advice when writing programs:

Keep your programs simple to start. Often new programmers have a grand vision for all the things they want their program to do. “I want to write a role-playing game with graphics and sound and random monsters and dungeons, with a town you can visit to sell the items that you find in the dungeon” If you try to write something too complex to start, you will become overwhelmed and discouraged at your lack of progress. Instead, make your first goal as simple as possible, something that is definitely within your reach. For example, “I want to be able to display a 2d field on the screen”.

Add features over time. Once you have your simple program working and working well, then you can add features to it. For example, once you can display your 2d field, add a character who can walk around. Once you can walk around, add walls that can impede your progress. Once you have walls, build a simple town out of them. Once you have a town, add merchants. By adding each feature incrementally your program will get progressively more complex without overwhelming you in the process.

Focus on one area at a time. Don’t try to code everything at once, and don’t divide your attention across multiple tasks. Focus on one task at a time, and see it through to completion as much as is possible. It is much better to have one fully working task and five that haven’t been started yet than six partially-working tasks. If you split your attention, you are more likely to make mistakes and forget important details.

Test each piece of code as you go. New programmers will often write the entire program in one pass. Then when they compile it for the first time, the compiler reports hundreds of errors. This can not only be intimidating, if your code doesn’t work, it may be hard to figure out why. Instead, write a piece of code, and then compile and test it immediately. If it doesn’t work, you’ll know exactly where the problem is, and it will be easy to fix. Once you are sure that the code works, move to the next piece and repeat. It may take longer to finish writing your code, but when you are done the whole thing should work, and you won’t have to spend twice as long trying to figure out why it doesn’t.

Learn C++; Section 1.10b
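R has no separate compile step, but the "test each piece as you go" advice carries over directly. A minimal sketch, assuming simple stopifnot() checks are enough for your purposes, and reusing the sumVector and averageVector names from my earlier post on cleaner code:

```r
# Piece 1: write it, then test it right away
sumVector <- function(vector) {
  total <- 0
  for (value in vector) {
    total <- total + value
  }
  return(total)
}
stopifnot(sumVector(1:10) == 55)

# Piece 2: builds on piece 1, which is already known to work
averageVector <- function(vector) {
  return(sumVector(vector) / length(vector))
}
stopifnot(averageVector(1:10) == 5.5)
```

If the second check fails, the problem has to be in the two lines just written, because the first piece already passed its own check.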

ggstatsplot: Creating graphics including statistical details

This pearl had been resting in my inbox for quite a while before I was able to add it to my R resources list. Citing its GitHub page: “ggstatsplot is an extension of ggplot2 package for creating graphics with details from statistical tests included in the plots themselves and targeted primarily at behavioral sciences community to provide a one-line code to produce information-rich plots.” The package is currently maintained and still under development by Indrajeet Patil. Nevertheless, its functionality is already quite impressive. You can download the latest stable version via:

utils::install.packages(pkgs = "ggstatsplot")

Or download the development version via:

devtools::install_github(
  repo = "IndrajeetPatil/ggstatsplot", # package path on GitHub
  dependencies = TRUE,                 # installs packages which ggstatsplot depends on
  upgrade = "always"                   # updates any out-of-date dependencies (older devtools versions used upgrade_dependencies = TRUE)
)

The package currently supports many different statistical plots, including:

?ggbetweenstats
?ggscatterstats
?gghistostats
?ggpiestats
?ggcorrmat
?ggcoefstats
?combine_plots
?grouped_ggbetweenstats
?grouped_ggscatterstats
?grouped_gghistostats
?grouped_ggpiestats
?grouped_ggcorrmat

Let’s take a closer look at the first one:

ggbetweenstats

This function creates either a violin plot, a box plot, or a mix of the two for between-group or between-condition comparisons. Additional detailed results from statistical tests can be added in the subtitle. The simplest function call looks like the one below, but much more complex information can be added and specified.

set.seed(123) # to get reproducible results

# the functions work approximately the same as ggplot2
ggstatsplot::ggbetweenstats(
  data = datasets::iris, 
  x = Species, 
  y = Sepal.Length,
  messages = FALSE
) +   
# and can be adjusted using the same, original function calls
  ggplot2::coord_cartesian(ylim = c(3, 8)) + 
  ggplot2::scale_y_continuous(breaks = seq(3, 8, by = 1))

All pictures copied from the GitHub page of ggstatsplot [original]

ggscatterstats

Not all plots are ggplot2-compatible though; ggscatterstats, for instance, is not. Nevertheless, it produces a very powerful plot in my opinion.

ggstatsplot::ggscatterstats(
  data = datasets::iris, 
  x = Sepal.Length, 
  y = Petal.Length,
  title = "Dataset: Iris flower data set",
  messages = FALSE
)

ggcorrmat

ggcorrmat is also quite impressive, producing correlograms with only minimal amounts of code as it wraps around ggcorrplot. The defaults already produce publication-ready correlation matrices:

ggstatsplot::ggcorrmat(
  data = datasets::iris,
  corr.method = "spearman",
  sig.level = 0.005,
  cor.vars = Sepal.Length:Petal.Width,
  cor.vars.names = c("Sepal Length", "Sepal Width", "Petal Length", "Petal Width"),
  title = "Correlogram for length measures for Iris species",
  subtitle = "Iris dataset by Anderson",
  caption = expression(
    paste(
      italic("Note"),
      ": X denotes correlation non-significant at ",
      italic("p "),
      "< 0.005; adjusted alpha"
    )
  )
)

ggcoefstats

Finally, ggcoefstats is a wrapper around GGally::ggcoef, creating a plot with the regression coefficients’ point estimates as dots with confidence interval whiskers. Here’s an example with some detailed specifications:

ggstatsplot::ggcoefstats(
  x = stats::lm(formula = mpg ~ am * cyl,
                data = datasets::mtcars),
  point.color = "red",
  vline.color = "#CC79A7",
  vline.linetype = "dotdash",
  stats.label.size = 3.5,
  stats.label.color = c("#0072B2", "#D55E00", "darkgreen"),
  title = "Car performance predicted by transmission and cylinder count",
  subtitle = "Source: 1974 Motor Trend US magazine"
) +                                    
  ggplot2::scale_y_discrete(labels = c("transmission", "cylinders", "interaction")) +
  ggplot2::labs(x = "regression coefficient",
                y = NULL)

I for one am very curious to see how Indrajeet will further develop this package, and whether academics will start using it as a default in publishing.

 

Transitioning from Excel to R: Dictionary of common functions

Alyssa Columbus maintains this GitHub repository with a great tutorial on how to move from Excel to R. Very useful for beginning useRs, the repository’s tutorial includes a translation table between Excel and R functions:

Excel Formula | R Function | Type
ABS | abs | Arithmetic
ADDRESS | assign | Essentials
AND | &, &&, all | Boolean
AVERAGE, AVG, AVERAGEIF | mean | Arithmetic
BETADIST | pbeta | Statistics
BETAINV | qbeta | Statistics
BINOMDIST | pbinom when cumulative, dbinom when not | Statistics
CEILING | ceiling | Arithmetic
CELL | str has the same idea | Essentials
CHIDIST, CHISQDIST | pchisq | Statistics
CHIINV, CHISQINV | qchisq | Statistics
CHITEST | chisq.test | Statistics
CHOOSE | switch | Essentials
CLEAN | gsub | Text
COLS, COLUMNS | ncol | Essentials
COLUMN | col, :, seq | Essentials
COMBIN | choose | Essentials
CONCATENATE | paste | Text
CONFIDENCE | -qnorm(alpha/2)*std/sqrt(n) | Statistics
CORREL | cor | Statistics
COUNT, COUNTIF | length | Arithmetic
COVAR | cov | Statistics
CRITBINOM | qbinom | Statistics
DELTA | identical | Boolean
EXACT | == | Boolean
EXP | exp | Arithmetic
EXPONDIST | pexp when cumulative, dexp when not | Statistics
FACT | factorial | Arithmetic
FACTDOUBLE | dfactorial in the phangorn package | Arithmetic
FDIST | pf | Statistics
FIND | regexpr, grepl, grep | Text
FINV | qf | Statistics
FISHER | atanh | Arithmetic
FISHERINV | tanh | Arithmetic
FIXED | format, sprintf, formatC | Essentials
FLOOR | floor | Arithmetic
FORECAST | predict on an lm object | Statistics
FREQUENCY | cut, table | Arithmetic
FTEST | var.test | Statistics
GAMMADIST | pgamma if last argument T, dgamma if last arg. F | Statistics
GAMMAINV | qgamma | Statistics
GAMMALN | lgamma | Statistics
GAUSS | pnorm(x) - 0.5 | Statistics
GCD | gcd | Arithmetic
GEOMEAN | exp(mean(log(x))) | Arithmetic
GESTEP | >= | Boolean
HARMEAN | harmonic.mean in the psych package | Arithmetic
HLOOKUP | match, merge | Essentials
HYPGEOMDIST | dhyper | Statistics
IF | if, ifelse | Essentials
IFERROR | try, tryCatch | Essentials
INDEX | x[y,z] | Essentials
INDIRECT | get | Essentials
INT | as.integer (not for negative numbers), floor | Arithmetic
INTERCEPT | first element of coef of an lm object | Statistics
ISLOGICAL | is.logical | Boolean
ISNA | is.na | Boolean
ISNUMBER | is.numeric | Boolean
ISTEXT | is.character | Boolean
KURT | kurtosis in the moments package | Statistics
LARGE | sort | Statistics
LCM | scm in the schoolmath package | Arithmetic
LEFT | substr | Text
LEN, LENGTH | nchar | Text
LINEST | lm | Statistics
LN, LOG | log | Arithmetic
LOG10 | log10 | Arithmetic
LOGINV | qlnorm | Statistics
LOGNORMDIST | plnorm | Statistics
LOWER | tolower | Text
MATCH | match, which | Essentials
MAX | max (sometimes pmax) | Arithmetic
MDETERM | det | Arithmetic
MEDIAN | median | Arithmetic
MID | substr | Text
MIN | min (sometimes pmin) | Arithmetic
MINVERSE | solve | Arithmetic
MMULT | %*% | Arithmetic
MOD | %% | Arithmetic
MODE | as.numeric(names(which.max(table(x)))) | Arithmetic
MUNIT | diag | Arithmetic
N | as.numeric | Arithmetic
NEGBINOMDIST | dnbinom | Statistics
NORMDIST, NORMSDIST | pnorm when cumulative, dnorm when not | Statistics
NORMINV, NORMSINV | qnorm | Statistics
NOT | ! | Boolean
NOW | date, Sys.time | Essentials
OR | `|`, `||`, any | Boolean
PEARSON | cor | Statistics
PERCENTILE | quantile | Statistics
PERCENTRANK | ecdf | Statistics
PERMUT | function(n,k) {choose(n,k)*factorial(k)} | Arithmetic
PERMUTATIONA | n^k | Arithmetic
PHI | dnorm | Statistics
POISSON | ppois when cumulative, dpois when not | Statistics
POWER | ^ | Arithmetic
PROB | ecdf | Statistics
PRODUCT | prod | Arithmetic
PROPER | toupper | Text
QUARTILE | quantile | Arithmetic
QUOTIENT | %/% | Arithmetic
RAND | runif | Arithmetic
RANDBETWEEN | sample | Arithmetic
RANK | rank | Essentials
REPLACE | sub, gsub | Text
REPT | rep and paste or paste0 | Text
RIGHT | substring | Text
ROUND | round | Arithmetic
ROUNDDOWN | floor | Arithmetic
ROUNDUP | ceiling | Arithmetic
ROW | row, :, seq | Essentials
ROWS | nrow | Essentials
RSQ | summary of lm object | Statistics
SEARCH | regexpr, grep | Text
SIGN | sign | Arithmetic
SKEW | skewness in the moments package | Statistics
SLOPE | in coef of lm object | Statistics
SMALL | sort | Arithmetic
SQRT | sqrt | Arithmetic
STANDARDIZE | scale | Statistics
STD, STDEV | sd | Arithmetic
STEYX | predict on an lm object | Statistics
STRING | format, sprintf, formatC | Text
SUBSTITUTE | sub, gsub, paste | Essentials
SUM, SUMIF | sum | Arithmetic
SUMPRODUCT | crossprod | Arithmetic
TDIST | pt | Statistics
TEXT | format, sprintf, formatC | Text
TINV | abs(qt(x/2,data)) | Statistics
TODAY | Sys.Date | Essentials
TRANSPOSE | t | Arithmetic
TREND | fitted of an lm object | Statistics
TRIM | sub | Essentials
TRIMMEAN | mean(x,trim=tr/2) | Arithmetic
TRUNC | trunc | Essentials
TTEST | t.test | Statistics
TYPE | typeof, mode, class | Essentials
UPPER | toupper | Text
VALUE | as.numeric | Arithmetic
VAR | var | Essentials
VLOOKUP | match, merge | Essentials
WEEKDAY | weekdays | Essentials
WEIBULL | pweibull when cumulative, dweibull when not | Statistics
ZTEST | pnorm | Statistics
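
To give a feel for how a few of these translations work in practice, here is a short sketch on toy data. The sales vector and the prices data frame are invented for illustration, and the Excel formulas in the comments are only rough equivalents.

```r
sales <- c(12, 7, 7, 20, 3)

# AVERAGEIF(range, ">5"): mean() on a filtered vector
average_if <- mean(sales[sales > 5])                      # 11.5

# GEOMEAN: exp(mean(log(x)))
geometric_mean <- exp(mean(log(c(2, 8))))                 # 4

# MODE: the most frequent value via table()
mode_value <- as.numeric(names(which.max(table(sales))))  # 7

# VLOOKUP: match() finds the row, then index the column you want
prices <- data.frame(item  = c("apple", "pear", "plum"),
                     price = c(0.5, 0.8, 1.2))
pear_price <- prices$price[match("pear", prices$item)]    # 0.8
```

Note that the conditional Excel functions (AVERAGEIF, COUNTIF, SUMIF) have no direct counterpart in R; you typically filter the vector first and then apply the plain function.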