Tag: statisticallearning

StatQuest: Statistical concepts, clearly explained

Josh Starmer is an assistant professor in the genetics department of the University of North Carolina at Chapel Hill.

But, more importantly, Josh is the mastermind behind StatQuest!

StatQuest is a YouTube channel (and website) dedicated to explaining complex statistical concepts, like data distributions, probability, or novel machine learning algorithms, in simple terms.

Once you watch one of Josh’s “StatQuests”, you immediately recognize the effort he has put into this project. Using great visuals, a just-about-right pace, and relatable examples, Josh makes statistics accessible to everyone. Take, for instance, his series on logistic regression.

And do you really know what happens under the hood when you run a principal component analysis? After his PCA video, you will.

Or, if you are more interested in the fundamental concepts behind machine learning, Josh has videos for you too, for instance on bias and variance or on gradient descent.

With nearly 200 videos and counting, StatQuest is truly an amazing resource for students and teachers on topics related to statistics and data analytics. For some of the concepts, Josh has even posted videos that walk you through the analysis steps and the interpretation of the results in R.

StatQuest started out as an attempt to explain statistics to my co-workers – who are all genetics researchers at UNC-Chapel Hill. They did these amazing experiments, but they didn’t always know what to do with the data they generated. That was my job. But I wanted them to understand that what I do isn’t magic – it’s actually quite simple. It only seems hard because it’s all wrapped up in confusing terminology and typically communicated using equations. I found that if I stripped away the terminology and communicated the concepts using pictures, it became easy to understand.

Over time I made more and more StatQuests and now it’s my passion on YouTube.

Josh Starmer via https://statquest.org/about/

Data Science, Machine Learning, & Statistics resources (free courses, books, tutorials, & cheat sheets)

Welcome to my repository of data science, machine learning, and statistics resources. Software-specific material has largely been listed under its respective overviews: R Resources & Python Resources. I also host a list of SQL Resources and datasets to practice programming. If you have any additions, please comment or contact me!

LAST UPDATED: 21-05-2018




Sentiment Lexicons:



Must read: Computer Age Statistical Inference (Efron & Hastie, 2016)

Statistics, and statistical inference in particular, are becoming an ever greater part of our daily lives. Models try to estimate anything from (future) consumer behaviour to optimal steering behaviours, and we need these models to be as accurate as possible. Trevor Hastie has contributed greatly to the development of the field, and I highly recommend the machine learning books and courses he developed together with Robert Tibshirani. You can find these in my list of R Resources (Cheatsheets, Tutorials, & Books).

Today I want to share another book Hastie wrote, together with Bradley Efron, another colleague of his at Stanford University. It is called Computer Age Statistical Inference (Efron & Hastie, 2016) and is a definite must-read for every aspiring data scientist, because it illustrates most of the algorithms commonly used in modern-day statistical inference. Hastie and his Stanford colleagues developed many of these algorithms themselves. Among others, the book covers:

  • Regression:
    • Logistic regression
    • Poisson regression
    • Ridge regression
    • Jackknife regression
    • Least angle regression
    • Lasso regression
    • Regression trees
  • Bootstrapping
  • Boosting
  • Cross-validation
  • Random forests
  • Survival analysis
  • Support vector machines
  • Kernel smoothing
  • Neural networks
  • Deep learning
  • Bayesian statistics



R resources (free courses, books, tutorials, & cheat sheets)

Help yourself to these free books, tutorials, packages, cheat sheets, and many more materials for R programming. There’s a separate overview for handy R programming tricks. If you have additions, please comment below or contact me!


LAST UPDATED: 2020-02-16

Table of Contents

Completely new to R? → Start learning here!

Introductory R

Introductory Books

Online Courses

Style Guides


Advanced R

Package Development

Non-standard Evaluation

Functional Programming


Cheat Sheets

Many of the above cheat sheets are hosted in the official RStudio cheat sheet overview.

Data Manipulation

Data Visualization


Interactive / HTML / JavaScript widgets


ggplot2 extensions


  • coefplot – visualizes model statistics
  • circlize – circular visualizations for categorical data
  • clustree – visualize clustering analysis
  • quantmod – candlestick financial charts
  • dabestr – Data Analysis using Bootstrap-Coupled ESTimation
  • devoutsvg – an SVG graphics device (with pattern fills)
  • devoutpdf – a PDF graphics device
  • cartography – create and integrate maps in your R workflow
  • colorspace – HCL-based color palettes
  • viridis – Matplotlib viridis color palette for R
  • munsell – Munsell color palettes for R
  • Cairo – high-quality display output
  • igraph – Network Analysis and Visualization
  • graphlayouts – new layout algorithms for network visualization
  • lattice – Trellis graphics
  • tmap – thematic maps
  • trelliscopejs – interactive alternative for facet_wrap
  • rgl – interactive 3D plots
  • corrplot – graphical display of a correlation matrix
  • googleVis – Google Charts API
  • plotROC – interactive ROC plots
  • extrafont – fonts in R graphics
  • rvg – produces Vector Graphics that allow further editing in PowerPoint or Excel
  • showtext – text using system fonts
  • animation – animated graphics using ImageMagick
  • misc3d – 3D plots, isosurfaces, etc.
  • xkcd – xkcd style graphics
  • imager – CImg library to work with images
  • ungeviz – tools for visualizing uncertainty
  • waffle – square pie charts a.k.a. waffle charts
  • Creating spectrograms in R with hht, warbleR, soundgen, signal, seewave, or phonTools


Shiny, Dashboards, & Apps

Markdown & Other Output Formats

  • tidystats – automating the updating of model statistics
  • papaja – preparing APA journal articles
  • blogdown – build websites with Markdown & Hugo
  • huxtable – create Excel, HTML, & LaTeX tables
  • xaringan – make slideshows via remark.js and markdown
  • summarytools – produces neat, quick data summary tables
  • citr – RStudio Addin to Insert Markdown Citations

Cloud, Server, & Database


Statistical Modeling & Machine Learning



Cheat sheets

Time series

Survival analysis



  • corrr – easier correlation matrix management and exploration


Natural Language Processing & Text Mining

Regular Expressions


Geographic & Spatial mapping

Bioinformatics & Computational Biology


Integrated Development Environments (IDEs) & Graphical User Interfaces (GUIs)

Descriptions mostly taken from their own websites:

  • RStudio*** – open-source and enterprise-ready professional software
  • Jupyter Notebook*** – open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text across dozens of programming languages.
  • Microsoft R tools for Visual Studio – turn Visual Studio into a powerful R IDE
  • R plugins for the Vim, Emacs, and Atom editors
  • Rattle*** – GUI for data mining
  • esquisse – RStudio add-in to interactively explore and visualize data
  • R Analytic Flow – data flow diagram-based IDE
  • RKWard – easy to use and easily extensible IDE and GUI
  • Eclipse StatET – Eclipse-based IDE
  • OpenAnalytics Architect – Eclipse-based IDE
  • Tinn-R – open-source GUI and IDE
  • DisplayR – cloud-based GUI
  • BlueSkyStatistics – GUI designed to look like SPSS and SAS 
  • Deducer – GUI for everyone
  • R commander (Rcmdr) – easy and intuitive GUI
  • JGR – Java-based GUI for R
  • jamovi & jmv – free and open statistical software to bridge the gap between researcher and statistician
  • Exploratory.io – cloud-based data science focused GUI
  • Stagraph – GUI for ggplot2 that allows you to visualize and connect to databases and/or basic file types
  • ggraptr – GUI for visualization (Rapid And Pretty Things in R)
  • ML Studio – interactive Shiny platform for data visualization, statistical modeling and machine learning

R & other software and languages

R & Excel

R & Python


  • sqldf – running SQL statements on R data frames



R Help, Connect, & Inspiration

R Blogs

R Jobs