In a recent post, I provided an example of how Hadley Wickham's tidyverse improves the workflow of analysts, makes coding errors less likely, and makes code more transparent. In this post, I provide a more general overview of the tidyverse, its packages, and how they work.

The figure below represents a simplified workflow of the average data science project. As a first step, the analyst needs to import (load) the data into their working environment (e.g., Excel, SPSS, R, RStudio, Spyder, Jupyter). To make sure the data are correct, the next step is to clean and tidy the data before continuing to the analysis. At this early stage, the analyst can handle the explicit errors in the dataset, such as missing and nonsensical data points or records. After these preparatory steps, the main process starts. It consists of three interrelated tasks. (1) The analyst transforms the data in order to retrieve statistics, descriptives, and/or new features. (2) The analyst visualizes statistics, relations, and results. This is essential for storytelling and for effective interpretation and communication of the results. (3) The analyst tries out different models to fit, explain, and predict the data. Finally, the results of this main process (leading to an "understanding" of the data and the underlying processes) can be communicated to others.

A simplified, standard cycle of data analysis

The tidyverse provides assistance in each of these stages. Its packages provide functionality to perform analytical tasks more effectively, in fewer lines, with fewer errors, and in more transparent code.
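To make the stages concrete, here is a minimal sketch of one pass through the cycle. It is an illustration rather than a recommended analysis: the built-in `mtcars` dataset, the temporary CSV file, the object names (`cars`, `cars_clean`, `by_cyl`, `fit`), and the simple linear model are all my own assumptions, chosen only so the example runs on its own.

```r
library(tidyverse)

# Import: write a built-in dataset to a temporary CSV (an assumption, so the
# sketch is self-contained), then read it back with readr
path <- tempfile(fileext = ".csv")
write_csv(as_tibble(mtcars, rownames = "model"), path)
cars <- read_csv(path, col_types = cols())

# Tidy: drop records with missing values (mtcars has none, so nothing is lost)
cars_clean <- cars %>%
  drop_na()

# Transform: derive a new feature and compute descriptives per group
by_cyl <- cars_clean %>%
  mutate(kpl = mpg * 0.425144) %>%   # miles per US gallon to km per litre
  group_by(cyl) %>%
  summarise(n = n(), mean_kpl = mean(kpl))

# Visualize: relation between car weight and fuel efficiency
p <- ggplot(cars_clean, aes(x = wt, y = mpg)) +
  geom_point()

# Model: a simple linear fit (base R; used here only as a placeholder model)
fit <- lm(mpg ~ wt, data = cars_clean)
```

Printing `by_cyl`, `p`, or `summary(fit)` would then cover the final step of communicating the results.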

tidyverse packages and their place in the workflow

I run through each of these stages in separate posts, explaining the various packages and their inner workings, and demonstrating how they affect the process of data analysis in R. Links to each of these posts are listed below:

  • Importing data (work in progress)
  • Tidying data (work in progress)
  • Transforming data (work in progress)
  • Visualizing data (work in progress)
  • Modeling data (work in progress)
  • Efficient programming (work in progress)
