Author: Paul van der Laken

Awesome R Shiny Resources & Extensions

Rob Gilmore curates a GitHub repo listing resources for working with Shiny, the R web framework and dashboarding tool.

Nan Xiao curates a second repository, listing awesome R packages that offer extensions to Shiny, like extended UI or server components.

They should be your go-to resources when looking for anything Shiny!

Shiny Resources

Extensions

Become a Data Science Professional

Amit Ness gathered an impressive list of learning resources for becoming a data scientist.

It’s great to see that he shares them publicly on his GitHub so that others may follow along.

But beware, this learning guideline covers a multi-year process.

Amit’s personal motto seems to be “Becoming better at data science every day”.

Completing the hyperlinked list below will take you several hundred days at the least!

Learning Philosophy:

Index

People Analytics vs. HR Analytics Google trends

A few years back I completed my dissertation on data-driven Human Resource Management.

This specialized field is often dubbed HR analytics, for basically it’s the application of analytics to the topic of human resources.

Yet, as always in a specialized and hyped field, different names started to emerge. The term People analytics arose, as did Workforce analytics, Talent analytics, and many others.

I addressed this topic in the introduction to my Ph.D. thesis and because I love data visualization, I decided to make a visual to go along with it.

So I gathered some Google Trends data, added a nice locally smoothed curve through it, and there you have it. The original visual was so well received that it was even cited in this great handbook on HR analytics. With almost three years having passed, I decided it was time for an update. So here’s the 2021 version.

If you compare this to the previous version, the trends look quite different. In the previous version, People Analytics had already been the dominant term since 2011.

Unfortunately, that’s not something I can help. Google indexes these search interest ratings behind the scenes, and every year or so, they change how they are calculated.

If you want to get such data yourself, have a look at the Google Trends project.
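For readers curious what a "locally smoothed curve" involves, here is a minimal sketch of one common variant: locally weighted linear regression with tricube weights. It is not necessarily the smoother or the settings behind the visual above. The function name `local_linear_smooth`, the `frac` parameter, and the sample values are all assumptions made up for illustration; the numbers are not real Google Trends data.

```python
def local_linear_smooth(xs, ys, frac=0.5):
    """Smooth ys over xs with locally weighted linear regression.

    For every x, a straight line is fitted to the nearest `frac` share of
    the points, weighted by a tricube kernel, and evaluated at x.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys need the same length (at least 2)")
    n = len(xs)
    k = min(n, max(3, int(round(frac * n))))  # points per local window
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point (fallback avoids /0).
        bandwidth = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
        weights = []
        for x in xs:
            u = min(1.0, abs(x - x0) / bandwidth)
            weights.append((1.0 - u ** 3) ** 3)  # tricube kernel
        total = sum(weights)
        mean_x = sum(w * x for w, x in zip(weights, xs)) / total
        mean_y = sum(w * y for w, y in zip(weights, ys)) / total
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y)
                  for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(mean_y + slope * (x0 - mean_x))
    return smoothed


# Made-up placeholder values standing in for monthly search interest.
months = list(range(12))
interest = [20, 24, 19, 27, 31, 26, 35, 33, 40, 38, 45, 47]
print([round(v, 1) for v in local_linear_smooth(months, interest, frac=0.5)])
```

A larger `frac` uses more neighbors per fit and gives a flatter curve; a smaller one follows the raw data more closely.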


In my dissertation, I wrote the following on the topic:

This process of internally examining the impact of HRM activities goes by many different labels. Contemporary popular labels include people analytics (e.g., Green, 2017; Kane, 2015), HR analytics (e.g., Lawler, Levenson, & Boudreau, 2004; Levenson, 2005; Rasmussen & Ulrich, 2015; Paauwe & Farndale, 2017), workforce analytics (e.g., Carlson & Kavanagh, 2018; Hota & Ghosh, 2013; Simón & Ferreiro, 2017), talent analytics (e.g., Bersin, 2012; Davenport, Harris, & Shapiro, 2010), and human capital analytics (e.g., Andersen, 2017; Minbaeva, 2017a, 2017b; Levenson & Fink, 2017; Schiemann, Seibert, & Blankenship, 2017). Other variations including metrics or reporting are also common (Falletta, 2014) but there is consensus that these differ from the analytics-labels (Cascio & Boudreau, 2010; Lawler, Levenson, & Boudreau, 2004). While HR metrics would refer to descriptive statistics on a single construct, analytics involves exploring and quantifying relationships between multiple constructs.

Yet, even within analytics, a large variety of labels is used interchangeably. For instance, the label people analytics is favored in most countries globally, except for mainland Europe and India where HR analytics is used most (Google Trends, 2018). While human capital analytics seems to refer to the exact same concept, it is used almost exclusively in scientific discourse. Some argue that the lack of clear terminology is because of the emerging nature of the field (Marler & Boudreau, 2017). Others argue that differences beyond semantics exist, for instance, in terms of the accountabilities the labels suggest, and the connotations they invoke (Van den Heuvel & Bondarouk, 2017). In practice, HR, human capital, and people analytics are frequently used to refer to analytical projects covering the entire range of HRM themes whereas workforce and talent analytics are commonly used with more narrow scopes in mind: respectively (strategic) workforce planning initiatives and analytical projects in recruitment, selection, and development. Throughout this dissertation, I will stick to the label people analytics, as this is leading label globally, and in the US tech companies, and thus the most likely label to which I expect the general field to converge.

publicatie-online.nl/uploaded/flipbook/15810-v-d-laken/12/

Want to learn more about people analytics? Have a look at this reading list I compiled.

How a File Format Exposed a Crossword Scandal

Vincent Warmerdam shared this YouTube video, which I thoroughly enjoyed watching. It’s about Saul Pwanson, a software engineer whose hobby project got a little out of hand.

In 2016, Saul Pwanson designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources.

After putting all these crosswords in a simple uniform format, Saul used some simple command line commands to check for common patterns and irregularities.
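The talk's actual commands are not reproduced here. As a rough idea of the kind of check a uniform format makes easy, the sketch below scores how many answers two puzzles share and flags pairs above a threshold. The data layout (a dict mapping a puzzle id to its list of answers), the function names, the threshold, and the toy puzzles are all hypothetical and do not reflect Saul's file format or his real data.

```python
from itertools import combinations


def answer_overlap(answers_a, answers_b):
    """Jaccard similarity: shared answers divided by all distinct answers."""
    a, b = set(answers_a), set(answers_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suspicious_pairs(puzzles, threshold=0.5):
    """Return (id_a, id_b, score) for puzzle pairs scoring at or above threshold."""
    flagged = []
    for (id_a, ans_a), (id_b, ans_b) in combinations(sorted(puzzles.items()), 2):
        score = answer_overlap(ans_a, ans_b)
        if score >= threshold:
            flagged.append((id_a, id_b, score))
    # Highest overlap first, so the most striking pairs surface at the top.
    return sorted(flagged, key=lambda pair: pair[2], reverse=True)


# Toy puzzles invented for this example.
toy_puzzles = {
    "p1": ["ALPHA", "BETA", "GAMMA", "DELTA"],
    "p2": ["ALPHA", "BETA", "GAMMA", "OMEGA"],
    "p3": ["KAPPA", "SIGMA"],
}
print(suspicious_pairs(toy_puzzles))  # [('p1', 'p2', 0.6)]
```

Comparing every pair scales quadratically with the number of puzzles, so for tens of thousands of crosswords one would likely narrow the candidates first, for example by grouping puzzles that share their longest answers.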

Surprisingly enough, after visualizing the results, Saul discovered egregious plagiarism by a major crossword editor that had gone on for years.

Ultimately, 538 even covered the scandal:

I thoroughly enjoyed watching this talk on YouTube.

Saul covers the file format, data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it’s like for a data project to get 15 minutes of fame.

I tried to locate the dataset online, but it seems Saul’s website has since gone offline. If you do happen to find it, please do share it in the comments!

Understanding Machine Learning (free e-book)

Shai Shalev-Shwartz and Shai Ben-David of the Hebrew University of Jerusalem made their machine learning book free to download.

The book covers the basic foundations up to advanced theory and algorithms. I copied the table of contents below. It’s kind of math heavy, but well explained with visual examples and pseudo-code.

Moreover, the book contains multiple exercises for you to internalize the knowledge and skills.

As an added bonus, the professors teach a number of machine learning courses, the lecture slides and materials of which you can also access for free via the book’s website.

Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides a theoretical account of the fundamentals underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics, the book covers a wide array of central topics unaddressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for advanced undergraduates or beginning graduates, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics and engineering.

About the book

If you want to reward the professors for their efforts, please do buy a hardcopy version of the book.

Table of contents

Part I: Foundations

  • A gentle start
  • A formal learning model
  • Learning via uniform convergence
  • The bias-complexity trade-off
  • The VC-dimension
  • Non-uniform learnability
  • The runtime of learning

Part II: From Theory to Algorithms

  • Linear predictors
  • Boosting
  • Model selection and validation
  • Convex learning problems
  • Regularization and stability
  • Stochastic gradient descent
  • Support vector machines
  • Kernel methods
  • Multiclass, ranking, and complex prediction problems
  • Decision trees
  • Nearest neighbor
  • Neural networks

Part III: Additional Learning Models

  • Online learning
  • Clustering
  • Dimensionality reduction
  • Generative models
  • Feature selection and generation

Part IV: Advanced Theory

  • Rademacher complexities
  • Covering numbers
  • Proof of the fundamental theorem of learning theory
  • Multiclass learnability
  • Compression bounds
  • PAC-Bayes

Appendices

  • Technical lemmas
  • Measure concentration
  • Linear algebra

Color curves: Choose a color palette with gradient

Jan-Willem Tulp pointed out this amazing tool to choose a color palette: https://colorcurves.app

You can choose between a continuous palette and a discrete palette, that is, one with distinct groups.

Here’s an example of an exponential color curve for a continuous palette using colorcurves.app:

There are numerous functions you can use to make your “gradient color curve”.

Similarly, you can specify the lightness of the different colors along your curve.
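To make the idea of a color curve a bit more concrete, here is a small sketch that walks from one color to another along an exponential curve, with lightness changing along the way. This is not the code behind colorcurves.app: the function names, the easing formula, the `rate` parameter, and the hue, saturation, and lightness values are all assumptions chosen for illustration.

```python
import colorsys
import math


def exponential_ease(t, rate=3.0):
    """Map t in [0, 1] onto an exponential curve that starts at 0 and ends at 1."""
    return math.expm1(rate * t) / math.expm1(rate)


def gradient_palette(start_hsl, end_hsl, steps, ease=exponential_ease):
    """Return `steps` hex colors between two (hue, saturation, lightness) triples.

    All three components are expected in the range 0 to 1.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    palette = []
    for i in range(steps):
        t = ease(i / (steps - 1))
        hue, sat, light = (a + (z - a) * t for a, z in zip(start_hsl, end_hsl))
        # Note the argument order: colorsys uses hue, lightness, saturation.
        rgb = colorsys.hls_to_rgb(hue, light, sat)
        channels = [min(255, max(0, round(c * 255))) for c in rgb]
        palette.append("#{:02x}{:02x}{:02x}".format(*channels))
    return palette


# A light-to-dark blue ramp; early steps change slowly, later steps quickly.
print(gradient_palette((0.6, 0.8, 0.9), (0.6, 0.8, 0.2), steps=5))
```

Passing a different `ease` function (for instance `lambda t: t` for a straight line) changes how quickly the colors move from the start to the end of the palette.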

Here’s another example, of a color arc for a categorical / discrete palette using colorcurves.app: