Category: learning

Machine Learning and AI courses at Google

Google has announced that it will provide open access to its artificial intelligence and machine learning courses. On the overview page, you will find many educational resources from machine learning experts at Google, including lessons, tutorials and hands-on exercises for people at all experience levels. Simply filter through the resources and start learning, building and problem-solving.

For instance, up your game straight away with this 15-hour Machine Learning crash course. Zuri Kemp – who leads Google’s machine learning education program – said that over 18,000 Googlers have already enrolled in the course. Designed by the engineering education team, the course explores loss functions and gradient descent and teaches you to build your own neural network in TensorFlow.
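To give a rough idea of what "loss functions and gradient descent" mean in practice, here is a minimal base R sketch that fits a straight line by repeatedly stepping against the gradient of the mean squared error. It is not taken from Google's course; the toy data, the learning rate, and the number of steps are all made-up assumptions.

```r
# Toy data generated from y = 2x + 1 without noise (made up for illustration)
x <- c(0, 1, 2, 3, 4)
y <- 2 * x + 1

# Parameters to learn, plus hypothetical training settings
w <- 0
b <- 0
learning_rate <- 0.05
n_steps <- 2000

for (step in seq_len(n_steps)) {
  # Prediction error under the current parameters
  error <- (w * x + b) - y
  # Gradients of the mean squared error loss with respect to w and b
  grad_w <- 2 * mean(error * x)
  grad_b <- 2 * mean(error)
  # Move both parameters a small step against their gradient
  w <- w - learning_rate * grad_w
  b <- b - learning_rate * grad_b
}

# Final loss and fitted parameters
loss <- mean(((w * x + b) - y)^2)
print(c(w = w, b = b, loss = loss))
```

With these settings the estimates should end up very close to the slope and intercept used to generate the data.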

Bayesian data analysis for newcomers

Professor John Kruschke and Torrin Liddell – one of his Ph.D. students at Indiana University – wrote a fantastically useful scientific paper introducing Bayesian data analysis to the masses. Kruschke and Liddell explain the main ideas behind Bayesian statistics, how Bayesians deal with continuous and binary variables, how to use and set meaningful priors, the differences between confidence and credible intervals, how to perform model comparison tests, and much more. The paper is published open access so you can read it here.

I found it incredibly useful, providing me with a better understanding of how Bayesian analysis works, what kind of questions you can answer with it, and what the resulting insights would consist of. After reading it, I was honestly asking myself why I don’t use Bayesian methods more often… So what’s next, how to learn more?

If you are equally convinced and want to really learn Bayesian statistics, you might want to have a look at Kruschke’s book Doing Bayesian Data Analysis: A tutorial with R, JAGS, and Stan.
If you’re up for the challenge, Kruschke and Liddell also published a more technical paper on Bayesian New Statistics at the same time as this introductory paper, also open access.
You can also start doing some simple analysis:
- In R, the BayesFactor package and brms will get you started (suggested by u/data_for_everyone).
- In Python, pystan and pymc3 are helpful (suggested by u/joefromlondon).
If you prefer a more visual explanation of the fundamentals of Bayesian statistics, have a look at this YouTube video by Veritasium.
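To get a feel for priors and credible intervals before installing any of the packages above, here is a minimal base R sketch of a conjugate Beta-Binomial update. The data (7 successes in 20 trials) and the flat Beta(1, 1) prior are made-up assumptions for illustration and do not come from Kruschke and Liddell's paper.

```r
# Hypothetical data: 7 successes out of 20 trials (made up for illustration)
successes <- 7
trials <- 20

# Flat Beta(1, 1) prior on the success probability (an assumption)
prior_a <- 1
prior_b <- 1

# Conjugate update: the posterior is Beta(a + successes, b + failures)
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

# Posterior mean and 95% equal-tailed credible interval
post_mean <- post_a / (post_a + post_b)
cred_int <- qbeta(c(0.025, 0.975), post_a, post_b)

print(post_mean)
print(cred_int)
```

Changing prior_a and prior_b shows how a more informative prior pulls the posterior away from the observed proportion. Note that this is an equal-tailed interval, which is only one of several ways to summarize a posterior.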

rstudio::conf 2018 summary

rstudio::conf is the yearly conference when it comes to R programming and RStudio. In 2017, nearly 500 people attended and, last week, 1100 people went to the 2018 edition. Regrettably, I was on holiday in Cardiff and missed out on meeting all my #rstats heroes. Just browsing through the #rstudioconf Twitter feed, I already learned so many new things that I decided to dedicate a page to it!

Fortunately, you can watch the live streams taped during the conference:

Two people have collected the slides of most rstudio::conf 2018 talks, which you can access via the GitHub repos of matthewravey and simecek. People on Twitter have particularly recommended teach the tidyverse to beginners (by David Robinson), the lesser known stars of the tidyverse (by Emily Robinson), the future of time series and financial analysis in the tidyverse (by Davis Vaughan of business-science.io), Understanding Principal Component Analysis (by Julia Silge), and Deploying TensorFlow models (by Javier Luraschi). Nevertheless, all other presentations are definitely worth checking out as well!

One of the workshops deserves an honorable mention. Jenny Bryan presented on What they forgot to teach you about R, providing some excellent advice on reproducible workflows. It elaborates on her earlier blog on project-oriented workflows, which you should read if you haven’t yet. Some best pRactices Jenny suggests:

Restart R often. This ensures your code is still working as intended. Use Shift-CMD-F10 to do so quickly in RStudio.
Use stable instead of absolute paths. This lets you (1) better manage your imports/exports and folders, and (2) move or share your folders without the code breaking. For instance, here::here("data","raw-data.csv") builds the path to the raw-data.csv file in the data folder of your project directory. If you are not using the here package yet, you are honestly missing out! Alternatively, you can use fs::path_home() to build paths relative to your home directory. normalizePath() will make paths work on both Windows and Mac. You can use basename() instead of strsplit() to get the name of a file from a path.
To upload an existing git directory to GitHub easily, you can use usethis::use_github().
If you include the below YAML header in your .R file, you can easily generate .md files for your GitHub repo.

#' ---
#' output: github_document
#' ---

Moreover, Jenny proposed these useful default settings for knitr:

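knitr::opts_chunk$set(
  collapse = TRUE,   # show source code and its output in a single block
  comment = "#>",    # prefix each line of printed output with "#>"
  out.width = "100%" # let figures span the full width of the output
)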
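The path advice above can be tried without extra packages. The following is a minimal sketch using only base R; the file name raw-data.csv and the data folder mirror the here::here() example, while the temporary project directory is an assumption made so the snippet runs anywhere. file.path() is used here as a base R stand-in for illustration, not as a replacement for here::here().

```r
# Use a temporary directory as a stand-in for the project root (assumption)
project_dir <- tempfile("project")
dir.create(file.path(project_dir, "data"), recursive = TRUE)

# Build the path from its parts instead of hard-coding separators
csv_path <- file.path(project_dir, "data", "raw-data.csv")
write.csv(data.frame(x = 1:3), csv_path, row.names = FALSE)

# normalizePath() returns the canonical absolute path of an existing file
full_path <- normalizePath(csv_path)

# basename() extracts the file name, no strsplit() needed
file_name <- basename(full_path)
print(file_name)

# Read the file back via the constructed path
raw_data <- read.csv(csv_path)
print(nrow(raw_data))
```

In a real project you would drop the temporary directory and let here::here() locate the project root for you.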

Another of Jenny Bryan’s talks was named Data Rectangling and although you might not get much out of her slides without her presenting them, you should definitely try the associated repurrrsive tutorial if you haven’t done so yet. It’s a poweR up for any useR!

Here’s a Shiny dashboard made by Garrick Aden-Buie including all the #rstudioconf tweets so you can browse the posts yourself. If you want to download the tweets, Mike Kearney (author of rtweet) shares the data here on his GitHub. Some highlights:

Amelia McNamara posted a cheat sheet comparing R’s dollar sign, formula, and tidyverse syntaxes.
Amanda Gadrow shared an RStudio debugging cheat sheet and a facebook of the rstudio::conf 2018 attendees.
Tim Mastny shared how to easily embed slides in blogdown websites.
David Robinson posted a first draft of Hadley Wickham’s tidy tools manifesto.
Mike Kearney shared some cool analyses he conducted on the #rstudioconf Twitter data.
I can’t remember who shared it, but a very cool trick is to name the viewing tab of any dataframe you pipe into View() using df %>% View("enter_view_tab_name").

These probably represent only a small fraction of the thousands of tips and tricks you could have learned by simply attending rstudio::conf. I will definitely try to attend next year’s edition. Nevertheless, I hope the above has been useful. If I missed out on any tips, presentations, tweets, or other materials, please reply below, tweet me or pop me a message!

HR Analytics: A 7th sense for the modern HR professional

What is happening in the Netherlands in the field of HR analytics? This new book shows what several Dutch organizations have actually undertaken in recent years. The various authors, among whom I may count myself, offer a glimpse into the practice of substantiating HR decisions with diverse data sources and analysis techniques. They do not treat HR analytics as sacred, but anyone who wants to add value to the business as an HR professional can benefit greatly from it. The credo is: know what you have to do, be alert to the pitfalls, and regard HR analytics as a seventh sense alongside your other senses. With this extra sense, you as an HR professional can perceive more sharply what the real HR problem is, and what the solution might be.

The book ‘HR Analytics’ is for the modern HR professional who is curious about what analytics can contribute to his or her professionalism. The examples and stories from practice provide various lessons and insights that support a more analytical approach to HR policy themes such as recruitment, careers, terms of employment, training and education, and engagement. It is a nudge toward HR analytics as an addition to the HR profession. Not as a replacement.

Wiemer Renkema, reviewer at managementboeken.nl, has since read the book and summarizes its content nicely:

The ten chapters of the book cover the most important HR analytics, such as those for recruitment, career development, employee satisfaction, and compensation. Readers can determine the relevance of each topic for themselves and search specifically for the information that matters to them. For each topic, the authors address all the core questions, which gives the book a clear and easily readable structure.

[…]

You do not need much stamina to read HR analytics. Een 7e zintuig voor de moderne HR-professional. What a practical, complete, and well-written book this is!

Wiemer Renkema, reviewer [link]

Here you can view part of the introductory chapter to see whether the book is something for you.

One year of paulvanderlaken.com

One year ago, I registered the domain paulvanderlaken.com with three reasons in mind: (1) I wanted an online environment to store and showcase my pet projects, (2) to share and promote some of the great blogs and research others had been writing, and (3) to show others what I was doing on my path to “data science”. The year has been just amazing. I could not have imagined the amount of positive sentiment I received from friends, family, acquaintances, and old classmates. But, most of all, the nice reactions from complete strangers across the globe! Thank you all so much for the positive response.

To my surprise, some of my stuff actually got read!

In August, I shared my Harry Plotter project and the first blog was read by 5,000 people. The second by another 3,000.
My lists of programming resources also got some traction: 5,672 people looked at my list of R materials, 976 followed my guide to leaRning R, and 670 examined data science courses and books. In contrast, only a couple of hundred may have been disappointed by my lacking lists of Python and SQL resources and practice datasets.
Regarding data science, particularly my analysis of the 2017 Kaggle survey was viewed often, nearly 8,000 times in two months. Another 850 people saw my recent attempt at explaining data science visually.
I nerdified chRistmas: 1,562 people heard R play Jingle Bells and 911 watched the animated snow fall.
Machine learning is still booming and Google routed thousands of viewers to my blogs on deep learning with Keras and gradient boosting.
Text analytics / mining is one of my personal favorite topics, so I wrote about mining Twitter and mining Trump’s Twitter, clustering news articles, heavy metal lyrics, Facebook sentiment, Stranger Things scripts, and many more.
Several thousand viewers specifically came for Human Resource Management topics, like HR examples of Simpson’s paradox, job-switching behaviors, application success, application robots, and job mapping.

Some random stats:

In one year, I wrote 103 blogs which got over 42,000 views by nearly 30,000 visitors. 97.5% of these views occurred in the last six months. Most referrals came via Google (45%), reddit (18%), LinkedIn (8%), Facebook (8%), and Twitter (4%), and my blogs were shared a total of 241 times. Now, 51 people follow my blog, which is best viewed on Tuesdays (31%) and around 15:00h CET (6%).

My views between January 2017 and 2018, made with ggplot2 in R.

Although my personal learning is still the main reason I maintain this blog, I am very glad people seem to enjoy tagging along. Hopefully, I can continue to discover and write about data (analysis) during the coming 12 months. For now, I’d like to thank my readers for their continued interest and, in particular, my girlfriend for coping with the numerous evenings and weekends I have wasted on my pet projects. Nonetheless, it was definitely worth the effort!

Hope to see you again soon,

Paul

Cryptocurrency and Blockchain explained by 3Blue1Brown

Grant Sanderson is the owner of the YouTube channel 3Blue1Brown, which aims to explain math and stats concepts in an entertaining way. Using animations, Grant tackles difficult problems and explains them in understandable language. I was already familiar with the great explanatory videos on Linear Algebra and Neural Networks, but this new video on cryptocurrencies and blockchain (below) is definitely one of the best explanations of Bitcoin I’ve seen so far: