Getting started with Python in Visual Studio Code

After several years of procrastinating, the inevitable finally happened: three months ago, I committed to learning Python! I must say that getting started was not easy. One afternoon, I sat down, motivated to get started. Obviously, the first step was to download and install Python as well as something to write actual…

What Every Programmer Needs To Know About Encodings

Kunststube wrote this great introduction to text encoding. Ever wondered why your Word document sometimes starts with ÉGÉìÉRÅ[ÉfÉBÉìÉOÇÕìÔǵÇ≠ǻǢ? Well, encoding’s why. Kunststube introduces you to the wonderful world of ASCII, WLatin, Mac Latin, and UTF-8, -16 and -32. Read the original article at http://kunststube.net/encoding/
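To see how such garbage comes about, here is a minimal Python sketch. It is not taken from the Kunststube article: the sample word and the choice of Latin-1 as the "wrong" encoding are assumptions picked purely for illustration. The bytes stay the same; only the encoding used to interpret them changes.

```python
# Illustrative sketch: the sample text and the "wrong" encoding are assumptions.
original = "café"

# Writing the text to disk or sending it over the wire means encoding it to bytes.
utf8_bytes = original.encode("utf-8")

# Reading those same bytes back with a different encoding produces mojibake:
# the two UTF-8 bytes of "é" are shown as two separate Latin-1 characters.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©

# Nothing is lost as long as the bytes are intact: reverse the wrong step
# and decode with the encoding that was actually used.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```

The same mechanism, with a different pair of encodings, can turn a perfectly readable document into a string like the one above.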

Debuggex: A regular expression testing tool

I came across this awesome regular expression tool I wanted to share. Debuggex allows you to interactively write, test and visually inspect what your regular expressions match in either Python, JavaScript, or Perl. Read more about regular expressions here, for instance their implementation in R.

Generating Book Covers By Their Words — My Dissertation Cover

As some of you might know, I am defending my PhD dissertation later this year. It’s titled “Data-Driven Human Resource Management: The rise of people analytics and its application to expatriate management” and, over the past few months, I was tasked with designing its cover. Now, I didn’t want to buy some random stock photo…

Sentiment Analysis: Analyzing Lexicon Quality and Estimation Errors

Sentiment analysis is a topic I cover regularly, for instance, with regard to Harry Plotter, Stranger Things, or Facebook. Usually I stick to the three sentiment dictionaries (i.e., lexicons) included in the tidytext R package (Bing, NRC, and AFINN) but there are many more one could use. Heck, I’ve even tried building one myself using a synonym/antonym…

Regular Expression Crosswords

A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is .*\.txt$. Last week I posted…
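The wildcard and regex patterns mentioned above can be compared side by side. The following is a small Python sketch; the file names are made up for illustration, and `fnmatch.fnmatchcase` merely stands in for the wildcard matching a file manager would do.

```python
import fnmatch
import re

# Hypothetical file names, chosen only for illustration.
filenames = ["notes.txt", "report.txt.bak", "data.csv", "readme.TXT"]

# Wildcard (glob) matching: * stands for any run of characters.
# fnmatchcase compares case-sensitively on every platform.
wildcard_hits = [name for name in filenames if fnmatch.fnmatchcase(name, "*.txt")]

# Regex matching: .* is "any characters", \. is a literal dot,
# and $ anchors the pattern to the end of the string.
pattern = re.compile(r".*\.txt$")
regex_hits = [name for name in filenames if pattern.match(name)]

print(wildcard_hits)  # ['notes.txt']
print(regex_hits)     # ['notes.txt']
```

Note that the dot has to be escaped in the regex: an unescaped `.` matches any character, whereas in the wildcard notation it is just a dot.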

Datasets to practice and learn Programming, Machine Learning, and Data Science

Many requests have come in regarding “training datasets” – to practice programming. Fortunately, the internet is full of open-source datasets! I compiled a selected list of datasets and repositories below. If you have any additions, please comment or contact me! For information on programming languages or algorithms, visit the overviews for R, Python, SQL, or Data Science,…

Regular Expressions in R – Part 1: Introduction and base R functions

The following is the first part of my introduction to regular expressions (regex) in general and the use of regex in R in particular. It is loosely inspired by the swirl() tutorial by Jon Calder. I created it in R Markdown and uploaded it to RPubs, for an easier read. Regular expression A regular expression, regex or regexp…

Text Mining: Pythonic Heavy Metal

This blog summarizes work that has been posted here, here, and here. Iain of degeneratestate.org wrote a three-piece series where he applied text mining to the lyrics of 222,623 songs from 7,364 heavy metal bands spread over 22,314 albums that he scraped from darklyrics.com. He applied a broad range of different analyses in Python, the code of which…