Category: python

Free Python Tutorials & Courses, by CourseDuck

CourseDuck founder Michael Kuhlman was nice enough to point me to their overview of curated (&free!) Python courses and tutorials.

This overview is curated in the sense that all resources are rated by CourseDuck’s users. These ratings seem quite reliable; at least, I have personally enjoyed their top three resources over the past few years:

Note that all these courses, as well as the curated overview, come free of charge! A great resource for starting data scientists or upcoming pythonistas!

Free Programming Books (I still need to read)

There are multiple unread e-mails in my inbox.

Links to books.

Just sitting there. Waiting to be opened, read. For months already.

The sender, you ask? Me. Paul van der Laken.

A nuisance that guy, I tell you. He keeps sending me reminders, of stuff to do, books to read. Books he’s sure a more productive me would enjoy.

Now, I could wipe my inbox. Be done with it. But I don’t want to lose this digital to-do list… Perhaps I should put them here instead. So you can help me read them!

Each of the below links represents a formidable book on programming! (I hear)
And there are free versions! Have a quick peek. A peek won’t hurt you:

Disclaimer: This page contains one or more links to Amazon.
Any purchases made through those links provide us with a small commission that helps to host this blog.

The books listed above have a publicly accessible version linked. Some are legitimate. Other links are somewhat shady.
If you feel like you learned something from reading one of the books (which you surely will), please buy a hardcopy version. Or an e-book. At the very least, reach out to the author and share what you appreciated in his/her work.
It takes valuable time to write a book, and we should encourage and cherish those who take that time.

For more books on R programming, check out my R resources overview.

For books on data analytics and (behavioural) psychology in (HR) management, check out Books for the modern data-driven HR professional.

GoalKicker: Free Programming Books

This specific link has been on my to-do list for so long now that I’ve decided to just share it without any further ado.

The people behind GoalKicker, for whatever reason, decided to compile nearly 100 books on different programming languages, based among other things on StackOverflow entries. Their open access library contains books on languages and tools from LaTeX to Linux, from Java to JavaScript, from SQL to MySQL, and from C to C++, C#, and Objective-C.

Basically, they host it all. Have a look yourself: https://books.goalkicker.com/

#100DaysOfCode: Machine Learning & Data Visualization

2018 seemed to be the year of challenges going viral on the web. Most of them were plain stupid and/or dangerous. However, one viral challenge I did like: #100DaysOfCode

1. Code minimum an hour every day for the next 100 days.

2. Tweet your progress every day with the #100DaysOfCode hashtag.

3. Each day, reach out to at least two people on Twitter who are also doing the challenge

100 Days of Code rulebook

Many (aspiring) programming professionals took on this challenge, sharing their learning journeys in domains ranging from web development to machine learning and data visualization.

With this blog, I wanted to share two of those learning journeys that stood out for me.

Machine learning

First, there’s Avik Jain’s 100 days of Machine Learning code repository on Github. Avik’s repository contains all learning activities he followed during the 53 days of programming he completed. Some of Avik’s entries really stood out, and I particularly liked his educational infographics:

Just look at the wonderful design and visual aids on this decision tree for dummies infographic, pseudocode and all:

Day 23: Decision trees for dummies. This just looks fabulous right?!
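If you want to go from infographic to working code, a minimal decision tree takes only a handful of lines in scikit-learn. The sketch below is purely illustrative (it is not taken from Avik’s repository) and uses the bundled iris dataset:

```python
# Minimal, illustrative decision tree example (not Avik's code).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(max_depth=3, random_state=42)  # shallow tree, easy to read
tree.fit(X_train, y_train)

print(f"Test accuracy: {tree.score(X_test, y_test):.2f}")
print(export_text(tree, feature_names=load_iris().feature_names))  # the fitted tree as pseudocode-like rules
```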

Apart from the infographics, Avik also links to many very well-produced tutorials that helped him improve his machine learning skills, such as the free Python for Data Science Handbook he worked through, or this YouTube tutorial on deep learning in Python with TensorFlow and Keras:

Although Avik didn’t seem to have completed the full 100 days, many others did.

Data visualization

I have blogged about Hannah Yan Han’s 100 days of code project before, but she definitely deserves another mention here. Her 100 days revolved around data science, data visualization, and storytelling using both R and Python. You can find her #100DaysOfCode Medium page here, and her associated GitHub repository here.

For example, one day Hannah explored where instant noodles come from, how they are served, and whether people like them or not.

On a different day, she examined which sports are the toughest:

Or how scientific researchers migrate across the globe:

Hannah used many different plot types in those 100 days, including some lesser-known ones, like these UpSet plots on TED talk data:

Heck, she even made her own R package to generate Mondriaan-like paintings on one of the days:

What I found so great about Hannah’s project is that she picked a novel dataset every couple of days. Moreover, she used an extremely wide variety of visualization formats. All visuals were equally beautiful, but Hannah made sure to pick the right one for the purpose she was trying to serve. If you are interested in data visualization, you should seriously check out Hannah’s 100DaysOfCode Medium page.

Beating Battleships with Algorithms and AI

These past days, I discovered this series of blog posts on how to win the classic game of Battleships (gameplay explanation) using different algorithmic approaches. I thought they might amuse you as well : )

The story starts with this 2012 Datagenetics blog where Nick Berry contrasts four algorithms’ performance in the game of Battleships. The resulting levels of artificial intelligence (AI) seem to compare respectively to a distracted baby, two sensible adults, and a mathematical prodigy.

The first, stupidest approach is to just take random shots. The AI resulting from such an algorithm would simply pick a random tile to shoot at each turn. Nick simulated 100 million games with this random approach and computed that the algorithm would require 96 turns to win 50% of games, given that it would not be defeated before that time. At best, the expertise level of this AI would be comparable to that of a distracted baby. Basically, it would lose to the average toddler, provided that the toddler survived the boredom of playing against such a stupid AI.
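For illustration, such a random shooter fits in a few lines of Python. This is my own toy sketch, not Nick’s simulation code:

```python
# Toy sketch of the "random shots" strategy: fire at a random,
# previously untouched tile every turn.
import random

BOARD_SIZE = 10

def random_shooter():
    """Yield board coordinates in random order, never repeating a tile."""
    tiles = [(r, c) for r in range(BOARD_SIZE) for c in range(BOARD_SIZE)]
    random.shuffle(tiles)
    yield from tiles

shots = random_shooter()
print(next(shots), next(shots), next(shots))  # e.g. (3, 7) (0, 2) (9, 9)
```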

A first major improvement results in what is dubbed the Hunt algorithm. This improved algorithm includes an instruction to explore nearby spaces whenever a prior shot hit. Every human who has ever played Battleships will do this intuitively. A great improvement indeed, as Nick’s simulations demonstrated that this Hunt algorithm completes 50% of games within ~65 turns, as long as it is not defeated beforehand. Your little toddler nephew will certainly lose, and you might experience some difficulty as well from time to time.
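In code, the Hunt idea boils down to keeping a queue of tiles next to previous hits. The sketch below is my own simplification, not Nick’s implementation; the is_hit callback is a hypothetical stand-in for the opponent’s hidden board:

```python
# Simplified Hunt strategy: shoot randomly until something is hit,
# then prioritise the neighbours of that hit.
import random

BOARD_SIZE = 10

def neighbours(r, c):
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < BOARD_SIZE and 0 <= nc < BOARD_SIZE:
            yield nr, nc

def hunt(is_hit):
    """is_hit(r, c) -> bool is a hypothetical oracle for the opponent's board."""
    untried = [(r, c) for r in range(BOARD_SIZE) for c in range(BOARD_SIZE)]
    random.shuffle(untried)
    targets, tried = [], set()
    while untried or targets:
        r, c = targets.pop() if targets else untried.pop()
        if (r, c) in tried:
            continue
        tried.add((r, c))
        if is_hit(r, c):                     # on a hit, hunt around it next
            targets.extend(neighbours(r, c))
        yield r, c
```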

A visual representation of the “Hunting” of the algorithm on a hit [via]

Another minor improvement comes from adding the so-called Parity principle to this Hunt algorithm (i.e., Nick’s Hunt + Parity algorithm). This principle instructs the algorithm to take into account that every ship necessarily covers both odd- and even-numbered tiles on the board. This information allows for some more sensible shooting options. For instance, in the below visual, you should avoid shooting the upper left white tile when you have already shot its blue neighbors. You might have intuitively applied this tactic yourself in the past, shooting tiles in a “checkerboard” formation. With the parity principle incorporated, the median number of turns to complete a game improves to ~62, Nick’s simulations showed.
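In code, the parity principle simply restricts the blind-search phase to one colour of the checkerboard, halving the number of candidate tiles. A tiny illustrative snippet:

```python
# Parity filter: while searching blindly, only consider tiles where
# (row + column) is even, since every ship of length >= 2 must cover
# at least one tile of each checkerboard colour.
BOARD_SIZE = 10

parity_tiles = [
    (r, c)
    for r in range(BOARD_SIZE)
    for c in range(BOARD_SIZE)
    if (r + c) % 2 == 0
]
print(len(parity_tiles))  # 50 tiles: half the search space for the blind phase
```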

The Parity “checkerboard” principle [via]

Now, Nick’s final proposed algorithm is much more computationally intensive. It makes use of Probability Density Functions. At the start of every turn, it works out all possible locations that every remaining ship could fit in. As you can imagine, many different combinations are possible with five ships. These different combinations are all added up, and every tile on the board is thus assigned a probability that it includes a ship part, based on the tiles that are already uncovered.

Computing the probability that a tile contains a ship based on all possible board layouts [via]

At the start of the game, no tiles are uncovered, so all spaces will have about the same likelihood of containing a ship. However, as more and more shots are fired, some locations become less likely, some become impossible, and some become near certain to contain a ship. For instance, the below visual reflects seven misses (the X’s); the darker tiles consequently have a relatively high probability of containing a ship part.

An example distribution with seven misses on the grid. [via]
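A compact reconstruction of this idea could look as follows. Note that this is my own simplification, not Nick’s code: it scores each remaining ship’s placements independently and only rules out placements that overlap known misses, whereas a fuller version would also weight placements around known hits:

```python
# Probability-density sketch: enumerate every legal placement of the
# remaining ships, count how often each tile is covered, and shoot
# the tile with the highest count.
import numpy as np

BOARD_SIZE = 10
SHIP_SIZES = [5, 4, 3, 3, 2]   # classic Battleship fleet

def placement_counts(misses, remaining_sizes=SHIP_SIZES):
    """misses: set of (row, col) tiles known to be empty."""
    counts = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=int)
    for size in remaining_sizes:
        for r in range(BOARD_SIZE):
            for c in range(BOARD_SIZE):
                if c + size <= BOARD_SIZE:                    # horizontal placement
                    tiles = [(r, c + k) for k in range(size)]
                    if not any(t in misses for t in tiles):
                        for t in tiles:
                            counts[t] += 1
                if r + size <= BOARD_SIZE:                    # vertical placement
                    tiles = [(r + k, c) for k in range(size)]
                    if not any(t in misses for t in tiles):
                        for t in tiles:
                            counts[t] += 1
    return counts

misses = {(0, 0), (5, 5), (9, 3)}                 # example misses
counts = placement_counts(misses)
best = np.unravel_index(counts.argmax(), counts.shape)
print("Shoot at", best)
```

Shooting the argmax of these counts each turn is the “highest probability” rule described above.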

Nick simulated 100 million games of Battleship for this probabilistic approach as well as for the prior algorithms. The below graph summarizes the results, and highlights that this new probabilistic algorithm greatly outperforms the simpler approaches. It completes 50% of games within ~42 turns! This algorithm will have you crying at the boardgame table.

Relative performance of the algorithms in the Datagenetics blog, where “New Algorithm” refers to the probabilistic approach and “No Parity” refers to the original “Hunt” approach.

Reddit user /u/DataSnaek reworked this probabilistic algorithm in Python and turned its inner calculations into a neat GIF. Below, on the left, you see the probability of each square containing a ship part. The brighter the color (running from black through red and yellow to white), the more likely a ship resides at that location. The calculation takes into account that ships occupy multiple consecutive spots. On the right, every turn the algorithm shoots the space with the highest probability. Blue is unknown, misses are in red, sunk ships in brownish, hit but unsunk ships in light blue (sorry, I am terribly color blind).


The probability matrix as a heatmap for every square after each move in the game.  [via]

This latter attempt by DataSnaek was inspired by Jonathan Landy’s attempt to train a reinforcement learning (RL) algorithm to win at Battleships. Although the associated GitHub repository doesn’t go into much detail, the approach is elaborately explained in this blog post. However, it seems that this specific code trains a neural network on a very small Battleships setup: a single ship of size 3 on a board consisting of a single row of 10 tiles.

Fortunately, Sue He wrote about her reinforcement learning approach to Battleships in 2017. Building on the open source phoenix-battleship project, she created a Battleship app on Heroku, and asked co-workers to play. This produced data on 83 real, two-person games, showing, for instance, that Sue’s coworkers often tried to hide their size 2 ships in the corners of the Battleships board.

Probability heatmaps of ship placement in Sue He’s reinforcement learning Battleships project [via]

Next, Sue scripted a reinforcement learning agent in PyTorch to train and learn where to shoot effectively on the 10 by 10 board. It became effective quite quickly, requiring only 52 turns to win (averaged over the last 25 games), after training for only a couple hundred games.

The performance of the RL agent at Battleships during the training process [via]
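For readers wondering what such an agent might look like, below is a bare-bones PyTorch sketch; it is explicitly not Sue’s actual implementation. It assumes a three-channel (unknown/hit/miss) board encoding and maps it to 100 shot logits, masking out tiles that have already been shot before sampling an action:

```python
# Bare-bones policy-network sketch (not Sue's code): the board state goes in,
# a distribution over the 100 tiles comes out.
import torch
import torch.nn as nn

BOARD_SIZE = 10

class ShotPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(BOARD_SIZE * BOARD_SIZE * 3, 256),
            nn.ReLU(),
            nn.Linear(256, BOARD_SIZE * BOARD_SIZE),
        )

    def forward(self, board):                          # board: (batch, 10, 10, 3) one-hot state
        return self.net(board)

policy = ShotPolicy()
state = torch.zeros(1, BOARD_SIZE, BOARD_SIZE, 3)      # fresh game: everything unknown
state[..., 0] = 1.0                                    # channel 0 = "unknown" (assumed encoding)
logits = policy(state)

already_shot = torch.zeros(1, BOARD_SIZE * BOARD_SIZE, dtype=torch.bool)
logits = logits.masked_fill(already_shot, float("-inf"))   # never re-shoot a tile
action = torch.distributions.Categorical(logits=logits).sample()
print("Shoot tile", divmod(action.item(), BOARD_SIZE))
```

During training, per-game rewards (for instance a bonus for hits and a penalty per turn) would be fed back into this network with a policy-gradient update; those details are omitted here.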

However, as Sue herself notes in her blog, disappointingly, this RL agent still does not outperform the probabilistic approach presented earlier in this blog post.

Reddit user /u/christawful faced similar issues. Chris (I presume he is called) trained a convolutional neural network (CNN) with the below architecture on a dataset of Battleships boards. Based on the current board state (10 tiles * 10 tiles * 3 options [miss/hit/unknown]) as input data, the intermediate convolutional layers result in a final output layer containing 100 values (10 * 10), depicting the probability that shooting each tile results in a hit. Again, the algorithm can simply shoot the tile with the highest probability.

Chris’s convolutional neural network architecture for Battleships [via]
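To make the idea concrete, here is a loose PyTorch sketch in the spirit of that setup; the exact layers and sizes from Chris’s diagram are not reproduced. A three-channel board state goes in, 100 per-tile hit scores come out, and the algorithm shoots the argmax:

```python
# Loose CNN sketch for Battleships (an illustration, not Chris's network):
# input is a 3-channel 10x10 board (miss/hit/unknown), output is one
# hit probability per tile.
import torch
import torch.nn as nn

class BattleshipCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 10 * 10, 100),    # one score per tile
        )

    def forward(self, board):                # board: (batch, 3, 10, 10)
        return torch.sigmoid(self.head(self.conv(board)))

model = BattleshipCNN()
board = torch.zeros(1, 3, 10, 10)            # example: completely unknown board
probs = model(board)
target = probs.argmax(dim=1).item()          # shoot the most likely tile
print("Shoot tile", divmod(target, 10))
```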

Chris was nice enough to include GIFs of the process as well [via]. The first GIF shows the current state of the board as it is fed into the CNN — purple represents unknown tiles, black a hit, and white a miss (i.e., sea). The next GIF represents the calculated probabilities for each tile to contain a ship part — the darker the color, the more likely it contains a ship. Finally, the third picture reflects the actual board, with ship pieces in black and sea (i.e., miss) in white.

As cool as this novel approach was, Chris ran into the same issue as Sue: his approach did not perform better than the purely probabilistic one. The below graph demonstrates that while Chris’s CNN (“My Algorithm”) performed quite well — finishing a simulated 9000 games in a median of 52 turns — it did not outperform the original probabilistic approach of Nick Berry — which came in at 42 turns. Nevertheless, Chris claims to have programmed this CNN in a couple of hours, so very well done still.

The performance of Chris’s Battleship CNN compared to Nick Berry’s original algorithms [via]

Intrigued by all of the above, I searched the web for quite a while for potential improvements or other algorithmic approaches. Unfortunately, in vain: I did not find a better attempt than Nick’s early 2012 Datagenetics probability algorithm.

Surely, with today’s mass cloud computing power, someone must be able to train a deep reinforcement learner to become the Battleship master? It’s not all probability, right? There must be some patterns in generic playing styles, like the ones Sue found among her colleagues. Or maybe an algorithm could even adapt to the opponent’s playing style, as we see in Libratus, the poker AI. Maybe the people behind AlphaGo could give it a shot?

For starters, Chris already suggested some interesting improvements to his CNN approach. Moreover, while the probabilistic approach seems the best performing, it might not be the most computationally efficient. All in all, I am curious to see whether this story will continue.


Debuggex: A regular expression testing tool

I came across this awesome regular expression tool I wanted to share. Debuggex allows you to interactively write, test and visually inspect what your regular expressions match in either Python, JavaScript, or Perl.
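For instance, a pattern you prototype visually in Debuggex can then be dropped straight into Python’s built-in re module; the date pattern below is just an example:

```python
# Example only: match simple ISO-style dates and pull out the named groups.
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = pattern.search("This post was published on 2018-11-12, apparently.")
if match:
    print(match.group("year"), match.group("month"), match.group("day"))  # 2018 11 12
```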

Read more about regular expressions here, for instance their implementation in R.