Tag: book

AI Book Review: You look like a thing and I love you

AI Book Review: You look like a thing and I love you

The following are my summary and take-aways from Janelle Shane’s 2019 book named You look like a thing and I love you. Most of the below are excerpts from Janelle’s book, combined, or rewritten by me. For the sake of copyright, just consider everything Janelle’s : )

Image result for things called ai janelle shane

AI weirdness

You look like a thing and I love you is about AI. More specifically, the book is about what AI can and can not do. And how and why AI often fails in miserably hilareous ways.

Janelle has spend her time foing fun experiments with AI. In this book, she shares those experiments along with many real life examples of AIs in practice. While explaining the technical details behind these AIs in an accesible though technically correct way, she informs the reader where, how, and why AIs fail.

Janelle took AIs out of their comfort zone and it produced some hilareously weird results. She proposes five principles of AI Weirdness:

  1. The danger of AI is not that it’s too smart, but that it’s not smart enough
  2. AI has the approximate brainpower of a worm
  3. AI does not really understand the problem you want it to solve
  4. But: AI will do exactly what you tell it to. Or at least it will try its best.
  5. And AI willt ake the path of the least resistance

Definitions: What is (not) AI?

If it seems like AI is everywhere, it’s partly because Artificial Intelligence means lots of things, depending on whether you’re reading science fiction or selling a new app or doing academic research.

To spot an AI in the wild, it’s important to know the difference between machine learning algorithms (what Janelle calls AI in her book) and traditional, rules-based programs.

To solve a problem with a rules-based program, you have to know every step required to complete the program’s task and how to describe each one of those steps. But a machine learning algorithm figures out the rules for itself via trail and error, gauging its success on goals the programmer has specified. As the AI tries to reach this goal, it can discover rules and correlations that the programmer didn’t even know existed. This is what makes AIs attractive problem solvers and is particularly handy if the rules are really complicated or just plain mysterious.

Sometimes an AI’s brilliant problem-solving rules actually rely on mistaken assumptions. Rules that served it well in training but fail miserably when it encountered the real world. While training errors are common in complex AIs, the consequences of these mistakes can be serious.

It’s often not easy to tell when AIs make mistakes. Since we don’t write the rules, they come up with their own, and they don’t write them down or explain them the way a human would.

The difference between succesful AI problem solving and failure usually has a lot to do with the suitability of the task for an AI solution. And there are plenty of tasks for which AI solutions are more efficient than human solutions. But there are also plenty of cases where things go miserably wrong.

Janelle proposes four signs of “AI Doom”, contexts where machine learning will not produce the desired results:

  1. The problem is too hard, broad, or complex
  2. The problem is not what we thought it was
  3. There are sneaky shortcuts to solving the problem
  4. The AI tried to solve the problem learning from flawed data

Programming an AI is almost more like teaching a child than programming a computer.

Explaining how AI works

In her book, Janelle takes us through many example problems which she or others tried to solve using AIs. These example problems are increasingly hilareous, but I assure you that they are technically and didactically sound:

  • Playing tic-tac-toe
  • Managing a cockroach farm
  • Riding a bicycle
  • Rating sandwich deliciousness
  • Tossing a sandwich into a wall
  • Guiding people through a hallway
  • Answering questions regarding photo’s
  • Categorizing doodles
  • Categorizing fish
  • Tossing pancakes
  • Autonomous walking
  • Autonomous driving
  • Playing Pacman

The amazing thing is these ridiculous example problems actually serve a purpose. They are used to explain different algorithms and their applications, strengths, and limitations! Janelle covers a wide variety of algorithms in such a way that anyone new to machine learning would understand, while people with some experience will still be amused.

Janelle talks about artificial neural networks, random forests, and markov chains. Moreover, she explains how activation functions, recurrancy and long short-term memory, evolutionary algorithms and gradient descent work. And all in understandable though technically correct language.

Janelle herself seems particularly fond of generative algorithms. She’s elaborates on having deployed recurrent neural nets, generative adversial networks, and markov chains for a wide variety of generative tasks. In the book, Jabekke explains what went well and went wrong when coming up with new and original…

  • pick-up lines
  • knock-knock jokes
  • names for species of birds
  • perfumes names
  • ice-cream flavors
  • cooking recipes
  • dream descriptions
  • horse drawings
  • Harry Potter scripts
  • cat names
  • Halloween costumes
  • elementary school blueprints
  • names for Benedict Cumberbatch
  • Dungeons and Dragons spells
  • pie recipes

Where does AI fail?

Janelle’s book is lingered with examples of failing AI. As a matter of fact, the whole book seems like an ode to how machine learning can and will inevitably fail. Particularly in the latter chapters, Janelle covers many limitations of and issues with AI in much detail:

  • class imbalance
  • overfitting
  • unrealistic simulation conditions
  • data quality issues
  • self-fullfilling prophecies
  • undesirable reward function optimization
  • missing the obvious
  • catastrophic forgetting
  • human biases in the data
  • machine bias
  • math-washing / bias laundering
  • bias amplification
  • adversarial attacks

Definite recommendation

I have yet to come across a book that explain AI in this much detail and in a manner as accessible and entertaining as Janelle Shane does in You look like a thing and I love you. Janelle makes machine learning and AI understandable for a wide public without passing on the deeper technical details. Taking a critical stance, she provides a good overview of the strenghts and weaknesses of AI, and a realistic outlook for the future to come. This book is not looking for sensation or hype, although reading it will be a most amusing experience for the more technical as well as the lay reader.

I highly recommend you reward yourself with a copy!

E-Book: Probabilistic Programming & Bayesian Methods for Hackers

E-Book: Probabilistic Programming & Bayesian Methods for Hackers

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. Nevertheless, mathematical analysis is only one way to “think Bayes”. With cheap computing power, we can now afford to take an alternate route via probabilistic programming.

Cam Davidson-Pilon wrote the book Bayesian Methods for Hackers as a introduction to Bayesian inference from a computational and understanding-first, mathematics-second, point of view.

The book is available via Amazon, but you can access an online e-book for free. There’s also an associated GitHub repo.

The book explains Bayesian principles with code and visuals. For instance:

%matplotlib inline
from IPython.core.pylabtools import figsize
import numpy as np
from matplotlib import pyplot as plt
figsize(11, 9)

import scipy.stats as stats

dist = stats.beta
n_trials = [0, 1, 2, 3, 4, 5, 8, 15, 50, 500]
data = stats.bernoulli.rvs(0.5, size=n_trials[-1])
x = np.linspace(0, 1, 100)

for k, N in enumerate(n_trials):
    sx = plt.subplot(len(n_trials)/2, 2, k+1)
    plt.xlabel("$p$, probability of heads") \
        if k in [0, len(n_trials)-1] else None
    plt.setp(sx.get_yticklabels(), visible=False)
    heads = data[:N].sum()
    y = dist.pdf(x, 1 + heads, 1 + N - heads)
    plt.plot(x, y, label="observe %d tosses,\n %d heads" % (N, heads))
    plt.fill_between(x, 0, y, color="#348ABD", alpha=0.4)
    plt.vlines(0.5, 0, 4, color="k", linestyles="--", lw=1)

    leg = plt.legend()
    leg.get_frame().set_alpha(0.4)
    plt.autoscale(tight=True)


plt.suptitle("Bayesian updating of posterior probabilities",
             y=1.02,
             fontsize=14)

plt.tight_layout()

I can only recommend you start with the online version of Bayesian Methods for Hackers, but note that the print version helps sponsor the author ánd includes some additional features:

  • Additional Chapter on Bayesian A/B testing
  • Updated examples
  • Answers to the end of chapter questions
  • Additional explanation, and rewritten sections to aid the reader.

If you’re interested in learning more about Bayesian analysis, I recommend these other books:

Helpful resources for A/B testing

Helpful resources for A/B testing

Brandon Rohrer — (former) data scientist at Microsoft, iRobot, and Facebook — asked his network on Twitter and LinkedIn to share their favorite resources on A/B testing. It produced a nice list, which I summarized below.

The order is somewhat arbitrary, and somewhat based on my personal appreciation of the resources.

Cover image via Optimizely

Recreating graphics from the  Fundamentals of Data Visualization

Recreating graphics from the Fundamentals of Data Visualization

Claus Wilke wrote the Fundamentals of Data Visualization – a great resource that’s definitely high on my list of recommended data visualization books.

In a recent post, Claus shared the link to a GitHub repository where he hosts some of the R programming code with which Claus made the graphics for his dataviz book. The repository is named practical ggplot2, after the R package Clause used to make many of his visuals.

Check it out, the page contains some pearls and the code behind them, which will help you learn to create fabulous visualizations yourself. Some examples:

Via https://htmlpreview.github.io/?https://github.com/clauswilke/practical_ggplot2/blob/master/health_status.html
Via https://htmlpreview.github.io/?https://github.com/clauswilke/practical_ggplot2/blob/master/corruption_human_development.html

Here’s the original tweet in case you want to see the responses.

Recommended Books on Data Visualization

Recommended Books on Data Visualization

Disclaimer: This page contains links to Amazon’s book shop.
Any purchases through those links provide us with a small commission that helps to host this blog.

Data visualization and the (in)effective communication of information are salient topics on this blog. I just love to read and write about best practices related to data visualization (or bad practices), or to explore novel types of complex graphs. However, I am not always online, and I am equally fond of reading about data visualization offline.

These amazing books about data visualization
are written by some of the leading experts in the dataviz scene:

Happy reading!


If you are also interested in programming and machine learning, have a look at this list of free programming books.

Generating Book Covers By Their Words — My Dissertation Cover

Generating Book Covers By Their Words — My Dissertation Cover

As some of you might know, I am defending my PhD dissertation later this year. It’s titled “Data-Driven Human Resource Management: The rise of people analytics and its application to expatriate management” and, over the past few months, I was tasked with designing its cover.

Now, I didn’t want to buy some random stock photo depicting data, an organization, or overly happy employees. I’d rather build something myself. Something reflecting what I liked about the dissertation project: statistical programming and sharing and creating knowledge with others.

Hence, I came up with the idea to use the collective intelligence of the People Analytics community to generate a unique cover. It required a dataset of people analytics-related concepts, which I asked People Analytics professionals on LinkedIn, Twitter, and other channels to help compile. Via a Google Form, colleagues, connections, acquitances, and complete strangers contributed hundreds of keywords ranging from the standard (employees, HRM, performance) to the surprising (monetization, quantitative scissors [which I had to Google]). After reviewing the list and adding some concepts of my own creation, I ended up with 1786 unique words related to either business, HRM, expatriation, data science, or statistics.

I very much dislike wordclouds (these are kind of cool though), but already had a different idea in mind. I thought of generating a background cover of the words relating to my dissertation topic, over which I could then place my title and other information. I wanted to place these keywords randomly, maybe using a color schema, or with some random sizes.

The picture below shows the result of one of my first attempts. I programmed everything in R, writing some custom functionality to generate the word-datasets, the cover-plot, and .png, .pdf, and .gif files as output.

canvas.PNG

Random colors did not produce a pleasing result and I definitely needed more and larger words in order to fill my 17cm by 24cm canvas!

Hence, I started experimenting. Using base R’s expand.grid() and set.seed() together with mapply(), I could quickly explore and generate a large amount of covers based on different parameter settings and random fluctuations.

expand.grid(seed = c(1:3), 
            dupl = c(1:4, seq(5, 30, 5)),
            font = c("sans", "League Spartan"),
            colors = c(blue_scheme, red_scheme, 
                       rainbow_scheme, random_scheme),
            size_mult = seq(1, 3, 0.3),
            angle_sd = c(5, 10, 12, 15)) -> 
  param

mapply(create_textcover, 
       param$seed, param$dupl, 
       param$font, param$colors, 
       param$size_mult, param$angle_sd)

The generation process for each unique cover only took a few seconds, so I would generate a few hundred, quickly browse through them, update the parameters to match my preferences, and then generate a new set. Among others, I varied the color palette used, the size range of the words, their angle, the font used, et cetera. To fill up the canvas, I experimented with repeating the words: two, three, five, heck, even twenty, thirty times. After an evening of generating and rating, I came to the final settings for my cover:

  • Words were repeated twenty times in the dataset.
  • Words were randomly distributed across the canvas.
  • Words placed in random order onto the canvas, except for a select set of relevant words, placed last.
  • Words’ transparency ranged randomly between 0% and 70%.
  • Words’ color was randomly selected out of six colors from this palette of blues.
  • Words’ writing angles were normally distributed around 0 degrees, with a standard deviation of 12 degrees. However, 25% of words were explicitly without angle.
  • Words’ size ranged between 1 and 4 based on a negative binomial distribution (10 * 0.8) resulting in more small than large words. The set of relevant words were explicitly enlarged throughout.

With League Spartan (#thisisparta) loaded as a beautiful custom font, this was the final cover background which I and my significant other liked most:

cover_wordcloud_20-League Spartan-4.png

While I still need to decide on the final details regarding title placement and other details, I suspect that the final cover will look something like below — the white stripe in the middle depicting the book’s back.

coverpaul.png

Now, for the finale, I wanted to visualize the generation process via a GIF. Thomas Lin Pedersen developed this great gganimate package, which builds on the older animation package. The package greatly simplifies creating your own GIFs, as I already discussed in this earlier blog about animated GIFs in R. Anywhere, here is the generation process, where each frame includes the first frame ^ 3.2 words:

cover_wordcloud_20-League Spartan_4.gif

If you are interested in the process, or the R code I’ve written, feel free to reach out!

I’m sharing a digital version of the dissertation online sometime around the defense date: November 9th, 2018. If you’d like a copy, you can still leave your e-mailadress in the Google Form here and I’ll make sure you’ll receive your copy in time!

(Time Series) Forecasting: Principles & Practice (in R)

(Time Series) Forecasting: Principles & Practice (in R)

I stumbled across this open access book by Rob Hyndman, the god of time series, and George Athanasopoulos, a colleague statistician / econometrician at Monash University in Melbourne Australia.

Hyndman and Athanasopoulos provide a comprehensive introduction to forecasting methods, accessible and relevant among others for business professionals without any formal training in the area. All R examples in the book assume work build on the fpp2 R package. fpp2 includes all datasets referred to in the book and depends on other R packages including forecast and ggplot2.

Some examples of the analyses you can expect to recreate, ignore the agricultural topic for now ; )

Monthly milk production per cow.
One of the example analysis you will recreate by following the book (Figure 3.3)

Forecasts of egg prices using a random walk with drift applied to the logged data.
You will be forecasting price data using different analyses and adjustments (Figure 3.4)

I highly recommend this book to any professionals or students looking to learn more about forecasting and time series modelling. There is also a DataCamp course based on this book. If you got value out of this free book, be sure to buy a hardcopy as well.