Tag: bestpractices

How to confuse your shareholders by bad data visualization

Like many people during the COVID19 crisis, I turned to the stock market as a new hobby.

Like the ignorant investor that I am, I thought it wise to hop on the cloud computing bandwagon.

Hence, I bought, among others, a small position in Rackspace Technologies.

A long way down

Now, my Rackspace shares have plummeted in price since I bought them.

Screenshot of Google Finance on August 25th 2021: https://www.google.com/finance/quote/RXT:NASDAQ?sa=X&ved=2ahUKEwjxqdr0oczyAhWKtqQKHZk3A90Q_AUoAXoECAEQAw&window=6M

Obviously, this is less than ideal for me, but also, I should not be surprised.

Clearly, I knew nothing about the company I bought shares in. Apparently they are going through some big time reorganization, and this is not good price-wise.

Fast forward to yesterday.

Doing research

To re-evalute my investment, I thought it wise to have a look at Rackspace’s Quarterly Report.

According to Investopedia: A quarterly report is a summary or collection of unaudited financial statements, such as balance sheets, income statements, and cash flow statements, issued by companies every quarter (three months). In addition to reporting quarterly figures, these statements may also provide year-to-date and comparative (e.g., last year’s quarter to this year’s quarter) results. Publicly-traded companies must file their reports with the Securities Exchange Committee (SEC).

Fortunately these quarterly reports are readily available on the investors relation page, and they are not that hard to read once you have seen a few.

Visualizing financial data

I was excited to see that Rackspace offered their financial performance in bite-sized bits to me as a laymen, through their usage of nice visualizations of the financial data.

Please take a moment to process the below copy of page 11 of their 2021 Q2 report:

Screenshot of page 11 of the 2021 Q2 Quarterly Report of Rackspace Technologies: https://ir.rackspace.com/static-files/474fde80-f203-4227-a438-57b062992d46

Though… the longer I looked at these charts… the more my head started to hurt…

How can the growth line be about the same in the three charts Total Revenue (top-left), Core Revenue (top-right), and Non-GAAP EPS (bottom-right)? They represent different increments: 13%, 17%, and 14% respectively.

Zooming in on the top left: how does the $657 revenue of 2Q20 fit inside the $744 revenue of 2Q21 almost three times?!

The increase is only 13%, not 300%!

Recreating the problem

I decided to recreate the vizualizations of the quarterly report.

To see what the visualization should have actually looked like. And to see how they could have made this visualization worse.

You can find the R ggplot2 code for these plots here on Github.

If you know me, you know I can’t do something 50%, so I decided to make the plots look as closely to the original Rackspace design as possible.

Here are the results:

Here are all three combined, along with two simple questions:

This I shared on social media (LinkedIn, Twitter), to ask for people’s opinions:

And I tagged Rackspace and offered them my help!

I hope they’re not offended and respond : )

10 Tips for Effective Dashboard Design by Deloitte

My colleague prof. Jack van Wijk pointed me towards these great guidelines by Deloitte on how to design an effective dashboard.

Click to access deloitte-nl-data-analytics-10-commandments-effective-dashboarding.pdf

Some of these rules are more generally applicable to data visualization. Yet, the Deloitte 10 commandments form a good checklist when designing a dashboard.

Here’s my interpretation of the 10 rules:

Know your message or goal
Choose the chart that conveys your message best
Use a grid to bring order to your dashboard
Use color only to highlight and draw attention
Remove unneccessary elements
Avoid information overload
Design for ease of use
Text is as important as charts
Design for multiple devices (desktop, tablet, mobile, …)
Recycle good designs (by others)

In terms of recycling the good work by others operating in the data visualization field, check out:

I just love how Deloitte uses example visualizations to help convey what makes a good (dashboard) chart:

10 Guidelines to Better Table Design

Jon Schwabisch recently proposed ten guidelines for better table design.

Next to the academic paper, Jon shared his recommendations in a Twitter thread.

I recently published "Ten Guidelines for Better Tables" in the Journal of Benefit Cost Analysis (@benefitcost) on ways to improve your data tables.

Here's a thread summarizing the 10 guidelines.

Full paper is here: https://t.co/VSGYnfg7iP pic.twitter.com/W6qbsktioL
— Jon Schwabish (@jschwabish) August 3, 2020

Let me summarize them for you:

Right-align your numbers
Left-align your texts
Use decimals appropriately (one or two is often enough)
Display units (e.g., $, %) sparsely (e.g., only on first row)
Highlight outliers
Highlight column headers
Use subtle highlights and dividers
Use white space between rows and columns
Use white space (or dividers) to highlight groups
Use visualizations for large tables

Afbeelding — Highlights in a table. Via twitter.com/jschwabish/status/1290324966190338049/photo/2

The 10 Fundamental Concepts of JavaScript

Another pearl of a resource on Twitter is this thread by Madison on 10 of fundamentalal concepts of Javascript — and programming in general for that matter.

For your convience, I copied the links below. Just click them to browse to the resource and learn more about the concept

If you're learning JavaScript, you've likely heard people tell you how important it is to learn the fundamentals.

But what are they? And where do you learn them?

Here's a list of JavaScript fundamentals and my favorite free resources for learning them. 👇
— Madison (@Madisonkanna) June 20, 2020

Click to learn more about each concept

Variables & Scoping
Data types
Objects, Funtions & Arrays
Document Object Model (DOM)
Prototypes & this.
Events
Flow Control (specifically, for-loops)
Security & (web) Accesibility
Good coding practices (to which I’ve linked before)
Async

This 10-step list was compiled as apart of this interesting podcast, which I recommend you listen to as well.

Want to learn more?

According to many, this is the best book to continue learning more about JavaScript.

There’s a (now classic) conference talk that comes with this book, which I can also recommend you watch:

ML Model Degradation, and why work only just starts when you reach production

The assumption that a Machine Learning (ML) project is done when a trained model is put into production is quite faulty. Neverthless, according to Alexandre Gonfalonieri — artificial intelligence (AI) strategist at Philips — this assumption is among the most common mistakes of companies taking their AI products to market.

Actually, in the real world, we see pretty much the opposite of this assumption. People like Alexandre therefore strongly recommend companies keep their best data scientists and engineers on a ML project, especially after it reaches production!

Why?

If you’ve ever productionized a model and really started using it, you know that, over time, your model will start performing worse.

In order to maintain the original accuracy of a ML model which is interacting with real world customers or processes, you will need to continuously monitor and/or tweak it!

In the best case, algorithms are retrained with each new data delivery. This offers a maintenance burden that is not fully automatable. According to Alexandre, tending to machine learning models demands the close scrutiny, critical thinking, and manual effort that only highly trained data scientists can provide.

This means that there’s a higher marginal cost to operating ML products compared to traditional software. Whereas the whole reason we are implementing these products is often to decrease (the) costs (of human labor)!

What causes this?

Your models’ accuracy will often be at its best when it just leaves the training grounds.

Building a model on relevant and available data and coming up with accurate predictions is a great start. However, for how long do you expect those data — that age by the day — continue to provide accurate predictions?

Chances are that each day, the model’s latent performance will go down.

This phenomenon is called concept drift, and is heavily studied in academia but less often considered in business settings. Concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways.

In simpler terms, your model is no longer modelling the outcome that it used to model. This causes problems because the predictions become less accurate as time passes.

Particularly, models of human behavior seem to suffer from this pitfall.

The key is that, unlike a simple calculator, your ML model interacts with the real world. And the data it generates and that reaches it is going to change over time. A key part of any ML project should be predicting how your data is going to change over time.

How do we know when our models fail?

You need to create a monitoring strategy before reaching production!

According to Alexandre, as soon as you feel confident with your project after the proof-of-concept stage, you should start planning a strategy for keeping your models up to date.

How often will you check in?

On the whole model, or just some features?

What features?

In general, sensible model surveillance combined with a well thought out schedule of model checks is crucial to keeping a production model accurate. Prioritizing checks on the key variables and setting up warnings for when a change has taken place will ensure that you are never caught by a surprise by a change to the environment that robs your model of its efficacy.
Alexandre via

Your strategy will strongly differ based on your model and your business context.

Moreover, there are many different types of concept drift that can affect your models, so it should be a key element to think of the right strategy for you specific case!

Different types of model drift (via)

Let’s solve it!

Once you observe degraded model performance, you will need to redesign your model (pipeline).

One solution is referred to as manual learning. Here, we provide the newly gathered data to our model and re-train and re-deploy it just like the first time we build the model. If you think this sounds time-consuming, you are right. Moreover, the tricky part is not refreshing and retraining a model, but rather thinking of new features that might deal with the concept drift.

A second solution could be to weight your data. Some algorithms allow for this very easily. For others you will need to custom build it in yourself. One recommended weighting schema is to use the inversely proportional age of the data. This way, more attention will be paid to the most recent data (higher weight) and less attention to the oldest of data (smaller weight) in your training set. In this sense, if there is drift, your model will pick it up and correct accordingly.

According to Alexandre and many others, the third and best solution is to build your productionized system in such a way that you continuously evaluate and retrain your models. The benefit of such a continuous learning system is that it can be automated to a large extent, thus reducing (the human labor) maintance costs.

Although Alexandre doesn’t expand on how to do these, he does formulate the three steps below:

In my personal experience, if you have your model retrained (automatically) every now and then, using a smart weighting schema, and keep monitoring the changes in the parameters and for several “unit-test” cases, you will come a long way.

If you’re feeling more adventureous, you could improve on matters by having your model perform some exploration (at random or rule-wise) of potential new relationships in your data (see for instance multi-armed bandits). This will definitely take you a long way!

Best practices for writing good, clean JavaScript code

Robert Martin’s book Clean Code has been on my to-read list for months now. Browsing the web, I stumbled across this repository of where Ryan McDermott applied the book’s principles to JavaScript. Basically, he made a guide to producing readable, reusable, and refactorable software code in JavaScript.

Although Ryan’s good and bad code examples are written in JavaScript, the basic principles (i.e. “Uncle Bob”‘s Clean Code principles) are applicable to any programming language. At least, I recognize many of the best practices I’d teach data science students in R or Python.

Find the JavaScript best practices github repo here: github.com/ryanmcdermott/clean-code-javascript

Knowing these won’t immediately make you a better software developer, and working with them for many years doesn’t mean you won’t make mistakes. Every piece of code starts as a first draft, like wet clay getting shaped into its final form. Finally, we chisel away the imperfections when we review it with our peers. Don’t beat yourself up for first drafts that need improvement. Beat up the code instead!
Ryan McDermott via clean-code-javascript