Wordle with NLP for Data Scientists

I have played my fair share of Wordle.

I’m not necessarily good at it, but most days I get to solve the puzzle.

The experience is completely different with Semantle — a Wordle-inspired puzzle in which you also need to guess the word of the day.

Unlike Wordle, Semantle gives you unlimited guesses. And, boy, you will need many!

Like Wordle, Semantle gives you hints as to how close your guesses were to the secret word of the day.

However, where Wordle shows you how good your guesses were in terms of the letters used, Semantle evaluates the semantic similarity of your guesses to the secret word. If your guess is among the 1,000 words most similar to the secret word, it also shows you how close it ranks, as in the picture above.

This semantic similarity comes from the domain of Natural Language Processing (NLP), and it basically reflects how often words are used in similar contexts in natural language.

For instance, the words “love” and “hate” may seem like opposites, but they score as similar because they tend to appear in the same kinds of sentences. According to the Semantle FAQ, the actual opposite of “love” is probably something like “Arizona Diamondbacks” or “carburetor”.
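To make the idea concrete, here is a minimal sketch of how similarity between word vectors is commonly computed, namely as cosine similarity. The three-dimensional vectors below are made up for illustration (real word2vec vectors have hundreds of dimensions), and I am assuming Semantle scores guesses in roughly this way.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, NOT taken from any real model.
toy_vectors = {
    "love": [0.9, 0.8, 0.1],
    "hate": [0.8, 0.9, 0.2],
    "carburetor": [-0.1, 0.1, 0.9],
}

for word in ("hate", "carburetor"):
    score = cosine_similarity(toy_vectors["love"], toy_vectors[word])
    print(f"love vs {word}: {score:.2f}")
```

With these toy numbers, “hate” points in almost the same direction as “love”, while “carburetor” is nearly orthogonal to it, which is the pattern the FAQ describes.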

Another example is yesterday’s solution (15 March 2022), when the secret word was circle. The ten closest words you could have guessed include circles and semicircle, but also more distinctive words such as corner and clockwise.

Further down the list you could have guessed relatively close words like saucer, dot, and parabola, but I would not have expected words like outwaited, weaved, and zipped.

The creator of Semantle scored the semantic similarity for almost all words used in the English language, by training a so-called word2vec model based on a very large dataset of news articles (GoogleNews-vectors-negative300.bin from late 2021).
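Once every word has a vector, ranking the whole vocabulary against a secret word is straightforward. The sketch below uses a tiny hypothetical vocabulary with invented vectors. The function name `rank_by_similarity` and the `top_n` parameter are my own, not Semantle's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(secret, vectors, top_n=1000):
    """Return up to top_n (word, score) pairs, most similar to `secret` first."""
    target = vectors[secret]
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word != secret  # the secret word itself is not a neighbour
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy vectors, NOT taken from the GoogleNews model.
toy_vectors = {
    "circle": [1.0, 0.1, 0.0],
    "semicircle": [0.9, 0.2, 0.0],
    "saucer": [0.6, 0.6, 0.1],
    "zipped": [0.0, 0.2, 1.0],
}

ranking = rank_by_similarity("circle", toy_vectors, top_n=3)
for rank, (word, score) in enumerate(ranking, start=1):
    print(rank, word, round(score, 3))
```

With real vectors, you could load a pretrained model with a library such as gensim instead of the toy dictionary. The ranking logic stays the same.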

Now, every day, one word is randomly selected as the secret word, and you can try to guess which one it is. I usually give up after 300 to 400 guesses, but my record was 76 guesses for uncovering the secret word world.

Try it out yourself: https://semantle.novalis.org/

And do share your epic wins and fails!

Best Charts for Income & Profit & Loss Statements

A few months back I wrote about how Rackspace confuses their shareholders using bad data visualization in their quarterly reports.

Mort Goldman — one of my dear readers — pointed me to this great tutorial by Kamil Franek where he shows 7 ways to visualize income and profit and loss statements. Please visit Kamil’s blog for the details, I just copied the visuals here to share with you.

Maybe we should forward them to Rackspace as well 😉

Kamil uses Google/Alphabet’s 2018 financial reports as data for his examples.

Here are two Sankey diagrams, with different levels of detail. Kamil argues they work best for the big picture overview.

Example of summarized Sankey diagram chart of an income statement
Example of detailed income statement Sankey diagram visualization

I dislike how most of the text is rotated 90 degrees, forcing me to tilt my head in order to read it.

An alternative Kamil proposes is the well-known Waterfall chart. Kamil dedicated a whole blog post to creating good waterfalls.

Example of detailed income statement waterfall chart

One of my favorite visualizations in the blog was this pair of combined bar charts: one showing the bars stacked, the other showing them separately. The stacked one allows you to discern the bigger trend. The small ones allow for within-category comparison.

Love it!

Not so much a fan of the next stacked area chart though. In my opinion, a lot of ink for very little information displayed.

Example of percentage revenue breakdown area chart

The colors in this next one are lovely though:

Example of percentage expenses breakdown area chart

The next scatter plot/bubble plot was one that I had not expected.

I love how this unorthodox visualization really adds insight, showing how different cost categories have developed over time.

There are some things I would tweak to make the graph more visually appealing though. Particularly the benchmark line is too rough in my opinion.

Example of expenses changes breakdown scatter/bubble plot

Very often, you don’t need a specialized graph, but a well-formatted table might be much more effective.

Kamil shows two great examples. The first one with an integrated bar chart/sparkline, the second one relying strongly on color cues. I prefer the second one, as it better shows the hierarchy in the categories with the highlighted rows.

Example of income statement table with sparklines
Example of income statement table with conditional formatting

Kamil takes it a step further in the next table, but I think such tables become less and less insightful as more information is included:

Example of a detailed income statement table for change analysis

Kamil’s final recommendation is this key metrics dashboard. Though I like the general idea, I am not sure whether this one works for me. Particularly the line graphs on the right don’t provide much insight. I don’t know whether the last but one dot is 20% or 5% or 50% or 0%. The lack of reference points allows it to be any of these values.

Example of a summary dashboard for income statement key metrics

If you haven’t yet clicked through, definitely check out Kamil’s original post.

There he shares his perspective on the advantages and disadvantages of each of these visualization types, and where they work best in his experience.

Also check out Kamil’s earlier post on How to Visually Redesign Your Income Statement (P&L).

Practice your Data Science skills on real-life data

Maven Analytics now provides open access to their datasets through what they call their DATA PLAYGROUND!

They offer 21 datasets including a range of different data (time series, geospatial, user preferences) on a variety of topics like business, sports, wine, financial stocks, transportation and whatnot.

This is a great starting point if you want to practice your data science, machine learning, and analysis skills on real-life data!

Maven Analytics provides e-learnings in analysis and programming software. To provide a practical learning experience, their courses are often accompanied by real-life datasets for students to analyze.

Vox: Are We Automating Racism?

In Glad You Asked, Vox dives deep into timely questions around the impact of systemic racism on our communities and in our daily lives.

In this video, they look into the role of tech in societal discrimination. People assume that tech and data are neutral, and we have turned to tech as a way to replace biased human decision-making. But as data-driven systems become a bigger and bigger part of our lives, we see more and more cases where they fail. And, more importantly, we see that they don’t fail on everyone equally.

Why do we think tech is neutral? How do algorithms become biased? And how can we fix these algorithms before they cause harm? Find out in this mini-doc:

A New Piece in my Algorithmic Art Collection

Those who have been following me for some time now will know that I am a big fan of generative art: art created through computers, mathematics, and algorithms.

Several years back, my now wife bought me my first piece for my promotion, by Marcus Volz.

And several years after that, I made my own attempt at a second generative art piece, again inspired by the work of Marcus on what he dubbed Metropolis.

Now, our living room has gained a third piece of generative art, this time by Nicholas Rougeux.

I bumped into Nicholas on Twitter, triggered by his collection of “Lunar Landscapes” (my own interpretation).

Nicholas was hesitant to sell me a piece and insisted that this series was not finished yet.

Yet, I already found it wonderful and lovely to look at. After begging Nicholas to sell us one of his early pieces, I sent it over to ixxi to have it printed and hung it on the wall above our dinner table.

If you’re interested in Nicholas’ work, have a look at c82.net

Shopify Party: Race your colleagues virtually

There are many tools to connect virtually with your coworkers. Think of Teams, Zoom, Google Meet, or Slack. And during the recent pandemic, we have seen their usage surge. Yet, most of these tools try to recreate the office experience using video conference calls.

At Shopify, Daniel Beauchamp and his team took a different approach. They created SHOPIFY PARTY: a full-blown virtual world designed for social play and hanging out.

Here, Shopify employees can now play games during their 1:1s, standups, and other team events. They can hold boat races, log jumps, dance contests, exploration hikes, or just chill with their coworkers by a virtual campfire.

This must provide an incredible boost for the employee experience, well-being, and for forming workplace relationships in general!

The virtual environment is created in Unity and runs right in the web browser using WebGL.