In 2016, Saul Pwanson designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands of crosswords from various sources.
After putting all these crosswords into a single uniform format, Saul used a few basic command-line tools to check for common patterns and irregularities.
Surprisingly enough, after visualizing the results, Saul discovered egregious plagiarism by a major crossword editor that had gone on for years.
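To give a feel for the kind of check a uniform plain-text format makes easy, here is a minimal sketch in Python. It is not Saul's actual tooling or file format: the grid layout (one row per line, `#` for black squares), the function names, and the 0.5 threshold are all assumptions made for illustration.

```python
from itertools import combinations

BLOCK = "#"  # assumed marker for a black square


def grid_similarity(a, b):
    """Share of letter cells that hold the same letter in both grids."""
    rows_a, rows_b = a.split(), b.split()
    # Grids of different shapes are treated as unrelated.
    if [len(r) for r in rows_a] != [len(r) for r in rows_b]:
        return 0.0
    pairs = [(x, y) for ra, rb in zip(rows_a, rows_b) for x, y in zip(ra, rb)]
    # Ignore cells that are black in both grids, so a shared layout
    # alone does not inflate the score.
    letter_cells = [(x, y) for x, y in pairs if x != BLOCK or y != BLOCK]
    if not letter_cells:
        return 0.0
    return sum(x == y for x, y in letter_cells) / len(letter_cells)


def suspicious_pairs(puzzles, threshold=0.5):
    """Return (name, name, score) for every pair scoring at or above threshold."""
    hits = []
    for (name_a, grid_a), (name_b, grid_b) in combinations(sorted(puzzles.items()), 2):
        score = grid_similarity(grid_a, grid_b)
        if score >= threshold:
            hits.append((name_a, name_b, score))
    return sorted(hits, key=lambda hit: -hit[2])


# Tiny made-up grids, purely for demonstration.
puzzles = {
    "a": "CAT\n#RE\nDOG",
    "b": "CAT\n#RE\nDIG",
    "c": "XYZ\nABC\nQRS",
}
print(suspicious_pairs(puzzles))
```

Comparing every pair is quadratic in the number of puzzles, which is fine for a toy example but would need smarter grouping for tens of thousands of crosswords.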
I thoroughly enjoyed watching this talk on YouTube.
Saul covers the file format, data pipeline, and the design choices that aided rapid exploration; the evidence for the scandal, from the initial anomalies to the final damning visualization; and what it’s like for a data project to get 15 minutes of fame.
I tried to locate the dataset online, but it seems Saul’s website has since gone offline. If you do happen to find it, please do share it in the comments!
Suppose you operate a warehouse where workers work 11-hour shifts. In order to meet your productivity KPIs, a significant number of them need to take painkillers multiple times per shift. Do you…
1. Decrease or change the KPI (goals)
2. Make shifts shorter
3. Increase the number or duration of breaks
4. Increase the medical staff
5. Install vending machines to dispense painkillers more efficiently
Nobody in their right mind would take option 5… Right?
Yet this is precisely what Amazon did, according to Emily Guendelsberger in her insanely interesting and relevant book “On the Clock” (note the paradoxical link to Amazon’s webshop here).
Emily went undercover as an employee at several organizations to experience blue-collar jobs first-hand. In her book, she discusses how tech and data have changed low-wage jobs in ways that are simply dehumanizing.
These days, with sensors, timers, and smart nudging, employees are constantly monitored and pushed to keep working (hard), sometimes at the cost of their own health and well-being.
I really enjoyed the book, despite the harsh picture it sketches of low-wage jobs and malicious working conditions these days. The book poses several dilemmas and asks multiple reflective questions that made me re-evaluate and re-appreciate my own job. Truly an interesting read!
Some quotes from the book to get you excited:
“As more and more skill is stripped out of a job, the cost of turnover falls; eventually, training an ever-churning influx of new unskilled workers becomes less expensive than incentivizing people to stay by improving the experience of work or paying more.”
Emily Guendelsberger, On the Clock
“Q: Your customer-service representatives handle roughly sixty calls in an eight-hour shift, with a half-hour lunch and two fifteen-minute breaks. By the end of the day, a problematic number of them are so exhausted by these interactions that their ability to focus, read basic conversational cues, and maintain a peppy demeanor is negatively affected. Do you:
A. Increase staffing so you can scale back the number of calls each rep takes per shift — clearly, workers are at their cognitive limits
B. Allow workers to take a few minutes to decompress after difficult calls
C. Increase the number or duration of breaks
D. Decrease the number of objectives workers have for each call so they aren’t as mentally and emotionally taxing
E. Install a program that badgers workers with corrective pop-ups telling them that they sound tired.
Seriously—what kind of fucking sociopath goes with E?”
Note: these are affiliate links. If you buy a similar setup, it will generate a few bucks used to keep my website live!
My setup totalled about €1100 or $1200, though the exact price depends on the vendors you pick. Either way, the CPU and the GPU are definitely the most expensive (and important) parts.
I did not buy any additional fans, as the Be Quiet base already had some pre-installed. However, I think it might be better to install extras.
It’s also very easy to upgrade (or downgrade) your system. You can easily switch out modules to decrease or increase the performance (and cost). For instance, you can install another two memory modules on your motherboard, or simply spend more on a GPU.
After everything was delivered to my house, I thought the hard part would start: building the desktop and putting everything together. But actually, this only took me about an hour or two, with the help of some great tutorials on YouTube.
I hope this convinces and helps you to build your own system at home!
If you are looking for a project to build a bot or AI application, look no further.
Enter the stage: PyBoy, a Nintendo Game Boy (DMG-01) emulator written in Python 2.7. The implementation runs in almost pure Python, but with dependencies for drawing graphics and getting user interactions through SDL2 and NumPy.
PyBoy is great for your AI robot projects as it is loadable as an object in Python. This means it can be initialized from another script, and be controlled and probed by that script. You can even use multiple emulators at the same time: just instantiate the class multiple times.
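As a rough sketch of what "controlled by a script" could look like, the snippet below steps several emulator objects in lockstep. To keep it runnable without a ROM file, it uses a hypothetical stand-in class, `FakeEmulator`; the only thing it assumes about a real emulator object is a `tick()` method that advances one frame, which PyBoy provides. The `step_all` helper and the `"game.gb"` path in the comment are my own illustrative names.

```python
class FakeEmulator:
    """Hypothetical stand-in for an emulator object, so the sketch runs without a ROM."""

    def __init__(self):
        self.frames = 0

    def tick(self):
        # A real emulator would advance the game by one frame here.
        self.frames += 1


def step_all(emulators, frames):
    """Advance every emulator by the same number of frames, in lockstep."""
    for _ in range(frames):
        for emulator in emulators:
            emulator.tick()


# With PyBoy installed, the list could (assumption) be built as:
#   from pyboy import PyBoy
#   emulators = [PyBoy("game.gb") for _ in range(3)]
emulators = [FakeEmulator() for _ in range(3)]
step_all(emulators, frames=60)
print([emulator.frames for emulator in emulators])  # [60, 60, 60]
```

Between ticks, a bot would typically read the emulator's state and decide which button to press next; how to do that depends on the PyBoy version, so check its documentation.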
I love how people are using data and data science to fight fake news these days (see also Identifying Dirty Twitter Bots), and I recently came across another great example.
Conspirador Norteño (real name unknown) is a member of what they call #TheResistance. It’s a group of data scientists discovering and analyzing so-called botnets – networks of artificial accounts on social media websites, like Twitter.
TheResistance uses quantitative analysis to unveil large groups of fake accounts, spreading potential fake news, or fake-endorsing the (fake) news spread by others.
They looked at the dates on which these accounts started following Shiva, set against the dates on which the accounts were created. A remarkable pattern appeared:
Although @va_shiva’s recent followers look unremarkable, a significant majority of his first 5000 followers appear to have been created in batches and to have subsequently followed @va_shiva in rapid succession.
Looking at those followers in more detail, other suspicious patterns emerge. Their names follow the same pattern, and they have roughly equal numbers of followers, followings, tweets, and (no) likes. Moreover, they were created only seconds apart. Many of them seem to follow each other as well.
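The "created in batches" signal is simple enough to sketch in a few lines of Python. This is not Conspirador Norteño's actual code: the function name, the 60-second gap, the minimum batch size, and the sample accounts below are all made up for illustration.

```python
from datetime import datetime, timedelta


def creation_batches(accounts, max_gap=timedelta(seconds=60), min_size=3):
    """Group accounts whose creation times follow each other within max_gap.

    accounts: list of (handle, created_at) tuples.
    Returns a list of batches (lists of handles) with at least min_size members.
    """
    ordered = sorted(accounts, key=lambda account: account[1])
    batches, current = [], []
    for handle, created in ordered:
        # A gap larger than max_gap closes the current batch.
        if current and created - current[-1][1] > max_gap:
            if len(current) >= min_size:
                batches.append([h for h, _ in current])
            current = []
        current.append((handle, created))
    if len(current) >= min_size:
        batches.append([h for h, _ in current])
    return batches


# Made-up followers: three created seconds apart, two created organically.
followers = [
    ("bot_a", datetime(2017, 3, 1, 12, 0, 0)),
    ("bot_b", datetime(2017, 3, 1, 12, 0, 7)),
    ("bot_c", datetime(2017, 3, 1, 12, 0, 15)),
    ("human_1", datetime(2014, 6, 9, 8, 30, 0)),
    ("human_2", datetime(2016, 1, 20, 19, 45, 0)),
]
print(creation_batches(followers))  # [['bot_a', 'bot_b', 'bot_c']]
```

On real data you would combine this with the other signals mentioned above, such as similar names and near-identical follower counts, because popular sign-up moments can also produce accounts created close together.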
If that wasn’t enough proof that something’s off, here’s a variety of their tweets… Not really what everyday folks would tweet, right? Plus, similar patterns appear again across accounts.
At first, I thought, so what? This Shiva guy probably just set up some automated (Python?) scripts to create Twitter accounts and follow him. Good for him. It worked out, as his most recent 10k followers followed him organically.
However, it becomes scarier once you notice that this Shiva guy is (successfully) promoting the firing of people working for the government:
Anyway, I wanted to share this simple yet cool approach to finding bots & fake news networks on social media. I hope you liked it, and would love to hear your thoughts in the comments!