publicdata – paulvanderlaken.com

Max Woolf writes machine learning blogs on his personal blog, minimaxir, and posts open-source code repositories on his GitHub. He is a former Apple Software QA Engineer and graduated from Carnegie Mellon University. I have published his work before, for instance, this short ggplot2 tutorial by MiniMaxir, but his new project really amazed me.

Max developed a Facebook web scaper in Python. This tool gathers all the posts and comments of Facebook Pages (or Open Facebook Groups) and the related metadata, including post message, post links, and counts of each reaction on the post. The data is then exported to a CSV file, which can be imported into any data analysis program like Excel, or R.

The data format returned by the Facebook scaper.

Max put his scraper to work and gathered a ton of publicly available Facebook posts and their metadata between 2016 and 2017.

However, this was only the beginning. In a follow-up project, Max trained a recurrent neural network (or RNN) on these 2016-2017 data in order to predict the proportionate reactions (love, wow, haha, sad, angry) to any given text. Now, he has made this neural network publicly available with the Python 2/3 module and R package, reactionrnn, which builds on Keras/TensorFlow (see Keras: Deep Learning in R or Python within 30 seconds & R learning: Neural Networks).

Python implementation

For Python, reactionrnn can be installed from pypi via pip:

python3 -m pip install reactionrnn

You may need to create a venv (python3 -m venv <path>) first.

from reactionrnn import reactionrnn

react = reactionrnn()
react.predict("Happy Mother's Day from the Chicago Cubs!")

[('love', 0.9765), ('wow', 0.0235), ('haha', 0.0), ('sad', 0.0), ('angry', 0.0)]

R implementation

For R, you can install reactionrnn from this GitHub repo with devtools (working on resolving issues to get package on CRAN):

# install.packages('devtools')
devtools::install_github("minimaxir/reactionrnn", subdir="R-package")

library(reactionrnn)
react <- reactionrnn()
react %>% predict("Happy Mother's Day from the Chicago Cubs!")

      love        wow       haha        sad      angry 
0.97649449 0.02350551 0.00000000 0.00000000 0.00000000

You can view a demo of common features in this Jupyter Notebook for Python, and this R Notebook for R.

Notes

reactionrnn is trained on Facebook posts of 2016 and 2017 and will often yield responses that are characteristic for this corpus.
reactionrnn will only use the first 140 characters of any given text.
Max intends to build a web-based implementation using Keras.js
Max also intends to improve the network (longer character sequences and better performance) and released it as a commercial product if any venture capitalists are interested.
Max’s projects are open-source and supported by his Patreon, any monetary contributions are appreciated and will be put to good creative use.