Tag: math

Learn Julia for Data Science

Most data scientists favor Python as a programming language these days. However, there’s also still a large group of data scientists coming from a statistics, econometrics, or social science background and therefore favoring R, the programming language they learned in university. Now there’s a new kid on the block: Julia.

Image via Medium

Advantages & Disadvantages

According to some, you can think of Julia as a mixture of R and Python, but faster. As a programming language for data science, Julia has some major advantages:

Julia is light-weight and efficient and will run on the tiniest of computers
Julia is just-in-time (JIT) compiled, and can approach or match the speed of C
Julia is a functional language at its core
Julia supports metaprogramming: Julia programs can generate other Julia programs
Julia has a math-friendly syntax
Julia has refined parallelization compared to other data science languages
Julia can call C, Fortran, Python or R packages

However, others also argue that Julia comes with some disadvantages for data science, like data frame printing, 1-indexing, and its external package management.

Comparing Julia to Python and R

Open Risk Manual published this side-by-side review of the main open source Data Science languages: Julia, Python, R.

You can click the links below to jump directly to the section you’re interested in. Once there, you can compare the packages and functions that allow you to perform Data Science tasks in the three languages.

General: History and Community; Devices and Operating Systems; Package Management; Package Documentation; Language Characteristics
Development: Development Environment; Files, Databases and Data Manipulation; Web, Desktop and Mobile Deployment; Semantic Web / Semantic Data; High Performance Computing; Using R, Python and Julia together
Algorithms & Data Science: General Purpose Mathematical Libraries; Core Statistics Libraries; Econometrics / Timeseries Libraries; Machine Learning Libraries; GeoSpatial Libraries; Visualization

Via openriskmanual.org/wiki/Overview_of_the_Julia-Python-R_Universe

Starting with Julia for Data Science

Here’s a very well-written Medium article that guides you through installing Julia and getting started with some simple data science tasks. At the very least, here’s what Julia’s plots look like:

Bayes theorem, and making probability intuitive – by 3Blue1Brown

I’d been meaning to watch this video for a while. It’s another great visual explanation of a statistics topic by the 3Blue1Brown YouTube channel (which I’ve covered multiple times before).

This time, it’s all about Bayes’ theorem, and I just love how visually Grant Sanderson explains the concept. He argues that rather than memorizing the theorem, we should learn how to draw out the context. Have a look at the video, or read my summary below:

Grant builds his explanation around an example that Daniel Kahneman describes in his book Thinking, Fast and Slow, based on his research with Amos Tversky:

“Steve is very shy and withdrawn, invariably helpful but with very little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.”
Is Steve more likely to be a librarian or a farmer?
Question from Thinking, Fast and Slow

What was your first guess?

Kahneman and Tversky argue that most people go by Steve’s disposition and therefore lean towards librarian.

However, few people take into account that librarians are quite scarce in our society, which is rich in farmers: for every librarian, there are 20+ farmers. Hence, despite his disposition, Steve is probably more likely to be a farmer.
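The librarian-versus-farmer reasoning can be checked with a few lines of arithmetic. Below is a minimal Python sketch that counts people instead of manipulating formulas. The numbers are assumptions for illustration only: a population of 10 librarians and 200 farmers (the 1:20 ratio above), where an assumed 40% of librarians and 10% of farmers fit Steve’s description.

```python
from fractions import Fraction

def posterior_by_counting(n_a, fit_rate_a, n_b, fit_rate_b):
    """Share of the people fitting the description who belong to group A."""
    fits_a = n_a * fit_rate_a  # group A members who fit the description
    fits_b = n_b * fit_rate_b  # group B members who fit the description
    return fits_a / (fits_a + fits_b)

# Assumed illustration numbers: 1 librarian per 20 farmers,
# 40% of librarians and 10% of farmers fit Steve's description.
p_librarian = posterior_by_counting(10, Fraction(4, 10), 200, Fraction(1, 10))
print(p_librarian, float(p_librarian))  # prints: 1/6 0.16666666666666666
```

Even though, under these assumed numbers, a librarian is four times as likely as a farmer to fit the description, the farmers who fit it still outnumber the librarians who fit it by 20 to 4, so Steve is a librarian with a probability of only about 17%.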

https://www.youtube.com/watch?v=HZGCoVF3YvM&feature=youtu.be

Rather than remembering the formula for Bayes’ theorem, Grant argues that it’s often easier to just draw out the rectangle of probabilities below.
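For reference, here is Bayes’ theorem written out for Steve’s case. The notation is our own, not the video’s: L is “Steve is a librarian”, F is “Steve is a farmer”, and E is “Steve fits the description”. Each product is the area of one slice of the rectangle: the prior sets the slice’s width, and the likelihood sets the fraction of its height that fits the description.

```latex
P(L \mid E) = \frac{P(L)\,P(E \mid L)}{P(L)\,P(E \mid L) + P(F)\,P(E \mid F)}
```

Read off the rectangle, the numerator is the area of the librarians who fit the description, and the denominator is the total area of everyone who fits it.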

Try it out for yourself using another example by Kahneman and Tversky:

Turning the Traveling Salesman problem into Art

Robert Bosch, a professor of natural science in the Department of Mathematics at Oberlin College, has found a creative way to elevate the travelling salesman problem to an art form.

For those who aren’t familiar with the travelling salesman problem (wiki), it is a classic algorithmic problem in the fields of computer science and operations research. Basically, we are looking for the cheapest, shortest, or fastest route that visits each location exactly once and returns to the starting point. Most commonly, the problem is represented as a graph (network) whose nodes are the locations to visit and whose edges carry the cost of travelling between them. Wikipedia has a description I can’t improve on:

The Travelling Salesman Problem describes a salesman who must travel between N cities. The order in which he does so is something he does not care about, as long as he visits each once during his trip, and finishes where he was at first. Each city is connected to other close by cities, or nodes, by airplanes, or by road or railway. Each of those links between the cities has one or more weights (or the cost) attached. The cost describes how “difficult” it is to traverse this edge on the graph, and may be given, for example, by the cost of an airplane ticket or train ticket, or perhaps by the length of the edge, or time required to complete the traversal. The salesman wants to keep both the travel costs, as well as the distance he travels as low as possible.
Wikipedia
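To make the problem concrete, here is a minimal Python sketch (our own illustration, not taken from the post or from Wikipedia) that compares three classic approaches on a handful of random cities. Exhaustive search is exact but only feasible for very small N, because with a fixed start there are (N - 1)! orderings to try. The nearest-neighbour heuristic greedily travels to the closest unvisited city. 2-opt local search repeatedly reverses a segment of the tour whenever that makes it shorter. Cities are assumed to be points in the plane, with straight-line distance as the edge cost.

```python
import itertools
import math
import random

def tour_length(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def brute_force(points):
    """Exact: try every ordering of cities 1..N-1, with city 0 fixed as the start."""
    best = min(itertools.permutations(range(1, len(points))),
               key=lambda rest: tour_length(points, [0, *rest]))
    return [0, *best]

def nearest_neighbour(points):
    """Greedy heuristic: always travel to the closest unvisited city."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        here = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt(points, tour):
    """Local search: reverse tour[i..j] whenever that shortens the tour."""
    best_len = tour_length(points, tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:
                    tour, best_len, improved = candidate, cand_len, True
    return tour

random.seed(1)
cities = [(random.random(), random.random()) for _ in range(8)]
for name, tour in [("exact", brute_force(cities)),
                   ("nearest neighbour", nearest_neighbour(cities)),
                   ("nearest neighbour + 2-opt", two_opt(cities, nearest_neighbour(cities)))]:
    print(f"{name}: {tour_length(cities, tour):.3f}")
```

On tiny inputs like this one, exhaustive search gives the true optimum. On the many thousands of points in a piece of TSP art, only heuristics and local search like 2-opt (or far more sophisticated solvers) are practical.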

Here’s a visual representation of the problem and some algorithmic approaches to solving it:

Now, Robert Bosch has applied the traveling salesman problem to well-known art pieces, redrawing them by connecting a series of points with one continuous line. Robert even turned it into a challenge, so people can test how well their travelling salesman algorithms perform on, for instance, the Mona Lisa or a Vincent van Gogh painting.
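The core idea behind TSP art can be sketched in two steps: scatter dots so that darker regions of the image get more of them, then join all the dots with a single closed tour. Below is a minimal Python sketch under our own simplifying assumptions, not Bosch’s actual method. The “image” is a hypothetical tiny grayscale grid (0.0 = black, 1.0 = white), dots are placed by random sampling weighted by darkness, and the tour is a plain nearest-neighbour heuristic.

```python
import math
import random

def stipple(image, n_dots, seed=0):
    """Sample dot positions with probability proportional to pixel darkness."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(image)) for c in range(len(image[0]))]
    weights = [1.0 - image[r][c] for r, c in cells]  # darker pixel -> larger weight
    chosen = rng.choices(cells, weights=weights, k=n_dots)
    # Jitter each dot inside its pixel so that dots in the same pixel don't coincide
    return [(c + rng.random(), r + rng.random()) for r, c in chosen]

def nearest_neighbour_tour(points):
    """Greedy tour through all dots, closed back to the starting dot."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        here = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour + [0]  # one continuous line that ends where it started

# Hypothetical 5x5 "image": a dark diagonal stroke on a white background
image = [[0.0 if r == c else 1.0 for c in range(5)] for r in range(5)]
dots = stipple(image, n_dots=30)
line = [dots[i] for i in nearest_neighbour_tour(dots)]
print(len(line))  # prints: 31 (30 dots plus the return to the start)
```

Drawing the points of `line` in order would give the single continuous stroke. With a real photo and tens of thousands of dots, a much better tour optimizer is needed to keep the line from crossing itself.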

Just look at the detail on these awesome Dutch classics:

Read more about this awesome project here: http://www.math.uwaterloo.ca/tsp/data/art/

P.S. Why do Brits and Americans have this spelling feud?! As a non-native speaker, I never know which to pick. Should I write modelling or modeling, travelling or traveling, tomato or tomato? I was taught the U.K. style, but the U.S. style pops up whenever I google stuff, so I am constantly confused! Now I subconsciously intertwine both styles in a single text…

Helpful resources for A/B testing

Brandon Rohrer — (former) data scientist at Microsoft, iRobot, and Facebook — asked his network on Twitter and LinkedIn to share their favorite resources on A/B testing. It produced a nice list, which I summarized below.

Hey Twitter, a contact just asked me about A/B testing. Do you have any posts or tutorials you would recommend for them?
— Brandon Rohrer (@_brohrer_) July 6, 2019

The order is somewhat arbitrary, and somewhat based on my personal appreciation of the resources.

Course: A/B-testing by Google via Udacity
Game: So You Think You Can Test? by Lukas Vermeer
Video: A/B Testing in the Wild by Emily Robinson
Video: Beyond Two Groups: Generalized Bayesian A/B[/C/D/E…] Testing by Eric Ma via PyCon 2019
Book: Algorithms to Live By by Brian Christian and Tom Griffiths
Blog: Why Multi-armed Bandit algorithms are superior to A/B testing by Chris Stucchio (see other materials)
Blog: Bayesian Bandits – optimizing click throughs with statistics by Chris Stucchio (see other materials)
Blog: 12 Guidelines for A/B Testing by Emily Robinson (summary)
Blog: A/B Testing Mastery: From Beginner to Pro in a Blog Post by Alex Birkett via ConversionXL
Blog: What is A/B Testing? How to Use A/B Testing to Improve Conversions by MailChimp
Blog: Data Science you need to know! A/B testing by Michael Barber via Medium
Blog: Detecting Interference: An A/B Test of A/B Tests by Guillaume Saint-Jacques
Wiki: A/B Testing
Blog: The Math Behind A/B Testing by Amazon
Blog: How Not To Run an A/B Test by Evan Miller
Blog: A/B Testing by Optimizely
Blog: 5 Things to Know About A/B Testing by Matthew Mayo via KDnuggets
Blog: A Marketer’s Guide to A/B Testing by CleverTap
Blog: A Beginner’s Guide To A/B Testing: An Introduction by Neil Patel
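The resources above go much deeper, but the most basic analysis of a finished A/B test with a yes/no outcome (such as a conversion) is a two-proportion z-test. Here is a minimal Python sketch using only the standard library. The conversion counts are hypothetical, and the test assumes the sample size was fixed in advance and the data was not checked repeatedly along the way, which is the pitfall Evan Miller’s “How Not To Run an A/B Test” warns about.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; z-test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical test: 200 of 4,000 visitors converted on A, 250 of 4,000 on B
z, p = two_proportion_z_test(200, 4000, 250, 4000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value suggests the difference is unlikely under the assumption of no real effect. It says nothing about whether the difference is large enough to matter, which is one reason several of the resources above favor Bayesian or bandit approaches.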

Cover image via Optimizely

Machine Learning & Deep Learning book

The Deep Learning textbook helps students and practitioners enter the field of machine learning in general and deep learning in particular. The online version is available for free, whereas a hardcover copy can be ordered here on Amazon. You can click on the topics below to be redirected to the corresponding book chapter: