Tag: opensource

Repository of Production Machine Learning

Repository of Production Machine Learning

The Institute for Ethical Machine Learning compiled this amazing curated list of open source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning.

🔍 Explaining predictions & models🔏 Privacy preserving ML📜 Model & data versioning
🏁 Model Training Orchestration💪 Model Serving and Monitoring🤖 Neural Architecture Search
📓 Reproducible Notebooks📊 Visualisation frameworks🔠 Industry-strength NLP
🧵 Data pipelines & ETL🏷️ Data Labelling🗞️ Data storage
📡 Functions as a service🗺️ Computation distribution📥 Model serialisation
🧮 Optimized calculation frameworks💸 Data Stream Processing🔴 Outlier and Anomaly Detection
🌀 Feature engineering🎁 Feature Stores⚔ Adversarial Robustness
💰 Commercial Platforms
Direct links to the sections of the Github repo

The Institute for Ethical Machine Learning is a think-tank that brings together with technology leaders, policymakers & academics to develop standards for ML.

A free, self-taught education in Computer Science!

A free, self-taught education in Computer Science!

The Open Source Society University offers a complete education in computer science using online materials.

They offer a proper introduction to the fundamental concepts for all computing disciplines. Evyerthing form algorithms, logic, and machine learning, up to databases, full stack web development, and graphics is covered. Moreover, you will acquire skills in a variety of languages, including Python, Java, C, C++, Scala, JavaScript, and many more.

According to their GitHub page, the curriculum is suited for people with the discipline, will, and good habits to obtain this education largely on their own, but who’d still like support from a worldwide community of fellow learners.

Curriculum

  • Intro CS: for students to try out CS and see if it’s right for them
  • Core CS: corresponds roughly to the first three years of a computer science curriculum, taking classes that all majors would be required to take
  • Advanced CS: corresponds roughly to the final year of a computer science curriculum, taking electives according to the student’s interests
  • Final Project: a project for students to validate, consolidate, and display their knowledge, to be evaluated by their peers worldwide
  • Pro CS: graduate-level specializations students can elect to take after completing the above curriculum if they want to maximize their chances of getting a good job

It is possible to finish Core CS within about 2 years if you plan carefully and devote roughly 18-22 hours/week to your studies. Courses in Core CS should be taken linearly if possible, but since a perfectly linear progression is rarely possible, each class’s prerequisites are specified so that you can design a logical but non-linear progression based on the class schedules and your own life plans.

Links to the contents

Links to the curriculum (v8.0.0)

Dynamic Programming MIT Course

Dynamic Programming MIT Course

Cover image by xkcd

Over the last months I’ve been working my way through Project Euler in my spare time. I wanted to learn Python programming, and what better way than solving mini-problems and -projects?!

Well, Project Euler got a ton of these, listed in increasing order of difficulty. It starts out simple: to solve the first problem you need to write a program to identify multiples of 3 and 5. Next, in problem two, you are asked to sum the first thousand even Fibonacci numbers. Each problem, the task at hand gets slighly more difficult…

For me, Project Euler combines math, programming, and stats in a way that really keeps me motivated to continue and learn new concepts and programming / problem-solving approaches.

However, at problem 31, I really got stuck. For several hours, I struggled to solve it in a satisfactory fashion, even though most other problems only take 5-90 minutes.

After hours of struggling, I pretty much gave up, and googled some potential solutions. Aparently, the way to solve problem 31, is to take a so-called dynamic programming approach.

Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a recursive manner. While some decision problems cannot be taken apart this way, decisions that span several points in time do often break apart recursively. Likewise, in computer science, if a problem can be solved optimally by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems, then it is said to have optimal substructure.

https://en.wikipedia.org/wiki/Dynamic_programming

Now, this sounded like something I’d like to learn more about! I was already quite familiar with recursive problems and solutions, but this dynamic programming sounded next-level.

So I googled and googled for tutorials and other resources, and I finally came across this free 2011 MIT course that I intend to view over the coming weeks.

There’s even a course website with additional materials and assignments (in Python).

ASSN #TOPICSPROBLEM SETSSOLUTIONS
1Asymptotic complexity, recurrence relations, peak findingProblem Set 1 (PDF)
Problem Set 1 Code (ZIP)
Problem Set 1 Solutions (PDF)
2Fractal rendering, digital circuit simulationProblem Set 2 (PDF)
Problem Set 2 Code (ZIP)
Problem Set 2 Solutions (PDF)
Problem Set 2 Code Solutions (ZIP – 7.7MB)
3Range queries, digital circuit layoutProblem Set 3 (PDF)
Problem Set 3 Code (ZIP – 3.2MB)
Problem Set 3 Solutions (PDF)
Problem Set 3 Code Solutions (ZIP – 15.7MB)
4Hash functions, Python dictionaries, matching DNA sequencesProblem Set 4 (PDF)
Problem Set 4 Code (GZ – 12.4MB) (kfasta.py courtesy of Kevin Kelley, and used with permission.)
Problem Set 4 Solutions (PDF)
Problem Set 4 Code Solutions (ZIP)
5The Knight’s Shield, RSA public key encryption, image decryptionProblem Set 5 (PDF)
Problem Set 5 Code (ZIP)
Problem Set 5 Grading Explanation (PDF)
Problem Set 5 Solutions (PDF)
Problem Set 5 Code Solutions (ZIP)
6Social networks, Rubik’s Cube, DijkstraProblem Set 6 (PDF)
Problem Set 6 Code (ZIP – 2.9MB) (nhpn.py courtesy of Punyashloka Biswal and Michael Lieberman; Pocket Cube Solver courtesy of Huan Liu and Anh Nguyen. Used with permission.)
Problem Set 6 Solutions (PDF)
Problem Set 6 Code Solutions (ZIP)
7Seam carving, stock purchasing and knapsackProblem Set 7 (PDF)
Seam Carving for Content-Aware Image ResizingProblem Set 7 Code (ZIP) (Sunset image © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/fairuse.)Problem Set 7 Answer Template (ZIP)Problem Set 7 Grading Explanation (PDF)
Problem Set 7 Solutions (PDF)
Problem Set 7 Code Solutions (ZIP)

Will you join me? And let me know what you think!

For those less interested in (dynamic) programming but mostly in machine learning, there’s this other great MIT OpenCourseWare youtube playlist of their Artificial Intelligence course. I absolutely loved that course and I really powered through it in a matter of weeks (which is why I am already psyched about this new one). I learned so much new concepts, and I strongly recommend it. Unfortunately, the professor recently passed away.

Google Fonts: 915 free font families

Google Fonts: 915 free font families

Looking for a custom typeface to use in your data visualizations? Google Fonts is an awesome databank of nearly a thousands font families you can access, download, and use for free.

If you’re into design, the website includes a blog featuring articles on font design.

Google Fonts among others provided the font for my dissertation cover so I definitely recommend it.

Circular Map Cutouts in R

Circular Map Cutouts in R

Katie Jolly wanted to surprise a friend with a nice geeky gift: a custom-made map cutout. Using R and some visual finetuning in Inkscape, she was able to made the below.

A detailed write-up of how Katie got to this product is posted here.

Basically, the R’s tigris package included all data on roads, and the ArcGIS Open Data Hub provided the neighborhood boundaries. Fortunately, the sf package is great for transforming and manipulating geospatial data, and includes some functions to retrieve a subset of roads based on their distance to a centroid. With this subset, Katie could then build these wonderful plots in no time with ggplot2.

Papers with Code: State-of-the-Art

Papers with Code: State-of-the-Art

OK, this is a really great find!

The website PapersWithCode.com lists all scientific publications of which the codes are open-sourced on GitHub. Moreover, you can sort these papers by the stars they accumulated on Github over the past days.

The authors, @rbstojnic and @rosstaylor90, just made this in their spare time. Thank you, sirs!

Papers with Code allows you to quickly browse state-of-the-art research on GANs and the code behind them, for instance. Alternatively, you can browse for research and code on sentiment analysis or LSTMs