Tag: computing

A free, self-taught education in Computer Science!

A free, self-taught education in Computer Science!

The Open Source Society University offers a complete education in computer science using online materials.

They offer a proper introduction to the fundamental concepts for all computing disciplines. Evyerthing form algorithms, logic, and machine learning, up to databases, full stack web development, and graphics is covered. Moreover, you will acquire skills in a variety of languages, including Python, Java, C, C++, Scala, JavaScript, and many more.

According to their GitHub page, the curriculum is suited for people with the discipline, will, and good habits to obtain this education largely on their own, but who’d still like support from a worldwide community of fellow learners.

Curriculum

  • Intro CS: for students to try out CS and see if it’s right for them
  • Core CS: corresponds roughly to the first three years of a computer science curriculum, taking classes that all majors would be required to take
  • Advanced CS: corresponds roughly to the final year of a computer science curriculum, taking electives according to the student’s interests
  • Final Project: a project for students to validate, consolidate, and display their knowledge, to be evaluated by their peers worldwide
  • Pro CS: graduate-level specializations students can elect to take after completing the above curriculum if they want to maximize their chances of getting a good job

It is possible to finish Core CS within about 2 years if you plan carefully and devote roughly 18-22 hours/week to your studies. Courses in Core CS should be taken linearly if possible, but since a perfectly linear progression is rarely possible, each class’s prerequisites are specified so that you can design a logical but non-linear progression based on the class schedules and your own life plans.

Links to the contents

Links to the curriculum (v8.0.0)

Learn Julia for Data Science

Learn Julia for Data Science

Most data scientists favor Python as a programming language these days. However, there’s also still a large group of data scientists coming from a statistics, econometrics, or social science and therefore favoring R, the programming language they learned in university. Now there’s a new kid on the block: Julia.

Image result for julia programming"
Via Medium

Advantages & Disadvantages

According to some, you can think of Julia as a mixture of R and Python, but faster. As a programming language for data science, Julia has some major advantages:

  1. Julia is light-weight and efficient and will run on the tiniest of computers
  2. Julia is just-in-time (JIT) compiled, and can approach or match the speed of C
  3. Julia is a functional language at its core
  4. Julia support metaprogramming: Julia programs can generate other Julia programs
  5. Julia has a math-friendly syntax
  6. Julia has refined parallelization compared to other data science languages
  7. Julia can call C, Fortran, Python or R packages

However, others also argue that Julia comes with some disadvantages for data science, like data frame printing, 1-indexing, and its external package management.

Comparing Julia to Python and R

Open Risk Manual published this side-by-side review of the main open source Data Science languages: Julia, Python, R.

You can click the links below to jump directly to the section you’re interested in. Once there, you can compare the packages and functions that allow you to perform Data Science tasks in the three languages.

GeneralDevelopmentAlgorithms & Datascience
History and CommunityDevelopment EnvironmentGeneral Purpose Mathematical Libraries
Devices and Operating SystemsFiles, Databases and Data ManipulationCore Statistics Libraries
Package ManagementWeb, Desktop and Mobile DeploymentEconometrics / Timeseries Libraries
Package DocumentationSemantic Web / Semantic DataMachine Learning Libraries
Language CharacteristicsHigh Performance ComputingGeoSpatial Libraries
Using R, Python and Julia togetherVisualization
Via openriskmanual.org/wiki/Overview_of_the_Julia-Python-R_Universe

Starting with Julia for Data Science

Here’s a very well written Medium article that guides you through installing Julia and starting with some simple Data Science tasks. At least, Julia’s plots look like:

Via Medium
History of the Modern Python Dictionary – by Raymond Hettinger

History of the Modern Python Dictionary – by Raymond Hettinger

Raymond Hettinger is one of the core Python developers whose talks I’ve featured on my blog before. And rightfully so, as Raymond’s presentations are unarguably entertaining and deeply insightful from an technical perspective.

In this recorded talk at the 2016 Annual Holiday Party for Python Devs in San Fransisco Bay Area, Raymond walks us through the history and development of dictionaries and hash tables uses example code in Python.

Python’s dictionaries are stunningly good. Over the years, many great ideas have combined together to produce the modern implementation in Python 3.6. This fun talk is given by Raymond Hettinger, the Python core developer responsible for the set implementation and who designed the compact-and-ordered dict implemented in CPython for Python 3.6 and in PyPy for Python 2.7. He will use pictures and little bits of pure python code to explain all of the key ideas and how they evolved over time. He will also include newer features such as key-sharing, compaction, and versioning. This talk is important because it is the only public discussion of the state of the art as of Python 3.6. Even experienced Python users are unlikely to know the most recent innovations.

This talk is for all Python programmers. It is designed to be fully understandable for a beginner (it starts from first principles) but to have new information even for Python experts (how key-sharing works, how the compact-ordered patch works, how dict versioning works). At the end of this talk, you can confidently say that you know how modern Python dictionaries work and what it means for your code.

https://www.youtube.com/watch?v=p33CVV29OG8
Sorting Algorithms 101: Visualized

Sorting Algorithms 101: Visualized

Sorting is one of the central topic in most Computer Science degrees. In general, sorting refers to the process of rearranging data according to a defined pattern with the end goal of transforming the original unsorted sequence into a sorted sequence. It lies at the heart of successful businesses ventures — such as Google and Amazon — but is also present in many applications we use daily — such as Excel or Facebook.

Many different algorithms have been developed to sort data. Wikipedia lists as many as 45 and there are probably many more. Some work by exchanging data points in a sequence, others insert and/or merge parts of the sequence. More importantly, some algorithms are quite effective in terms of the time they take to sort data — taking only n time to sort n datapoints — whereas others are very slow — taking as much as n^2. Moreover, some algorithms are stable — in the sense that they always take the same amount of time to process n datapoints — whereas others may fluctuate in terms of processing time based on the original order of the data.

I really enjoyed this video by TED-Ed on how to best sort your book collection. It provides a very intuitive introduction into sorting strategies (i.e., algorithms). Moreover, Algorithms to Live By (Christian & Griffiths, 2016) provided the amazing suggestion to get friends and pizza in whenever you need to sort something, next to the great explanation of various algorithms and their computational demand.

The main reason for this blog is that I stumbled across some nice video’s and GIFs of sorting algorithms in action. These visualizations are not only wonderfully intriguing to look at, but also help so much in understanding how the sorting algorithms process the data under the hood. You might want to start with the 4-minute YouTube video below, demonstrating how nine different sorting algorithms (Selection Sort, Shell Sort, Insertion Sort, Merge Sort, Quick Sort, Heap Sort, Bubble Sort, Comb Sort, & Cocktail Sort) process a variety of datasets.

This interactive website toptal.com allows you to play around with the most well-known sorting algorithms, putting them to work on different datasets. For the grande finale, I found these GIFs and short video’s of several sorting algorithms on imgur. In the visualizations below, each row of the image represents an independent list being sorted. You can see that Bubble Sort is quite slow:

Cocktail Shaker Sort already seems somewhat faster, but still takes quite a while.

For some algorithms, the visualization clearly shows that the settings you pick matter. For instance, Heap Sort is much quicker if you choose to shift down instead of up.

In contrast, for Merge Sort it doesn’t matter whether you sort by breadth first or depth first.

The imgur overview includes many more visualized sorting algorithms but I don’t want to overload WordPress or your computer, so I’ll leave you with two types of Radix Sort, the rest you can look up yourself!