Tag: geospatial

Learn Julia for Data Science

Learn Julia for Data Science

Most data scientists favor Python as a programming language these days. However, there’s also still a large group of data scientists coming from a statistics, econometrics, or social science and therefore favoring R, the programming language they learned in university. Now there’s a new kid on the block: Julia.

Image result for julia programming"
Via Medium

Advantages & Disadvantages

According to some, you can think of Julia as a mixture of R and Python, but faster. As a programming language for data science, Julia has some major advantages:

  1. Julia is light-weight and efficient and will run on the tiniest of computers
  2. Julia is just-in-time (JIT) compiled, and can approach or match the speed of C
  3. Julia is a functional language at its core
  4. Julia support metaprogramming: Julia programs can generate other Julia programs
  5. Julia has a math-friendly syntax
  6. Julia has refined parallelization compared to other data science languages
  7. Julia can call C, Fortran, Python or R packages

However, others also argue that Julia comes with some disadvantages for data science, like data frame printing, 1-indexing, and its external package management.

Comparing Julia to Python and R

Open Risk Manual published this side-by-side review of the main open source Data Science languages: Julia, Python, R.

You can click the links below to jump directly to the section you’re interested in. Once there, you can compare the packages and functions that allow you to perform Data Science tasks in the three languages.

GeneralDevelopmentAlgorithms & Datascience
History and CommunityDevelopment EnvironmentGeneral Purpose Mathematical Libraries
Devices and Operating SystemsFiles, Databases and Data ManipulationCore Statistics Libraries
Package ManagementWeb, Desktop and Mobile DeploymentEconometrics / Timeseries Libraries
Package DocumentationSemantic Web / Semantic DataMachine Learning Libraries
Language CharacteristicsHigh Performance ComputingGeoSpatial Libraries
Using R, Python and Julia togetherVisualization
Via openriskmanual.org/wiki/Overview_of_the_Julia-Python-R_Universe

Starting with Julia for Data Science

Here’s a very well written Medium article that guides you through installing Julia and starting with some simple Data Science tasks. At least, Julia’s plots look like:

Via Medium
Circular Map Cutouts in R

Circular Map Cutouts in R

Katie Jolly wanted to surprise a friend with a nice geeky gift: a custom-made map cutout. Using R and some visual finetuning in Inkscape, she was able to made the below.

A detailed write-up of how Katie got to this product is posted here.

Basically, the R’s tigris package included all data on roads, and the ArcGIS Open Data Hub provided the neighborhood boundaries. Fortunately, the sf package is great for transforming and manipulating geospatial data, and includes some functions to retrieve a subset of roads based on their distance to a centroid. With this subset, Katie could then build these wonderful plots in no time with ggplot2.

A Categorical Spatial Interpolation Tutorial in R

A Categorical Spatial Interpolation Tutorial in R

Timo Grossenbacher works as reporter/coder for SRF Data, the data journalism unit of Swiss Radio and TV. He analyzes and visualizes data and investigates data-driven stories. On his website, he hosts a growing list of cool projects. One of his recent blogs covers categorical spatial interpolation in R. The end result of that blog looks amazing:

This map was built with data Timo crowdsourced for one of his projects. With this data, Timo took the following steps, which are covered in his tutorial:

  • Read in the data, first the geometries (Germany political boundaries), then the point data upon which the interpolation will be based on.
  • Preprocess the data (simplify geometries, convert CSV point data into an sf object, reproject the geodata into the ETRS CRS, clip the point data to Germany, so data outside of Germany is discarded).
  • Then, a regular grid (a raster without “data”) is created. Each grid point in this raster will later be interpolated from the point data.
  • Run the spatial interpolation with the kknn package. Since this is quite computationally and memory intensive, the resulting raster is split up into 20 batches, and each batch is computed by a single CPU core in parallel.
  • Visualize the resulting raster with ggplot2.

All code for the above process can be accessed on Timo’s Github. The georeferenced points underlying the interpolation look like the below, where each point represents the location of a person who selected a certain pronunciation in an online survey. More details on the crowdsourced pronunciation project van be found here, .

Another of Timo’s R map, before he applied k-nearest neighbors on these crowdsourced data. [original]
If you want to know more, please read the original blog or follow Timo’s new DataCamp course called Communicating with Data in the Tidyverse.

Geographical Maps in ggplot2: Rectangle World Map

Geographical Maps in ggplot2: Rectangle World Map

Maarten Lambrechts posted a tutorial where he demonstrates the steps through which he created a Eurovision Song Festival map in R.

Maarten’s ggplot2 Songfestival map

Maarten’s ggplot2 worldmap

Inspired by his tutorial, I decided to create a worldmap of my own, the R code for which you may find below.

options(stringsAsFactors = F) # options
library(tidyverse# packages
# retrieve data file
link = "https://gist.githubusercontent.com/maartenzam/787498bbc07ae06b637447dbd430ea0a/raw/9a9dafafb44d8990f85243a9c7ca349acd3a0d07/worldtilegrid.csv"

geodata <- read.csv(link) %>% as.tibble() # load in geodata
str(geodata) # examine geodata
## Classes 'tbl_df', 'tbl' and 'data.frame':    192 obs. of  11 variables:
##  $ name           : chr  "Afghanistan" "Albania" "Algeria" "Angola" ...
##  $ alpha.2        : chr  "AF" "AL" "DZ" "AO" ...
##  $ alpha.3        : chr  "AFG" "ALB" "DZA" "AGO" ...
##  $ country.code   : int  4 8 12 24 10 28 32 51 36 40 ...
##  $ iso_3166.2     : chr  "ISO 3166-2:AF" "ISO 3166-2:AL" "ISO 3166-2:DZ" "ISO 3166-2:AO" ...
##  $ region         : chr  "Asia" "Europe" "Africa" "Africa" ...
##  $ sub.region     : chr  "Southern Asia" "Southern Europe" "Northern Africa" "Middle Africa" ...
##  $ region.code    : int  142 150 2 2 NA 19 19 142 9 150 ...
##  $ sub.region.code: int  34 39 15 17 NA 29 5 145 53 155 ...
##  $ x              : int  22 15 13 13 15 7 6 20 24 15 ...
##  $ y              : int  8 9 11 17 23 4 14 6 19 6 ...
# create worldmap
worldmap <- ggplot(geodata)

# add rectangle grid + labels
worldmap + 
  geom_rect(aes(xmin = x, ymin = y, 
                xmax = x + 1, ymax = y + 1)) +
  geom_text(aes(x = x, y = y, 
                label = alpha.3))

download (13)

# improve geoms
worldmap + 
  geom_rect(aes(xmin = x, ymin = y, 
                xmax = x + 1, ymax = y + 1,
                fill = region)) +
  geom_text(aes(x = x, y = y, 
                label = alpha.3),
            size = 2, 
            nudge_x = 0.5, nudge_y = -0.5,
            vjust = 0.5, hjust = 0.5) +
  scale_y_reverse() 

download (12)

# finalize plot look
colors = c('yellow', 'red', 'white', 'pink', 'green', 'orange')
worldmap + 
  geom_rect(aes(xmin = x, ymin = y, 
                xmax = x + 1, ymax = y + 1,
                fill = region)) +
  geom_text(aes(x = x, y = y, 
                label = alpha.3),
            size = 3,
            nudge_x = 0.5, nudge_y = -0.5,
            vjust = 0.5, hjust = 0.5) +
  scale_y_reverse() +
  scale_fill_manual(values = colors) +
  guides(fill = guide_legend(ncol = 2), col = F) +
  theme(plot.background = element_rect(fill = "blue"),
        panel.grid = element_blank(),
        panel.background = element_blank(),
        legend.background = element_blank(),
        legend.position = c(0, 0),
        legend.justification = c(0, 0),
        legend.title = element_text(colour = "white"),
        legend.text = element_text(colour = "white"),
        legend.key = element_blank(),
        legend.key.size = unit(0.06, "npc"),
        axis.text = element_blank(), 
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        text = element_text(colour = "white", size = 16)
        ) +
  labs(title = "ggplot2: Worldmap",
       fill = "Region", 
       caption = "paulvanderlaken.com")

ggplo2 Rectangle Worldmap

What would you add to your worldmap? If you end up making one, please send me a copy on paulvanderlaken@gmail.com!