Tag: repository

Google’s Dataset Search: Direct access to 25 million interesting datasets

Google’s Dataset Search: Direct access to 25 million interesting datasets

I used to keep a repository of links to interesting datasets to learn data science. However, that page I can retire, as Google has launched its new service Dataset Search.

The “world wide web” hosts millions of datasets, on nearly any topic you can think of. Google’s Dataset Search has indexed almost 25 million of these datasets, giving you a single entry point to search for datasets online. After a year of testing, Dataset Search is now officially out of beta.

After alpha testing, Dataset Search now includes filter based on the types of dataset that you want (e.g., tables, images, text), on whether the dataset is open source/access. For dataset on geographic area’s, you can see the map. The quality of dataset’s descriptions has improved greatly, and the tool now has a mobile version.

Anyone who publishes data can make their datasets discoverable in Dataset Search by describe the properties of their dataset using a special schema on their own web page.

OriginLab’s Graph Gallery: A blast from the past

OriginLab’s Graph Gallery: A blast from the past

Continuing my recent line of posts on data visualization resources, I found another repository in my inbox: OriginLab’s GraphGallery!

If I’m being honest, I would personally advice you to look at the dataviz project instead, if you haven’t heard of that one yet.

However, OriginLab might win in terms of sentiment. It has this nostalgic look of the ’90s, and apparently people really used it during that time. Nevertheless, despite looking old, the repo seems to be quite extensive, with nearly 400 different types of data visualizations:

Quantity isn’t everything though, as some of the 400 entries are disgustingly horrible:

There’s so much wrong with this graph…

What I do like about this OriginLab repo is that it has an option to sort its contents using a random order. This really facilitates discovery of new pearls:

Thanks to Maarten Lambrechts for sharing this resource on twitter a while back!

Data Visualization Style Guide Repositories

Data Visualization Style Guide Repositories

Amy Cesal put together (1) this great overview of style guides for data visualization practice. Moreover, in the original tweet, Amy refers to other great repositories such as (2) this PolicyViz one and (3) this humongous one by Adele.

Amy’s list includes many references to the best practices used by some of the leading data journalism companies, such as the BBC, or professional data companies like Salesforce and IBM.

As I’m worried that this great repository may not stand the test of time on the current Google Docs location, here are the base URLs once more:

URL of guidelines Company name
https://sunlightfoundation.com/2014/03/12/datavizguide Sunlight Foundation
https://cfpb.github.io/design-manual/data-visualization/data-visualization.htmlConsumer Financial Protection Bureau
https://knightcenter.utexas.edu/mooc/file/tdmn_graphics.pdfDallas Morning News
https://urbaninstitute.github.io/graphics-styleguide/The Urban Institute
http://code.minnpost.com/minnpost-styles/MinnPost
https://public.tableau.com/profile/bbc.audiences#!/vizhome/BBCAudiencesTableauStyleGuide/HelloBBC Audiences
https://www.ibm.com/design/v1/language/experience/data-visualization/IBM
https://style.ons.gov.uk/category/data-visualisation/Office for National Statistics
https://www.ibcs.com/standardsInternational Business Communication Standards (IBCS®)
https://data.london.gov.uk/blog/city-intelligence-data-design-guidelines/London City Intelligence
https://www.bbc.co.uk/gel/guidelines/how-to-design-infographicsBBC
https://polaris.shopify.com/design/data-visualizationstShopify
https://ux.opower.com/opattern/how-to-charts.htmlOpower
https://www.consults-iot.comConsults-IoT.Com LLP
https://ux.mailchimp.com/patterns/dataMailChimp
https://material.io/design/communication/data-visualization.htmlGoogle- Material Design
https://lightningdesignsystem.com/guidelines/charts/Salesforce
https://github.com/glosophy/CatoDataVizGuidelines/blob/master/PocketStyleBook.pdfCato Institute
https://bbc.github.io/rcookbook/BBC
https://docs.microsoft.com/en-us/office/dev/add-ins/design/data-visualization-guidelinesMicrosoft
https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/data-visualization-for-human-perceptionACI

If you have any resources or style guides to contribute to Amy’s list, you can do so via this link.

Google Fonts: 915 free font families

Google Fonts: 915 free font families

Looking for a custom typeface to use in your data visualizations? Google Fonts is an awesome databank of nearly a thousands font families you can access, download, and use for free.

If you’re into design, the website includes a blog featuring articles on font design.

Google Fonts among others provided the font for my dissertation cover so I definitely recommend it.

Data Visualization Tools & Resources

There’s this amazing overview of helpful dataviz resources atwww.visualisingdata.com/resources!

Browse through hundreds of helpful data visualization tools, programs, and services. All neatly organized by Andy Kirk in categories: data handling, applications, programming, web-based, qualitative, mapping, specialist, and colour. What a great repository!

A snapshot of www.visualisingdata.com/resource

Looking for expert books on data visualization?
Have a look at these recommendations!

Transitioning from Excel to R: Dictionary of common functions

Transitioning from Excel to R: Dictionary of common functions

Alyssa Columbus published maintains this GitHub repository with a great tutorial on how to move from Excel to R. Very useful for beginning useRs, the repository’s tutorial includes a translation table between Excel and R functions:

Excel Formula R Function Type
ABS abs Arithmetic
ADDRESS assign Essentials
AND &,&&,all Boolean
AVERAGE, AVG, AVERAGEIF mean Arithmetic
BETADIST pbeta Statistics
BETAINV qbeta Statistics
BINOMDIST pbinom when cumulative,dbinom when not Statistics
CEILING ceiling Arithmetic
CELL str has the same idea Essentials
CHIDIST, CHISQDIST pchisq Statistics
CHIINV, CHISQINV qchisq Statistics
CHITEST chisq.test Statistics
CHOOSE switch Essentials
CLEAN gsub Text
COLS, COLUMNS ncol Essentials
COLUMN col,:,seq Essentials
COMBIN choose Essentals
CONCATENATE paste Text
CONFIDENCE -qnorm(alpha/2)*std/sqrt(n) Statistics
CORREL cor Statistics
COUNT, COUNTIF length Arithmetic
COVAR cov Statistics
CRITBINOM qbinom Statistics
DELTA identical Boolean
EXACT == Boolean
EXP exp Arithmetic
EXPONDIST pexp when cumulative,dexp when not Statistics
FACT factorial Arithmetic
FACTDOUBLE dfactorial in the phangorn package Arithmetic
FDIST pf Statistics
FIND regexpr,grepl,grep Text
FINV qf Statistics
FISHER atanh Arithmetic
FISHERINV tanh Arithmetic
FIXED format,sprintf,formatC Essentials
FLOOR floor Arithmetic
FORECAST predict on an lm object Statistics
FREQUENCY cut,table Arithmetic
FTEST var.test Statistics
GAMMADIST pgamma if last argument T,dgamma if last arg. F Statistics
GAMMAINV qgamma Statistics
GAMMALN lgamma Statistics
GAUSS pnorm(x) - 0.5 Statistics
GCD gcd Arithmetic
GEOMEAN exp(mean(log(x))) Arithmetic
GESTEP >= Boolean
HARMEAN harmonic.mean in the psych package Arithmetic
HLOOKUP match,merge Essentials
HYPGEOMDIST dhyper Statistics
IF if,ifelse Essentials
IFERROR try,tryCatch Essentials
INDEX x[y,z] Essentials
INDIRECT get Essentials
INT as.integer(not for negative numbers),floor Arithmetic
INTERCEPT first element of coef of an lm object Statistics
ISLOGICAL is.logical Boolean
ISNA is.na Boolean
ISNUMBER is.numeric Boolean
ISTEXT is.character Boolean
KURT kurtosis in the moments package Statistics
LARGE sort Statistics
LCM scm in the schoolmath package Arithmetic
LEFT substr Text
LEN, LENGTH nchar Text
LINEST lm Statistics
LN, LOG log Arithmetic
LOG10 log10 Arithmetic
LOGINV qlnorm Statistics
LOGNORMDIST plnorm Statistics
LOWER tolower Text
MATCH match,which Essentials
MAX max (sometimes pmax) Arithmetic
MDETERM det Arithmetic
MEDIAN median Arithmetic
MID substr Text
MIN min (sometimes pmin) Arithmetic
MINVERSE solve Arithmetic
MMULT %*% Arithmetic
MOD %% Arithmetic
MODE as.numeric(names(which.max(table(x)))) Arithmetic
MUNIT diag Arithmetic
N as.numeric Arithmetic
NEGBINOMDIST dnbinom Statistics
NORMDIST, NORMSDIST pnorm when cumulative,dnorm when not Statistics
NORMINV, NORMSINV qnorm Statistics
NOT ! Boolean
NOW date,Sys.time Essentials
OR ` ,
PEARSON cor Statistics
PERCENTILE quantile Statistics
PERCENTRANK ecdf Statistics
PERMUT function(n,k) {choose(n,k)*factorial(k)} Arithmetic
PERMUTATIONA n^k Arithmetic
PHI dnorm Statistics
POISSON ppois when cumulatic,dpois when not Statistics
POWER ^ Arithmetic
PROB ecdf Statistics
PRODUCT prod Arithmetic
PROPER toupper Text
QUARTILE quantile Arithmetic
QUOTIENT %/% Arithmetic
RAND runif Arithmetic
RANDBETWEEN sample Arithmetic
RANK rank Essentials
REPLACE sub,gsub Text
REPT rep and paste or paste0 Text
RIGHT substring Text
ROUND round Arithmetic
ROUNDDOWN floor Arithmetic
ROUNDUP ceiling Arithmetic
ROW row,:,seq Essentials
ROWS nrow Essentials
RSQ summary of lm object Statistics
SEARCH regexpr,grep Text
SIGN sign Arithmetic
SKEW skewness in the moments package Statistics
SLOPE in coef of lm object Statistics
SMALL sort Arithmetic
SQRT sqrt Arithmetic
STANDARDIZE scale Statitics
STD, STDEV sd Arithmetic
STEYX predict on an lm object Statistics
STRING format,sprintf,formatC Text
SUBSTITUTE sub,gsub,paste Essentials
SUM, SUMIF sum Arithmetic
SUMPRODUCT crossprod Arithmetic
TDIST pt Statistics
TEXT format,sprintf,formatC Text
TINV abs(qt(x/2,data)) Statistics
TODAY Sys.Date Essentials
TRANSPOSE t Arithmetic
TREND fitted of an lm object Statistics
TRIM sub Essentials
TRIMMEAN mean(x,trim=tr/2) Arithmetic
TRUNC trunc Essentials
TTEST t.test Statistics
TYPE typeof,mode,class Essentials
UPPER toupper Text
VALUE as.numeric Arithmetic
VAR var Essentials
VLOOKUP match,merge Essentials
WEEKDAY weekdays Essentials
WEIBULL pweibull when cumulative,dweibull when not Statistics
ZTEST pnorm Statistics