Category: programming

How most statistical tests are linear models

How most statistical tests are linear models

Jonas Kristoffer Lindeløv wrote a great visual explanation of how the most common statistical tests (t-test, ANOVA, ANCOVA, etc) are all linear models in the back-end.

Jonas’ original blog uses R programming to visually show how the tests work, what the linear models look like, and how different approaches result in the same statistics.

George Ho later remade a Python programming version of the same visual explanation.

If I was thought statistics and methodology this way, I sure would have struggled less! Have a look yourself: https://lindeloev.github.io/tests-as-linear/

The 10 Fundamental Concepts of JavaScript

The 10 Fundamental Concepts of JavaScript

Another pearl of a resource on Twitter is this thread by Madison on 10 of fundamentalal concepts of Javascript — and programming in general for that matter.

For your convience, I copied the links below. Just click them to browse to the resource and learn more about the concept

Click to learn more about each concept

  1. Variables & Scoping
  2. Data types
  3. Objects, Funtions & Arrays
  4. Document Object Model (DOM)
  5. Prototypes & this.
  6. Events
  7. Flow Control (specifically, for-loops)
  8. Security & (web) Accesibility
  9. Good coding practices (to which I’ve linked before)
  10. Async

This 10-step list was compiled as apart of this interesting podcast, which I recommend you listen to as well.

Want to learn more?

According to many, this is the best book to continue learning more about JavaScript.

There’s a (now classic) conference talk that comes with this book, which I can also recommend you watch:

Create a publication-ready correlation matrix, with significance levels, in R

Create a publication-ready correlation matrix, with significance levels, in R

TLDR; You can use the corrtable package (see CRAN or Github)!

In most (observational) research papers you read, you will probably run into a correlation matrix. Often it looks something like this:

FACTOR ANALYSIS

In Social Sciences, like Psychology, researchers like to denote the statistical significance levels of the correlation coefficients, often using asterisks (i.e., *). Then the table will look more like this:

Table 4 from Family moderators of relation between community ...

Regardless of my personal preferences and opinions, I had to make many of these tables for the scientific (non-)publications of my Ph.D..

I remember that, when I first started using R, I found it quite difficult to generate these correlation matrices automatically.

Yes, there is the cor function, but it does not include significance levels.

Then there the (in)famous Hmisc package, with its rcorr function. But this tool provides a whole new range of issues.

What’s this storage.mode, and what are we trying to coerce again?

Soon you figure out that Hmisc::rcorr only takes in matrices (thus with only numeric values). Hurray, now you can run a correlation analysis on your dataframe, you think…

Yet, the output is all but publication-ready!

You wanted one correlation matrix, but now you have two… Double the trouble?

[UPDATED] To spare future scholars the struggle of the early day R programming, Laura Lambert and I created an R package corrtable, which includes the helpful function correlation_matrix.

This correlation_matrix takes in a dataframe, selects only the numeric (and boolean/logical) columns, calculates the correlation coefficients and p-values, and outputs a fully formatted publication-ready correlation matrix!

You can specify many formatting options in correlation_matrix.

For instance, you can use only 2 decimals. You can focus on the lower triangle (as the lower and upper triangle values are identical). And you can drop the diagonal values:

Or maybe you are interested in a different type of correlation coefficients, and not so much in significance levels:

For other formatting options, do have a look at the source code on github.

Now, to make matters even easier, the package includes a second function (save_correlation_matrix) to directly save any created correlation matrices:

Once you open your new correlation matrix file in Excel, it is immediately ready to be copy-pasted into Word!

If you are looking for ways to visualize your correlations do have a look at the packages corrr, corrplot, or ppsr.

I hope this package is of help to you!

Do reach out if you get to use them in any of your research papers!

Sign up to keep up to date on the latest R, Data Science & Tech content:

Best Tech & Programming Talks Ever

Best Tech & Programming Talks Ever

Every now and then, Twitter will offer these golden resources.

Ashley Willis recently asked people to name the best tech talk they’ve ever seen and the results are a resource I don’t want to lose.

Hundreds of people responded, sharing their contenders for the title.

Below, I selected some of the top-rated talks and clustered them accordingly. Click a category to jump to the section.


Big Idea & Programming Meta-Talks

The Future of Programming

Growing a Language

The Mess We’re In

Making Users Awesome

Ethical Dilemmas in Software Engineering


Testing code

Adding Eyes to Your Test Automation Framework

TATFT – Test All The F*cking Time


Language-Specific talks

Concurrency (Python)

How we program multicores (erlang)

Y Not- Adventures in Functional Programming (Ruby)

JavaScript: The Good Parts


Code Design

Core Design Principles for Software Developers

Design Patterns vs Anti pattern in APL


Containers & Kubernetes

The Container Operator’s Manual

Write a Container in Go From Scratch

Container Hacks and Fun Images

Kubernetes and the Path to Serverless

Let’s Build Kubernetes, With a Spreadsheet and Volunteers

Cover image via: https://toggl.com/blog/best-tech-websites

Learn to style HTML using CSS — Tutorials by Mozilla

Learn to style HTML using CSS — Tutorials by Mozilla

Cascading Stylesheets — or CSS — is the first technology you should start learning after HTML. While HTML is used to define the structure and semantics of your content, CSS is used to style it and lay it out. For example, you can use CSS to alter the font, color, size, and spacing of your content, split it into multiple columns, or add animations and other decorative features.

https://developer.mozilla.org/en-US/docs/Learn/CSS

I was personally encoutered CSS in multiple stages of my Data Science career:

  • When I started using (R) markdown (see here, or here), I could present my data science projects as HTML pages, styled through CSS.
  • When I got more acustomed to building web applications (e.g., Shiny) on top of my data science models, I had to use CSS to build more beautiful dashboard layouts.
  • When I was scraping data from Ebay, Amazon, WordPress, and Goodreads, my prior experiences with CSS & HTML helped greatly to identify and interpret the elements when you look under the hood of a webpage (try pressing CTRL + SHIFT + C).

I know others agree with me when I say that the small investment in learning the basics behind HTML & CSS pay off big time:

I read that Mozilla offers some great tutorials for those interested in learning more about “the web”, so here are some quicklinks to their free tutorials:

Screenshot via developer.mozilla.org/en-US/docs/Learn/CSS/CSS_layout/Introduction
100 Python pandas tips and tricks

100 Python pandas tips and tricks

Working with Python’s pandas library often?

This resource will be worth its length in gold!

Kevin Markham shares his tips and tricks for the most common data handling tasks on twitter. He compiled the top 100 in this one amazing overview page. Find the hyperlinks to specific sections below!

Quicklinks to categories

Kevin even made a video demonstrating his 25 most useful tricks: