Create a publication-ready correlation matrix, with significance levels, in R

In most (observational) research papers you read, you will probably run into a correlation matrix. Often it looks something like this:

[Image: example of a correlation matrix]

In the social sciences, such as psychology, researchers like to denote the statistical significance levels of the correlation coefficients, often using asterisks (i.e., *). Then the table will look more like this:

[Image: Table 4 from "Family moderators of relation between community ...", a correlation matrix with significance asterisks]

Regardless of my personal preferences and opinions, I had to make many of these tables for the scientific (non-)publications of my Ph.D.

I remember that, when I first started using R, I found it quite difficult to generate these correlation matrices automatically.

Yes, there is the cor function, but it does not include significance levels.
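To make the gap concrete, here is a minimal base-R sketch. It uses the built-in `mtcars` data and three arbitrarily picked columns, purely for illustration: `cor` returns the coefficients for all pairs at once, while `cor.test` does return a p-value, but only for one pair at a time.

```r
# Built-in example data; the three columns are picked arbitrarily for illustration
vars <- mtcars[, c("mpg", "hp", "wt")]

# cor() gives the full matrix of coefficients, but no p-values
r <- cor(vars)
print(round(r, 3))

# cor.test() gives a p-value, but only for a single pair of variables
test <- cor.test(vars$mpg, vars$hp)
print(test$estimate)  # the same coefficient as r["mpg", "hp"]
print(test$p.value)
```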

Then there is the (in)famous Hmisc package, with its rcorr function. But this tool brings a whole new range of issues.

What’s this storage.mode, and what are we trying to coerce again?

Soon you figure out that Hmisc::rcorr only takes in matrices (thus with only numeric values). Hurray, now you can run a correlation analysis on your dataframe, you think…

Yet, the output is all but publication-ready!

You wanted one correlation matrix, but now you have two… Double the trouble?
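The steps described above can be sketched roughly as follows. This assumes the Hmisc package is installed and uses the built-in `iris` data as a stand-in for your own dataframe. The result of `rcorr` is a list in which the coefficients and the p-values live in separate matrices.

```r
# Sketch only: requires the Hmisc package to be installed
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # rcorr() wants a numeric matrix, so drop the factor column of iris first
  x <- as.matrix(iris[, vapply(iris, is.numeric, logical(1))])
  res <- Hmisc::rcorr(x, type = "pearson")

  # The result is a list; coefficients and p-values are separate matrices
  print(names(res))
  print(round(res$r, 2))  # correlation coefficients
  print(round(res$P, 4))  # p-values (NA on the diagonal)
}
```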

To spare future scholars the struggle of the early days of R programming, I would like to share my custom function correlation_matrix.

My correlation_matrix takes in a dataframe, selects only the numeric (and boolean/logical) columns, calculates the correlation coefficients and p-values, and outputs a fully formatted publication-ready correlation matrix!

You can specify many formatting options in correlation_matrix.

For instance, you can use only 2 decimals. You can focus on the lower triangle (as the lower and upper triangle values are identical). And you can drop the diagonal values:
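Going by the function signature in the source code below, such a call would presumably look like `correlation_matrix(iris, digits = 2, use = "lower", replace_diagonal = TRUE)`. To show what these options amount to, here is a small base-R sketch of the same idea. It is an illustration on three arbitrarily chosen `mtcars` columns, not the function itself.

```r
# Base-R illustration of the idea, not the correlation_matrix function itself
r <- round(cor(mtcars[, c("mpg", "hp", "wt")]), 2)  # 2 decimals

# Format as text so that individual cells can be blanked out
out <- format(r, nsmall = 2)

# Keep the lower triangle only; diag = TRUE also blanks the diagonal
out[upper.tri(out, diag = TRUE)] <- ""

print(out, quote = FALSE)
```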

Or maybe you are interested in a different type of correlation coefficients, and not so much in significance levels:
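With the function from the source code below, this would presumably be `correlation_matrix(iris, type = "spearman", show_significance = FALSE)`. As a rough base-R illustration of what changing the type of coefficient means (again on arbitrarily chosen `mtcars` columns), Spearman's coefficient is Pearson's coefficient computed on the ranks of the data:

```r
x <- mtcars[, c("mpg", "hp", "wt")]

# Two types of correlation coefficients on the same data
pearson  <- cor(x, method = "pearson")
spearman <- cor(x, method = "spearman")
print(round(pearson, 3))
print(round(spearman, 3))

# Spearman's coefficient equals Pearson's coefficient on the ranked data
on_ranks <- cor(sapply(x, rank))
print(all.equal(spearman, on_ranks))
```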

For other formatting options, do have a look at the source code below.

Now, to make matters even easier, I wrote a second function (save_correlation_matrix) to directly save any created correlation matrices:

Once you open your new correlation matrix file in Excel, it is immediately ready to be copy-pasted into Word!
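One caveat: whether Excel splits the file into columns right away depends on your regional settings. The save function in the source code below uses `write.csv2`, which writes semicolon-separated files, while `write.csv` writes comma-separated ones. The sketch below uses a made-up two-by-two character matrix and temporary files to show the difference. If everything lands in a single column, swapping one writer for the other may help.

```r
# A tiny made-up character matrix standing in for a formatted correlation matrix
m <- matrix(c(" 1.000", "-0.118*", "-0.118*", " 1.000"), ncol = 2,
            dimnames = list(c("a", "b"), c("a", "b")))

file_csv2 <- tempfile(fileext = ".csv")
file_csv  <- tempfile(fileext = ".csv")

write.csv2(m, file = file_csv2)  # semicolon-separated
write.csv(m, file = file_csv)    # comma-separated

lines_csv2 <- readLines(file_csv2)
lines_csv  <- readLines(file_csv)
print(lines_csv2)
print(lines_csv)
```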

If you are looking for ways to visualize your correlations, do have a look at the packages corrr and corrplot.

I hope my functions are of help to you!

Do reach out if you get to use them in any of your research papers!

I would be super interested and feel honored.

correlation_matrix

#' correlation_matrix
#' Creates a publication-ready / formatted correlation matrix, using `Hmisc::rcorr` in the backend.
#'
#' @param df dataframe; containing numeric and/or logical columns to calculate correlations for
#' @param type character; specifies the type of correlations to compute; gets passed to `Hmisc::rcorr`; options are `"pearson"` or `"spearman"`; defaults to `"pearson"`
#' @param digits integer/double; number of decimals to show in the correlation matrix; gets passed to `formatC`; defaults to `3`
#' @param decimal.mark character; which decimal.mark to use; gets passed to `formatC`; defaults to `.`
#' @param use character; which part of the correlation matrix to display; options are `"all"`, `"upper"`, `"lower"`; defaults to `"all"`
#' @param show_significance boolean; whether to add `*` to represent the significance levels for the correlations; defaults to `TRUE`
#' @param replace_diagonal boolean; whether to replace the correlations on the diagonal; defaults to `FALSE`
#' @param replacement character; what to replace the diagonal and/or upper/lower triangles with; defaults to `""` (empty string)
#'
#' @return a correlation matrix
#' @export
#'
#' @examples
#' correlation_matrix(iris)
#' correlation_matrix(mtcars)
correlation_matrix <- function(df, 
                               type = "pearson",
                               digits = 3, 
                               decimal.mark = ".",
                               use = "all", 
                               show_significance = TRUE, 
                               replace_diagonal = FALSE, 
                               replacement = ""){
  
  # check arguments
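  # (each check is a separate argument; wrapped in a single { } block, only the last one would be tested)
  stopifnot(
    is.numeric(digits),
    digits >= 0,
    use %in% c("all", "upper", "lower"),
    is.logical(replace_diagonal),
    is.logical(show_significance),
    is.character(replacement)
  )
  # we need the Hmisc package for this
  if (!requireNamespace("Hmisc", quietly = TRUE)) {
    stop("Package 'Hmisc' is required; install it with install.packages('Hmisc')")
  }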
  
  # retain only numeric and boolean columns
  isNumericOrBoolean = vapply(df, function(x) is.numeric(x) | is.logical(x), logical(1))
  if (sum(!isNumericOrBoolean) > 0) {
    cat('Dropping non-numeric/-boolean column(s):', paste(names(isNumericOrBoolean)[!isNumericOrBoolean], collapse = ', '), '\n\n')
  }
  df = df[isNumericOrBoolean]
  
  # transform input data frame to matrix
  x <- as.matrix(df)
  
  # run correlation analysis using Hmisc package
  correlation_matrix <- Hmisc::rcorr(x, type = type)
  R <- correlation_matrix$r # Matrix of correlation coefficients
  p <- correlation_matrix$P # Matrix of p-values
  
  # transform correlations to specific character format
  Rformatted = formatC(R, format = 'f', digits = digits, decimal.mark = decimal.mark)
  
  # if there are any negative numbers, we want to put a space before the non-negatives to align all
  if (sum(!is.na(R) & R < 0) > 0) {
    Rformatted = ifelse(!is.na(R) & R >= 0, paste0(" ", Rformatted), Rformatted)
  }

  # add significance levels if desired
  if (show_significance) {
    # define the symbols for the significance levels: * p < .05, ** p < .01, *** p < .001
    stars <- ifelse(is.na(p), "", ifelse(p < .001, "***", ifelse(p < .01, "**", ifelse(p < .05, "*", ""))))
    Rformatted = paste0(Rformatted, stars)
  }
  
  # make all character strings equally long by padding with trailing spaces
  max_length = max(nchar(Rformatted))
  Rformatted = vapply(Rformatted, function(x) {
    difference = max_length - nchar(x)
    paste0(x, strrep(" ", difference))
  }, FUN.VALUE = character(1))
  
  # build a new matrix that includes the formatted correlations and their significance stars
  Rnew <- matrix(Rformatted, ncol = ncol(x))
  rownames(Rnew) <- colnames(Rnew) <- colnames(x)
  
  # replace undesired values
  if (use == 'upper') {
    Rnew[lower.tri(Rnew, diag = replace_diagonal)] <- replacement
  } else if (use == 'lower') {
    Rnew[upper.tri(Rnew, diag = replace_diagonal)] <- replacement
  } else if (replace_diagonal) {
    diag(Rnew) <- replacement
  }
  
  return(Rnew)
}

save_correlation_matrix

#' save_correlation_matrix
#' Creates and saves to file a fully formatted correlation matrix, using `correlation_matrix` and `Hmisc::rcorr` in the backend
#' @param df dataframe; passed to `correlation_matrix`
#' @param filename either a character string naming a file or a connection open for writing. "" indicates output to the console; passed to `write.csv2`
#' @param ... any other arguments passed to `correlation_matrix`
#'
#' @return NULL
#'
#' @examples
#' save_correlation_matrix(df = iris, filename = 'iris-correlation-matrix.csv')
#' save_correlation_matrix(df = mtcars, filename = 'mtcars-correlation-matrix.csv', digits = 3, use = 'lower')
save_correlation_matrix = function(df, filename, ...) {
  return(write.csv2(correlation_matrix(df, ...), file = filename))
}

47 thoughts on “Create a publication-ready correlation matrix, with significance levels, in R”

  1. Great, thank you! One question, can you use this package to calculate the correlation matrix split by groups? I have some experimental results with different treatments, and I am interested in the correlation matrix by treatment.
    Thanks!
    Gerardo

    1. Hi Gerardo. I am not currently in reach of a computer. Yet I think something like `lapply(split(df, group_var), correlation_matrix)` should work.
      `split` first splits your df into a list with separate dfs per group on the group_var, and then `lapply` applies the correlation_matrix function to each list element (split df), returning separate correlation matrices in a list. Have a look at the split and lapply base function documentation for how they precisely work.
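A minimal runnable sketch of this split-then-apply pattern, using the built-in `iris` data with `Species` in the role of the (hypothetical) `group_var`, and base `cor` as a stand-in so that it runs without any extra code:

```r
# Split the numeric columns of iris into one data frame per species
groups <- split(iris[, 1:4], iris$Species)

# Apply the same function to every group; the result is a named list of matrices
by_group <- lapply(groups, cor)

print(names(by_group))
print(round(by_group$setosa, 2))
```

Replacing `cor` with `correlation_matrix` in the `lapply` call should give one formatted matrix per group, provided that function has been defined.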

  2. Is there a problem with columns containing only one level or value such as `correlation_matrix(dplyr::mutate(mtcars, aa=0) %>% as.data.frame())`?

    1. Hi Jimbou! I had not accounted for error-handling in case of missing correlations (when there is no variation in one of the variables).

      I have changed the code and the function can now handle such cases. The respective correlation matrix column/row will contain NaNs. Moreover, I’ve improved the function slightly to make all correlation value strings equally long.

      I should really stop posting code on my website and open github repositories with change requests ; )

      Thanks for noticing this error though!

    1. I copied the provided code for the function into an R Markdown sheet and used the following code to produce the correlation matrix:

      correlation_matrix(mydata.BM.morning.sleep, show_significance = TRUE, digits = 2, use = "lower", replace_diagonal = TRUE, replacement = "")

      1. I still can’t deduce the issue here. What happens if you try using the function without providing all the arguments? If you give it just your data? The data is numeric right?

      2. Yeah, I’ve tried it without any arguments and it’s exactly the same. It looks like these images:



        The “” don’t appear when using save_correlation_matrix though.

      3. Yeah those “” are supposed to appear, as they indicate that the correlation coefficients are stored as textual (character) values in R. That is necessary as they are a combination of the numerical coefficients and the textual significance indicators (***). Once you export them to Excel, the “” disappear as Excel does not use them to indicate that data is textual. Does this clarify your issue?
        If you want to create a correlation table in R markdown without the “” you can look into further manipulating the output correlation matrix using the gt package to turn it into a pretty table.

      4. Ah, yep! That completely makes sense and it’s not an issue as it doesn’t appear in the saved output. Thank you so much for coding this – it’s great!

  3. Hi Paul, this is so great! Was searching for an easy-to-use method for this issue and this one is fantastic! Thanks a lot for this work and especially for sharing with others, this is how it works!

  4. Hi, this works amazing, thank you! However, I had to tweak one little thing and wanted to let you know about it: to my understanding, the function does not feed “type” adequately into rcorr, which made me unable to run spearman correlations. I changed “[…] type = )” to “[…]type = type)” and then it worked. 🙂

  5. A massive thank you for the function! However the save matrix function doesn’t seem to work properly… it was pasted as one single chunk without being split into separate cols, any ideas how to fix that? Thank you!

    1. Thanx for the fast response! In my database there is a variety of character or factor variables but the majority is numeric:

      See an excerpt from a couple of numeric variables (via str(database)):

      $ T1_MEA_ADP_AUC : num 95 28 116 24 70 72 25 74 45 71 …
      $ T1_MEA_ADP_Aggr : num 158.3 53.6 222.8 55.9 134.3 …
      $ T1_MEA_ADP_Vel : num 21.3 8 23 5.9 14.1 15.3 7.1 15.5 9.6 17.8 …
      $ T1_MEA_ASPI_AUC : num 18 14 15 8 23 49 34 43 6 29 …
      $ T1_MEA_ASPI_Aggr : num 40.9 34.9 40.5 18.8 77.3 …
      $ T1_MEA_ASPI_Vel : num 5.4 4.3 5.2 4.1 9.4 11.9 9.7 11.4 2.5 6.9 …
      $ T1_MEA_TRAP_AUC : num 119 107 172 66 123 70 55 94 121 17.8 …

    1. rm(list=ls(all=TRUE))
      library(data.table)
      library(MBESS)
      library(QuantPsyc)
      library(readr)
      library(lattice)
      library(VIM)
      library(tableone)
      library(foreign)
      library(gdata)
      library(lattice)
      library(nlme)
      library(psych)
      library(ggplot2)
      library(car)
      library(effects)
      library(papeR)
      library(JM)
      library(lme4)
      library(haven)
      library(rms)
      library(mice)
      library(data.table)
      library(lattice)
      library(VIM)
      library(tableone)
      library(readxl)

      dput(my.data3)

      my.data <-dput(my.data3)

      #loading correlation_matrix script

      correlation_matrix(my.data3, use ='lower', type ='spearman', show_significance = FALSE)

      outcome.matrix <- Dropping non-numeric/-boolean column(s): Study_number, Incl_age, Incl_abd_surg, Incl_ascal, Excl_IC, Excl_life_exp, Excl_thromb_ther, Excl_Gibleed, Excl_plat_dis, Excl_surg_trau, Excl_Hb, Informed_consent, Sex, Family_burden_CVD, Smoking, MH_hypertension, MH_Atrium_fibrillation, MH_Cong_HF, MH_Angina_pectoris, MH_Aorta_valve_stenosis, MH_Myocardial_Infarction, MH_MI_PCI, MH_Cardiac_surgery, MH_Cardiacsurgery_specified, MH_pacemaker, MH_dyslipidemia, MH_Diabetes, MH_TIA_CVA, MH_COPD, MH_dialysis, MH_renal_insufficiency, MH_peripheral_vascular_disease, Vascsurgery, MH_PAD_surgery, NYHA_class, Rev_Cardiac_Risk_Index_surg, Rev_Cardiac_Risk_Index_isch_heart, Rev_Cardiac_Risk_Index_CHF, Rev_Cardiac_Risk_Index_CVA, Rev_Cardiac_Risk_Index_DMins, Rev_Cardiac_Risk_Index_creat, Med_aspirin, Dose_aspirin, Med_P2Y12, Med_VitK, Med_NOAC, Med_NSAID, Med_dipyridamole, Med_betablocker, Med_CA_ant, Med_ACE_inh, Med_Diuretics, Med_Digoxin, Med_Statin, Med_prednisone, T1_blood_collection_date, T1_blood_collection_time, T1_collection_mechanism, T2_blood_collection_date, T2_blood_collection_time, T2_collection_mechanism, T3_blood_collection_date, T3_blood_collection_time, T3_collection_mechanism, T4_blood_collection_date, T4_blood_collection_time, T4_collection_mechanism, T1_MEA_time, Transcombined, MI_type, FU_Other_intervention_specified, FU_infection_specified, FU_arrhythmia_specified, Outlierdeselect, filter_.

      Age_inclusion Body_length Body_weight BMI_calc Alcohol Packyears MH_Avs_Grad MH_Avs_AVA MH_LVEF
      Age_inclusion " 1.000" "" "" "" "" "" "" "" ""
      Body_length "-0.089" " 1.000" "" "" "" "" "" "" ""
      Body_weight " 0.024" " 0.400" " 1.000" "" "" "" "" "" ""
      Rev_Cardiac_Risk_Index_TOTAL T1_MEA_ADP_AUC T1_MEA_ADP_Aggr T1_MEA_ADP_Vel T1_MEA_ASPI_AUC T1_MEA_ASPI_Aggr
      Age_inclusion "" "" "" "" "" ""
      Body_length "" "" "" "" "" ""
      Body_weight "" "" "" "" "" ""
      T1_MEA_ASPI_Vel T1_MEA_TRAP_AUC T1_MEA_TRAP_Aggr T1_MEA_TRAP_Vel T2_MEA_ADP_AUC T2_MEA_ADP_Aggr T2_MEA_ADP_Vel
      Age_inclusion "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" ""
      T2_MEA_ASPI_AUC T2_MEA_ASPI_Aggr T2_MEA_ASPI_Vel T2_MEA_TRAP_AUC T2_MEA_TRAP_Aggr T2_MEA_TRAP_Vel T3_MEA_ADP_AUC
      Age_inclusion "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" ""
      T3_MEA_ADP_Aggr T3_MEA_ADP_Vel T3_MEA_ASPI_AUC T3_MEA_ASPI_Aggr T3_MEA_ASPI_Vel T3_MEA_TRAP_AUC T3_MEA_TRAP_Aggr
      Age_inclusion "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" ""
      T3_MEA_TRAP_Vel T4_MEA_ADP_AUC T4_MEA_ADP_Aggr T4_MEA_ADP_Vel T4_MEA_ASPI_AUC T4_MEA_ASPI_Aggr T4_MEA_ASPI_Vel
      Age_inclusion "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" ""
      T4_MEA_TRAP_AUC T4_MEA_TRAP_Aggr T4_MEA_TRAP_Vel T1_CK_r T1_CK_k T1_CK_angle T1_CK_MA T1_CK_LY30 T1_CRT_r T1_CRT_k
      Age_inclusion "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" ""
      T1_CRT_angle T1_CRT_MA T1_CRT_LY30 T1_CRT_TEGACT T1_CKH_r T1_CKH_k T1_CKH_angle T1_CKH_MA T1_CFF_MA T1_CFF_FLEV T2_CK_r
      Age_inclusion "" "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" "" ""
      T2_CK_k T2_CK_angle T2_CK_MA T2_CK_LY30 T2_CRT_r T2_CRT_k T2_CRT_angle T2_CRT_MA T2_CRT_LY30 T2_CRT_TEGACT T2_CKH_r
      Age_inclusion "" "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" "" ""
      T2_CKH_k T2_CKH_angle T2_CKH_MA T2_CFF_MA T2_CFF_FLEV T3_CK_r T3_CK_k T3_CK_angle T3_CK_MA T3_CK_LY30 T3_CRT_r T3_CRT_k
      Age_inclusion "" "" "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" "" "" ""
      T3_CRT_angle T3_CRT_MA T3_CRT_LY30 T3_CRT_TEGACT T3_CKH_r T3_CKH_k T3_CKH_angle T3_CKH_MA T3_CFF_MA T3_CFF_FLEV T4_CK_r
      Age_inclusion "" "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" "" ""
      T4_CK_k T4_CK_angle T4_CK_MA T4_CK_LY30 T4_CRT_r T4_CRT_k T4_CRT_angle T4_CRT_MA T4_CRT_LY30 T4_CRT_TEGACT T4_CKH_r
      Age_inclusion "" "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" "" ""
      T4_CKH_k T4_CKH_angle T4_CKH_MA T4_CFF_MA T4_CFF_FLEV T1_VerifyNow_ARU T2_VerifyNow_ARU T3_VerifyNow_ARU T4_VerifyNow_ARU
      Age_inclusion "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" ""
      T1_Psel_TRAP T1_Psel_noTRAP T1_ratio T2_Psel_TRAP T2_Psel_noTRAP T2_ratio T3_Psel_TRAP T3_Psel_noTRAP T3_ratio
      Age_inclusion "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" ""
      T4_Psel_TRAP T4_Psel_noTRAP T4_ratio T1_ECG_findings T2_ECG_findings T3_ECG_findings T4_ECG_findings T1_Hemoglobin
      Age_inclusion "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" ""
      T1_Hematocrit T1_Leukocytecount T1_Monocytecount T1_Plateletcount T1_MPV T1_PDW T1_PT T1_aPTT T1_fibrinogeen
      Age_inclusion "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" ""
      T1_cephotest T1_hsCRP T1_hscTnT T1_Urea T1_creatinine T1_proBNP T2_Hemoglobin T2_Hematocrit T2_Leukocytecount
      Age_inclusion "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" ""
      T2_Monocytecount T2_Plateletcount T2_MPV T2_PDW T2_PT T2_aPTT T2_fibrinogeen T2_cephotest T2_hsCRP T2_hscTnT
      Age_inclusion "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" ""
      T2_Urea T2_creatinine T2_proBNP T3_Hemoglobin T3_Hematocrit T3_Leukocytecount T3_Monocytecount T3_Plateletcount T3_MPV
      Age_inclusion "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" ""
      T3_PDW T3_PT T3_aPTT T3_fibrinogeen T3_cephotest T3_hsCRP T3_hscTnT T3_Urea T3_creatinine T3_proBNP T4_Hemoglobin
      Age_inclusion "" "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" "" ""
      T4_Hematocrit T4_Leukocytecount T4_Monocytecount T4_Plateletcount T4_MPV T4_PDW T4_PT T4_aPTT T4_fibrinogeen
      Age_inclusion "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" ""
      T4_cephotest T4_hsCRP T4_hscTnT T4_Urea T4_creatinine T4_proBNP Surgery_date Surgery_duration Epidural
      Age_inclusion "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" ""
      Peroperative_heparin Transf_ery Transf_trombo Transf_plasma No_hypotensive_episodes Blood_loss Cell_saver FU_Postop_MI
      Age_inclusion "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" ""
      FU_CAG_or_PCI FU_Other_Intervention FU_Stroke FU_renalinsufficiency FU_infection FU_sepsis FU_arrhythmia FU_mortality
      Age_inclusion "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" ""
      Date_discharge dayshospitilization DeltatropT1 DeltatropT2 DeltatropT3 DeltatropT4 DeltatropT2T3 DeltatropT2T4 MINSdelta
      Age_inclusion "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" ""
      Peakdeltatrop MINSabs Deltacreat12 Deltacreat13 Deltacreat14 Smokingbinair VerifyNowcateg GFR_1 GFR_2 GFR_3
      Age_inclusion "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" ""
      GFR_4 GFRdelta11 GFRdelta12 GFRdelta13 GFRdelta14 ARUgreater550 ARUgrT1 ARUgrT2 ARUgrT3 ARUgrT4 tropbaselinehigh
      Age_inclusion "" "" "" "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" "" "" "" ""
      ASPI40T2 ASPI40T3 ASPI40T4 Peak_troponin Peak_Psel Corres_Peak_trombocytes Peak_trombocytes Corres_Peak_Verifynow
      Age_inclusion "" "" "" "" "" "" "" ""
      Body_length "" "" "" "" "" "" "" ""
      Body_weight "" "" "" "" "" "" "" ""
      Peak_Verifynow Corres_Peak_multiplate_ADP Peak_multiplate_ADP Corres_Peak_multiplate_TRAP Peak_Multiplate_TRAP
      Age_inclusion "" "" "" "" ""
      Body_length "" "" "" "" ""
      Body_weight "" "" "" "" ""
      Corres_Peak_multiplate_ASPI Peak_Multiplate_ASPI Corres_Peak_Fibrinogen Peak_Fibrinogen
      Age_inclusion "" "" "" ""
      Body_length "" "" "" ""
      Body_weight "" "" "" ""
      [ reached getOption("max.print") — omitted 261 rows ]
      Warning message:
      In sqrt(npair - 2) : NaNs produced

      #The right variables seem to be dropped this time, but only few numeric correlations appear as you can see. What might be the issue?

  6. Dear Paul, thanx for the nice script. I am running into a problem when using your script on the database:
    the script seems to drop almost all the numeric variables (according to the str() function) and provides the following errors:

    Warning messages:
    1: In sqrt(npair - 2) : NaNs produced
    2: In pt(abs(h) * sqrt(npair - 2)/sqrt(pmax(1 - h * h, 0)), npair - :
    NaNs produced

    Do i need to prepare my database in a different way? What to do to resolve this issue?

    Thanks in advance,

    Regards

  7. Though this is still not a reproducible example, I can guess where it goes wrong. You are calculating Spearman's correlation coefficient, which is a coefficient for ordinal variables. Your numeric variables are continuous though. Could this be the problem?

    1. ah thanx! I see that the max.print limit was low, thus only 3 variables were shown! How can I best see the whole matrix? Just options(max.print = etc.)? Is there a better way to have an overview of the whole matrix? As still a majority of the numerical variables are not appearing.
      I do still get the warning signs:

      1: In sqrt(npair - 2) : NaNs produced
      2: In pt(abs(h) * sqrt(npair - 2)/sqrt(pmax(1 - h * h, 0)), npair - :
      NaNs produced

  8. Hi Paul, I am a newbie R user here and it looks like this is exactly what I need for my research. I just have one question (and it might be a stupid one) but how do I install the package to be able to use the script?

    1. Hi Gijs, this is actually a great question, and I see how it was not directly clear.
      I did not “package” this function up unfortunately.
      If you want to use this in your project, I suggest you copy all the code blocks into a new R script, which you can name “correlation.R” or something like that.
      Then, in your actual script, you “source” in these correlation matrix functions using `source("your_directory_path_here/correlation.R")`. R then runs the script with these functions in it, and they will appear in your environment for you to use in your analysis.
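As a self-contained illustration of how `source` behaves, the sketch below writes a stand-in script to a temporary folder and then sources it. The file content and the function name `hello_correlation` are made up for this example. In practice the file would hold the two functions from this post.

```r
# Write a stand-in script to a temporary file (hypothetical content, for illustration only)
script <- file.path(tempdir(), "correlation.R")
writeLines("hello_correlation <- function() 'functions loaded'", script)

# source() runs the script, so the functions it defines become available
source(script)
print(hello_correlation())
```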

      Hope this helps!

  9. Another question. Any idea how I can add the p-values corresponding to each *, **, ***?

  10. What do you mean? The correlation_matrix function already adds those stars, I believe. You can change what significance levels and what symbols (*) are used in the code there. If you want to add textual representation of what * ** and *** are, I would do so in R markdown, or manually.
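For reference, the thresholds are hard-coded in the nested `ifelse` call inside correlation_matrix: `***` for p < .001, `**` for p < .01 and `*` for p < .05. The sketch below applies that same expression to a few made-up p-values:

```r
# Example p-values (made up for illustration)
p <- c(0.0004, 0.004, 0.04, 0.4, NA)

# Same nested ifelse() as in the correlation_matrix source code
stars <- ifelse(is.na(p), "", ifelse(p < .001, "***", ifelse(p < .01, "**", ifelse(p < .05, "*", ""))))

print(data.frame(p = p, stars = stars))
```

A table note along the lines of "* p < .05, ** p < .01, *** p < .001" would match these thresholds.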

    1. Ah sorry if I wasn't clear. I actually want to know what p-value corresponds to what *. I have *, ** and *** in my table but don’t know which they are. Or is it a standard? so ie: * < 0.1, **<0.05 and ***<0.01?

  11. Hi, how do I get an output in R without the quotation marks (“)? example below

    a254 "1.000 " "0.970 " "-0.055" "0.973 " "0.432 " "-0.773" "0.704 " "0.735 " "-0.755" "0.738 " "-0.749"
    E2_E3 "0.970 " "1.000 " "-0.226" "0.986 " "0.332 " "-0.711" "0.534 " "0.558 " "-0.583" "0.771 " "-0.624"
    E4_E6 "-0.055" "-0.226" "1.000 " "-0.136" "0.555 " "-0.261" "0.549 " "0.515 " "-0.492" "0.064 " "-0.352"

    1. R needs the quotation marks to identify that the data are text strings. The data need to be text strings in order to be able to display the numbers in this specific format (3 decimals + significance signs).

      There is a trick to print data without the quotation marks. You can use the cat() function. But I think you would need to write a for loop of some sort with spacing logic in order to retain the matrix table structure…
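A simpler route than a `cat()` loop may be base R's own printing options: `print(x, quote = FALSE)` and `noquote(x)` both display a character matrix without quotation marks while keeping the table layout. A small sketch on a made-up matrix:

```r
# Made-up character matrix standing in for the output of correlation_matrix
m <- matrix(c(" 1.000", "-0.226", "-0.226", " 1.000"), ncol = 2,
            dimnames = list(c("a254", "E2_E3"), c("a254", "E2_E3")))

# Option 1: suppress the quotes for a single print call
print(m, quote = FALSE)

# Option 2: mark the object so that it always prints without quotes
nq <- noquote(m)
print(nq)
```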

  12. Hi Paul,
    As a new R user this is a really useful function, thank you!
    I have successfully created my correlation table, however, I am encountering an error when trying to use the save_correlation_matrix function.
    I created my correlation table using the following command:

    save_correlation_matrix(dum.cor.table,
    filename = "Dummy data correlation table.csv",
    digits = 3,
    use = "lower")

    I then tried to save the table using the following command:

    save_correlation_matrix(df = dum.cor.table,
    filename = "dummy-data-correlation-table.csv",
    digits = 3,
    use = "lower")

    And I receive the following error:
    Error in Hmisc::rcorr(x, type = type) : must have >4 observations

    Any help would be greatly appreciated!

    1. Hi Ben! It seems to me that one of your correlation analyses has 4 or fewer observations to work with. If your dataset is larger, then this is probably due to missing values in either of the respective columns. This is not something I knew would produce an error. The error does not come from my function, but from the rcorr function that belongs to the Hmisc package. I don’t know how to solve this except by removing one of the respective columns. Sorry 😦
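One way to check this kind of hypothesis is to count how many observations are available, both in total and per pair of columns after dropping missing values. The base-R sketch below does so on made-up data. It is only a diagnostic under the assumption that too few observations are the cause, not a fix.

```r
# Made-up data with missing values, for illustration only
x <- cbind(a = c(1, 2, 3, 4, 5, 6),
           b = c(2, NA, NA, NA, 1, 3),
           c = c(6, 5, 4, 3, 2, 1))

# Number of rows in total
print(nrow(x))

# Number of rows in which both variables of each pair are observed
obs <- !is.na(x)
n_pairs <- t(obs) %*% obs
print(n_pairs)
```

Pairs with a very low count are the first candidates to inspect or drop.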

  13. Thanks for the great functions! Can you provide a citation so I can credit your work in a publication? Sorry if I missed this before while looking for one.

    1. Hi Evan, thanks for your nice words and asking for a reference. I’d use the formats listed here: https://www.easybib.com/guides/citation-guides/how-do-i-cite-a/how-to-cite-a-blog/#:~:text=Author's%20Last%20Name%2C%20Author's%20First,section%20name%20(if%20applicable).

      So something like: van der Laken, P.A. (2020, July 28). Create a publication-ready correlation matrix, with significance levels, in R. paulvanderlaken.com. https://paulvanderlaken.com/2020/07/28/publication-ready-correlation-matrix-significance-r/comment-page-1/#comment-27232

  14. I really appreciate your function! I would love if you made it available as a package or on a github page so we can call it directly
    Cheers!
