The aim of this R tutorial is to show you how to compute and visualize a correlation matrix in R. We also provide online software for computing and visualizing a correlation matrix.

There are different methods for correlation analysis: the Pearson parametric correlation test, and the Spearman and Kendall rank-based correlation analyses. These methods are discussed in the next sections.

Previously, we described how to perform a correlation test between two variables. In this article, you'll learn how to compute a correlation matrix, which is used to investigate the dependence between multiple variables at the same time. The result is a table containing the correlation coefficients between each variable and the others.

Compute correlation matrix in R

R functions

As you may know, the R function cor() can be used to compute a correlation matrix. A simplified format of the function is:

    cor(x, method = c("pearson", "kendall", "spearman"))

x : numeric matrix or a data frame.

method : indicates the correlation coefficient to be computed. The default is the Pearson correlation coefficient, which measures the linear dependence between two variables. The Kendall and Spearman methods are non-parametric, rank-based correlation measures.
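As a small illustration of the method argument, the sketch below computes all three coefficients on a few columns of the built-in mtcars data set. The column selection and the object names (r_pearson, r_spearman, r_kendall) are assumptions chosen for this example only.

```r
# Subset of the built-in mtcars data set (columns chosen for illustration)
x <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Pearson is the default method
r_pearson  <- cor(x)
r_spearman <- cor(x, method = "spearman")
r_kendall  <- cor(x, method = "kendall")

# Compare the three coefficients for one pair of variables
c(pearson  = r_pearson["mpg", "wt"],
  spearman = r_spearman["mpg", "wt"],
  kendall  = r_kendall["mpg", "wt"])
```

The three coefficients generally differ in magnitude because they measure different things: linear association for Pearson, monotonic association based on ranks for Spearman and Kendall.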

If your data contain missing values, use the following R code to handle missing values by case-wise deletion:

    cor(x, method = "pearson", use = "complete.obs")
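The effect of case-wise deletion can be checked on a small example. In this sketch the missing values are introduced artificially, and the option use = "pairwise.complete.obs" is shown only for contrast; it is not discussed in this tutorial.

```r
x <- mtcars[, c("mpg", "disp", "hp")]
x[c(1, 5), "disp"] <- NA   # introduce missing values for illustration

# Default behaviour: NA for every pair that involves disp
r_default <- cor(x)

# Case-wise deletion: rows 1 and 5 are dropped for all pairs
r_complete <- cor(x, use = "complete.obs")

# Pairwise deletion: rows are dropped only for pairs that involve disp
r_pairwise <- cor(x, use = "pairwise.complete.obs")

r_complete
```

With case-wise deletion, every coefficient is computed on the same set of rows, which keeps the matrix internally consistent at the cost of discarding some observations.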

Import your data into R

1. Prepare your data as specified here: Best practices for preparing your data set for R
2. Save your data in an external tab-delimited .txt file or a .csv file
3. Import your data into R, for example with read.delim() for a tab-delimited .txt file or read.csv() for a .csv file.

Here, we'll use data derived from the built-in R data set mtcars as an example:

    # Load data
    data("mtcars")
    my_data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
    # Print the first 6 rows
    head(my_data, 6)

                       mpg disp  hp drat    wt  qsec
    Mazda RX4         21.0  160 110 3.90 2.620 16.46
    Mazda RX4 Wag     21.0  160 110 3.90 2.875 17.02
    Datsun 710        22.8  108  93 3.85 2.320 18.61
    Hornet 4 Drive    21.4  258 110 3.08 3.215 19.44
    Hornet Sportabout 18.7  360 175 3.15 3.440 17.02
    Valiant           18.1  225 105 2.76 3.460 20.22
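The import step can be tried without any external file. The sketch below first writes the example data to temporary files and then reads them back; the file names, the object names and the choice of row.names = 1 are assumptions for this illustration. With your own data, replace the temporary paths with the path to your file.

```r
# Write a small example data set to temporary files (for illustration only)
example_data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")
write.table(example_data, txt_file, sep = "\t", quote = FALSE, col.names = NA)
write.csv(example_data, csv_file)

# Tab-delimited .txt file; row.names = 1 uses the first column as row names
data_txt <- read.delim(txt_file, row.names = 1)

# Comma-separated .csv file
data_csv <- read.csv(csv_file, row.names = 1)

head(data_txt, 3)
```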

Compute correlation matrix

    res <- cor(my_data)
    round(res, 2)

           mpg  disp    hp  drat    wt  qsec
    mpg   1.00 -0.85 -0.78  0.68 -0.87  0.42
    disp -0.85  1.00  0.79 -0.71  0.89 -0.43
    hp   -0.78  0.79  1.00 -0.45  0.66 -0.71
    drat  0.68 -0.71 -0.45  1.00 -0.71  0.09
    wt   -0.87  0.89  0.66 -0.71  1.00 -0.17
    qsec  0.42 -0.43 -0.71  0.09 -0.17  1.00

The table above shows the correlation coefficients between all possible pairs of variables.

Note that, if your data contain missing values, use the following R code to handle missing values by case-wise deletion:

    cor(my_data, use = "complete.obs")

Unfortunately, the function cor() returns only the correlation coefficients between variables. In the next section, we will use the Hmisc R package to calculate the correlation p-values.
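Because the result of cor() is an ordinary numeric matrix with row and column names, individual coefficients can be looked up by variable name. The following sketch rebuilds the example data so that it runs on its own; the names mpg_cor and mpg_sorted are illustrative.

```r
my_data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
res <- cor(my_data)

# Coefficient for a single pair, looked up by variable name
res["mpg", "wt"]

# All coefficients for one variable, strongest absolute correlation first
mpg_cor <- res["mpg", colnames(res) != "mpg"]
mpg_sorted <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(mpg_sorted, 2)
```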

Correlation matrix with significance levels (p-value)

The function rcorr() [in Hmisc package] can be used to compute the significance levels for Pearson and Spearman correlations. It returns both the correlation coefficients and the p-values of the correlations for all possible pairs of columns in the data table.

Simplified format:

    rcorr(x, type = c("pearson", "spearman"))

x should be a matrix. The correlation type can be either pearson or spearman.

Install Hmisc package:

    install.packages("Hmisc")

Use rcorr() function:

    library("Hmisc")
    res2 <- rcorr(as.matrix(my_data))
    res2

           mpg  disp    hp  drat    wt  qsec
    mpg   1.00 -0.85 -0.78  0.68 -0.87  0.42
    disp -0.85  1.00  0.79 -0.71  0.89 -0.43
    hp   -0.78  0.79  1.00 -0.45  0.66 -0.71
    drat  0.68 -0.71 -0.45  1.00 -0.71  0.09
    wt   -0.87  0.89  0.66 -0.71  1.00 -0.17
    qsec  0.42 -0.43 -0.71  0.09 -0.17  1.00

    n= 32

    P
         mpg    disp   hp     drat   wt     qsec
    mpg         0.0000 0.0000 0.0000 0.0000 0.0171
    disp 0.0000        0.0000 0.0000 0.0000 0.0131
    hp   0.0000 0.0000        0.0100 0.0000 0.0000
    drat 0.0000 0.0000 0.0100        0.0000 0.6196
    wt   0.0000 0.0000 0.0000 0.0000        0.3389
    qsec 0.0171 0.0131 0.0000 0.6196 0.3389

The output of the function rcorr() is a list containing the following elements:

- r : the correlation matrix
- n : the matrix of the number of observations used in analyzing each pair of variables
- P : the p-values corresponding to the significance levels of correlations.

If you want to extract the p-values or the correlation coefficients from the output, use this:

    # Extract the correlation coefficients
    res2$r
    # Extract p-values
    res2$P
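To see what the P matrix contains, the Pearson p-values can also be computed pair by pair with the base R function cor.test(). This is only a sketch for illustration, not the way rcorr() is implemented; cor_pmat is a hypothetical helper name that does not exist in Hmisc or base R.

```r
my_data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]

# cor_pmat is a hypothetical helper, not part of Hmisc or base R
cor_pmat <- function(x) {
  x <- as.matrix(x)
  p <- ncol(x)
  # The diagonal stays NA: a variable is not tested against itself
  pmat <- matrix(NA_real_, p, p, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      pval <- cor.test(x[, i], x[, j])$p.value   # Pearson test by default
      pmat[i, j] <- pval
      pmat[j, i] <- pval
    }
  }
  pmat
}

pm <- cor_pmat(my_data)
round(pm, 4)
```

For the example data, the rounded values should agree with the P table printed by rcorr() above, for instance about 0.0171 for mpg and qsec.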

A simple function to format the correlation matrix

This section provides a simple function for formatting a correlation matrix into a table with 4 columns containing:

- Column 1 : row names (variable 1 for the correlation test)
- Column 2 : column names (variable 2 for the correlation test)
- Column 3 : the correlation coefficients
- Column 4 : the p-values of the correlations

A custom function, flattenCorrMatrix(), can be used for this. It takes two arguments:

- cormat : matrix of the correlation coefficients
- pmat : matrix of the correlation p-values

Example of usage:

    library(Hmisc)
    res2 <- rcorr(as.matrix(mtcars[, 1:7]))
    flattenCorrMatrix(res2$r, res2$P)

        row column         cor            p
    1   mpg    cyl -0.85216194 6.112697e-10
    2   mpg   disp -0.84755135 9.380354e-10
    3   cyl   disp  0.90203285 1.803002e-12
    4   mpg     hp -0.77616835 1.787838e-07
    5   cyl     hp  0.83244747 3.477856e-09
    6  disp     hp  0.79094857 7.142686e-08
    7   mpg   drat  0.68117189 1.776241e-05
    8   cyl   drat -0.69993812 8.244635e-06
    9  disp   drat -0.71021390 5.282028e-06
    10   hp   drat -0.44875914 9.988768e-03
    11  mpg     wt -0.86765939 1.293956e-10
    12  cyl     wt  0.78249580 1.217567e-07
    13 disp     wt  0.88797992 1.222311e-11
    14   hp     wt  0.65874785 4.145833e-05
    15 drat     wt -0.71244061 4.784268e-06
    16  mpg   qsec  0.41868404 1.708199e-02
    17  cyl   qsec -0.59124213 3.660527e-04
    18 disp   qsec -0.43369791 1.314403e-02
    19   hp   qsec -0.70822340 5.766250e-06
    20 drat   qsec  0.09120482 6.195823e-01
    21   wt   qsec -0.17471591 3.388682e-01
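One way to write flattenCorrMatrix() is sketched here. It assumes that each pair of variables is taken once from the upper triangle of the matrix, which matches the row order of the output shown above. The base R stand-ins for res2$r and res2$P (cormat and pmat) are only there so that the sketch runs without Hmisc.

```r
# A possible implementation (assumption: pairs come from the upper triangle)
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)   # TRUE for the cells above the diagonal
  data.frame(
    row    = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor    = cormat[ut],
    p      = pmat[ut]
  )
}

# Base R stand-ins for the rcorr() output
x <- mtcars[, 1:7]
cormat <- cor(x)
pair_p <- function(i, j) if (i == j) NA else cor.test(x[[i]], x[[j]])$p.value
pmat <- outer(seq_along(x), seq_along(x), Vectorize(pair_p))
dimnames(pmat) <- dimnames(cormat)

flat <- flattenCorrMatrix(cormat, pmat)
head(flat, 3)
```

Using only the upper triangle means that each pair of variables appears once and the diagonal is left out, so 7 variables give 21 rows.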