Friday, September 22, 2017

Pearson Correlation vs. Spearman Correlation

A correlation coefficient measures the extent to which two variables tend to change together. Two correlation coefficients are commonly used in statistical analysis: the Pearson correlation and the Spearman correlation.

The Pearson correlation coefficient, also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a measure of the linear correlation between two variables X and Y.

Spearman's rank correlation coefficient is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function.

The Pearson correlation measures linear relationships, while the Spearman correlation measures monotonic relationships. Unlike the Pearson correlation, the Spearman correlation ignores the actual values of the data and uses only the rank order of the values. The Spearman correlation between two variables is therefore equal to the Pearson correlation between the rank values of those two variables.
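The rank-based definition can be checked with a few lines of standard-library Python. This is only a minimal sketch: the helper names (pearson, average_ranks, spearman, spearman_from_rank_differences) and the small dataset are made up for illustration, and the closed-form rank-difference formula used as a cross-check assumes there are no tied values.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def spearman_from_rank_differences(xs, ys):
    """Classic formula 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Illustrative data: y mostly increases with x, but the last two points swap order.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 9.5, 15.8, 26.0, 24.0]

print("ranks of y:", average_ranks(y))
print("Spearman via Pearson on ranks:", spearman(x, y))
print("Spearman via rank differences:", spearman_from_rank_differences(x, y))
print("Pearson on the raw values:", pearson(x, y))
```

Both Spearman routes give the same number, while the Pearson correlation of the raw values differs because it depends on the actual magnitudes and not just their order.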

Let's look at some examples to understand the relationship between the Pearson correlation and the Spearman correlation.

When a perfect linear relationship exists between two variables, the absolute values of both correlation coefficients are equal to 1.





When a strictly monotonic but nonlinear relationship exists between two variables, the absolute value of the Spearman correlation is equal to 1, while the absolute value of the Pearson correlation is less than 1.
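The linear and the monotonic-but-nonlinear cases can be reproduced numerically. The sketch below uses synthetic data chosen for illustration (a decreasing straight line and an exponential curve); it is not the data behind the original figures, and the helper functions are hypothetical names.

```python
from math import exp, sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

x = [0.5 * i for i in range(1, 11)]      # 0.5, 1.0, ..., 5.0
linear = [-3.0 * v + 2.0 for v in x]     # perfect (decreasing) straight line
curved = [exp(v) for v in x]             # strictly increasing, but not linear

print("linear: Pearson", pearson(x, linear), "Spearman", spearman(x, linear))
print("curved: Pearson", pearson(x, curved), "Spearman", spearman(x, curved))
```

For the straight line both coefficients come out at -1 (up to floating-point rounding), so their absolute values are 1. For the exponential curve the Spearman correlation stays at 1 because the ordering is preserved, while the Pearson correlation drops below 1.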












When the relationship between the two variables is not monotonic, for example a symmetric U-shaped curve, the values of both correlations can be close to 0.
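A symmetric parabola is a convenient way to see this. The data below are synthetic and chosen so that the increasing and decreasing halves cancel exactly; the original figure may have used different data, and the helper names are again hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# y depends on x exactly, but falls for x < 0 and rises for x > 0.
x = list(range(-5, 6))
y = [v * v for v in x]

print("Pearson:", pearson(x, y))
print("Spearman:", spearman(x, y))
```

Even though y is fully determined by x, both coefficients are zero here, because neither one can summarize a relationship that changes direction. A less symmetric non-monotonic curve would generally give values that are small but not exactly zero.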



Note that the above figures also report the corresponding two-sided p-values of the correlations, which measure the probability that uncorrelated data would produce a correlation at least as extreme as the one observed. With a small p-value, we can be more confident that the calculated correlation is not due to chance alone. Note that the p-values of the Pearson correlation are only valid when the variables are normally distributed or the dataset is large enough for the central limit theorem to apply. The p-values of the Spearman correlation do not suffer from this issue because, as a nonparametric test, the Spearman correlation does not make assumptions about the distributions of the variables.
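One way to build intuition for a two-sided p-value is a permutation test: shuffle one variable many times to break any association, and count how often the shuffled data produce a correlation at least as extreme as the observed one. This is a swapped-in technique for illustration, not the analytic p-value that statistical packages typically report, and the function name, the number of permutations, the seeds, and the synthetic data are all assumptions.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # destroys any real pairing between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return (hits + 1) / (n_permutations + 1)

# Synthetic data: one series that follows x plus noise, one that is pure noise.
data_rng = random.Random(42)
x = [float(i) for i in range(20)]
related = [v + data_rng.gauss(0, 2) for v in x]
unrelated = [data_rng.gauss(0, 2) for _ in x]

print("p-value, related series:", permutation_p_value(x, related))
print("p-value, unrelated series:", permutation_p_value(x, unrelated))
```

Passing rank-transformed data to the same function would give a permutation p-value for the Spearman correlation, which is in keeping with its distribution-free character.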

From the above examples, it seems that the absolute value of the Spearman correlation is always greater than or equal to that of the Pearson correlation. This is intuitive: because the Spearman correlation captures any monotonic relationship while the Pearson correlation captures only linear relationships, in the presence of a strictly monotonic relationship the Spearman correlation is the same as the Pearson correlation when the relationship is linear, and strictly greater when it is nonlinear. However, this conjecture does not hold when the relationship between two variables is not strictly monotonic, as shown by some examples from Anscombe's quartet. In the second and the fourth examples, the Spearman correlation is less than the Pearson correlation.
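The comparison for the second and fourth datasets of Anscombe's quartet can be reproduced directly. The values below are the published quartet data; the helper functions are a hypothetical standard-library sketch, with tied values given their average rank (which matters for the fourth dataset, where ten of the eleven x values are identical).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Anscombe's quartet, dataset II: a smooth curve that rises and then falls.
x2 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# Anscombe's quartet, dataset IV: identical x values except for one outlier.
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

for name, xs, ys in [("II", x2, y2), ("IV", x4, y4)]:
    print(name, "Pearson:", round(pearson(xs, ys), 3),
          "Spearman:", round(spearman(xs, ys), 3))
```

In both datasets the Spearman correlation (about 0.69 and 0.5) is smaller than the Pearson correlation. In dataset IV a single outlier drives the Pearson value, whereas the rank transform limits how much that one point can contribute.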