---
created_at: '2015-11-03T10:10:23.000Z'
title: Cosines and correlation (2010)
url: http://www.johndcook.com/blog/2010/06/17/covariance-and-law-of-cosines/
author: ColinWright
points: 49
num_comments: 17
created_at_i: 1446545423
_tags:
- story
- author_ColinWright
- story_10498549
objectID: '10498549'
year: 2010
---

[Source](https://www.johndcook.com/blog/2010/06/17/covariance-and-law-of-cosines/ "Permalink to Cosines and correlation")

# Cosines and correlation

Posted on [17 June 2010][19] by [John][20]

## Preview

This post explains a connection between probability and geometry. Standard deviations of independent random variables add according to the Pythagorean theorem. Standard deviations of correlated random variables add according to the law of cosines. This is because correlation is a cosine.

## Independent variables

First, consider two independent random variables _X_ and _Y_. Their standard deviations add like the sides of a right triangle.

![diagram][21]

In the diagram above, "sd" stands for standard deviation, the square root of variance. The diagram is correct because the formula

Var(_X_ + _Y_) = Var(_X_) + Var(_Y_)

is analogous to the Pythagorean theorem

_c_² = _a_² + _b_².

## Dependent variables

Next we drop the assumption of independence. If _X_ and _Y_ are correlated, the variance formula is analogous to the law of cosines.

![diagram][22]

The generalization of the variance formula to dependent variables is

Var(_X_ + _Y_) = Var(_X_) + Var(_Y_) + 2 Cov(_X_, _Y_).
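That identity can be checked numerically. The sketch below uses NumPy on simulated data (the data and coefficients are made up for illustration); with `ddof=0` the sample versions of variance and covariance satisfy the identity exactly, up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated samples: y shares a component with x.
x = rng.normal(size=100_000)
y = 0.6 * x + 0.8 * rng.normal(size=100_000)

# Sample version of Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
lhs = np.var(x + y)
cov_xy = np.cov(x, y, ddof=0)[0, 1]
rhs = np.var(x) + np.var(y) + 2 * cov_xy

print(lhs, rhs)  # the two sides agree up to floating-point error
```

Setting the second coefficient to zero makes the samples independent, and the covariance term (nearly) vanishes, recovering the Pythagorean case.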
Here Cov(_X_, _Y_) is the covariance of _X_ and _Y_. The analogous law of cosines is

_c_² = _a_² + _b_² – 2 _a b_ cos(θ).

If we let _a_, _b_, and _c_ be the standard deviations of _X_, _Y_, and _X_ + _Y_ respectively, then cos(θ) = -ρ, where ρ is the correlation between _X_ and _Y_, defined by

ρ(_X_, _Y_) = Cov(_X_, _Y_) / (sd(_X_) sd(_Y_)).

When θ is π/2 (i.e. 90°), the random variables are uncorrelated. When θ is larger, the variables are positively correlated; when θ is smaller, they are negatively correlated. Said another way, as θ increases from 0 to π (i.e. 180°), the correlation increases from -1 to 1.

The analogy above is a little awkward, however, because of the minus sign. Let's rephrase it in terms of the supplementary angle φ = π – θ. Slide the line segment representing the standard deviation of _Y_ over to the left end of the horizontal segment representing the standard deviation of _X_.

![diagram][23]

Now cos(φ) = ρ = correlation(_X_, _Y_). When φ is small, the two line segments point in nearly the same direction and the random variables are highly positively correlated. When φ is large, near π, the two segments point in nearly opposite directions and the variables are highly negatively correlated.

## Connection explained

Now let's see the source of the connection between correlation and the law of cosines. Suppose _X_ and _Y_ have mean 0. Think of _X_ and _Y_ as members of an inner product space where the inner product <_X_, _Y_> is E(_XY_). Then

<_X_ + _Y_, _X_ + _Y_> = <_X_, _X_> + <_Y_, _Y_> + 2<_X_, _Y_>.

In an inner product space,

<_X_, _Y_> = ||_X_|| ||_Y_|| cos φ,

where the norm ||_X_|| of a vector is the square root of the vector's inner product with itself. This equation _defines_ the angle φ between two vectors. You could justify the definition by checking that it agrees with ordinary plane geometry in the plane containing the three vectors _X_, _Y_, and _X_ + _Y_.
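For finite samples, the inner-product picture is concrete: center each sample, treat it as a vector, and the cosine of the angle between the vectors is exactly the Pearson correlation. A quick sketch (the simulated data is for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)

# Center the samples so they act like mean-zero random variables.
xc = x - x.mean()
yc = y - y.mean()

# cos(phi) from the inner-product definition: <X, Y> / (||X|| ||Y||)
cos_phi = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Pearson correlation computed the usual way
rho = np.corrcoef(x, y)[0, 1]

print(cos_phi, rho)        # identical up to floating-point error
phi = np.arccos(cos_phi)   # angle between the data vectors, in radians
```

Here the inner product is the dot product of the centered vectors, the sample analogue of E(_XY_) for mean-zero variables.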
* * *

## 19 thoughts on "Cosines and correlation"

1. [Mike Anderson][33] [ 17 June 2010 at 06:11 ][34]

   Nice write-up. I sneak this into my lectures occasionally to perk up the math majors, who don't always notice the many interesting mathematical objects that appear in statistics. (The denominator for correlation, sd(X)sd(Y), is the geometric mean of the two variances – what's THAT all about, guys?)

2. [Maria Droujkova][35] [ 17 June 2010 at 06:37 ][36]

   Nice! It would only take a current events example to make a cool enrichment activity. Thank you!

3. Ger Hobbelt [ 17 June 2010 at 07:30 ][37]

   Thank you for this; it showed up right when I needed it! Alas, it would have been extra great if my teachers had pointed out this little bit of intel about 25 years ago, instead of getting me to loathe those ever-resurfacing bloody dice even more. Meanwhile, I've proven dumb enough not to recognize this 'correlation' with lovely goniometrics on my own. Despite all that, I've found an increasing need to understand statistics (as you work on/with statistical classifiers, you feel the need to really 'get' those s.o.b.s, for only then do you have a chance at reasoning about why they fail on you the way they do), and your piece just made a bit of my brain drop a quarter – I'm Dutch; comprehension is so precious around here we are willing to part with a quarter instead of only a penny 😉

4. Pingback: Relación entre la ley de cosenos y correlación de variables « Bitácoras en Estadística

5.
Harry Hendon [ 21 June 2010 at 19:55 ][38]

   Hi John, maybe you can help with a related problem, which I think uses the law of cosines as well (but I lost my derivation). If you know the correlations of one time series with two other predictor time series, what does that tell you about the possible range of correlation between the two predictors? That is, given r(X,Y) = a and r(X,Z) = b, what is the possible range of r(Y,Z) = c in terms of a and b? Of course some examples are intuitively trivial (e.g., if a = b = 1, then c = 1; and if a = 0 but b = 1, then c = 0). But consider a = b = 0.7 (which are strong correlations): then I think the possible range of c is still enormous (0.
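The bound the comment is reaching for follows from requiring the 3×3 correlation matrix of (X, Y, Z) to be positive semidefinite, which gives c ∈ [ab − √((1 − a²)(1 − b²)), ab + √((1 − a²)(1 − b²))]. A minimal sketch (the function name is mine, not from the post):

```python
import numpy as np

def corr_range(a, b):
    """Feasible range of r(Y, Z) given r(X, Y) = a and r(X, Z) = b.

    Derived from requiring the 3x3 correlation matrix of (X, Y, Z)
    to be positive semidefinite.
    """
    slack = np.sqrt((1 - a**2) * (1 - b**2))
    return a * b - slack, a * b + slack

lo, hi = corr_range(0.7, 0.7)
print(lo, hi)  # about -0.02 and 1.0: a huge range, as the comment suspects
```

The trivial cases from the comment check out as well: `corr_range(1, 1)` collapses to the single point 1, and `corr_range(0, 1)` collapses to 0.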