Covariance and Correlation

Covariance

The covariance between two random variables (rv’s), X and Y, is defined as

$$\operatorname{Cov}(X, Y)=E[(X-E(X))(Y-E(Y))] = E[(X- \mu_X)(Y- \mu_Y)]$$

Written out for the discrete case (joint pmf) and the continuous case (joint pdf):

$$\operatorname{Cov}(X, Y)=\begin{cases} \displaystyle\sum_{x} \sum_{y}\left(x-\mu_{X}\right)\left(y-\mu_{Y}\right) P(X=x, Y=y) & \text{if } X, Y \text{ are discrete} \\ \displaystyle\int_{-\infty}^{\infty} \int_{-\infty}^{\infty}\left(x-\mu_{X}\right)\left(y-\mu_{Y}\right) f(x, y) \, dx \, dy & \text{if } X, Y \text{ are continuous} \end{cases}$$

The covariance depends on both the set of possible pairs and the probabilities for those pairs.

• If both variables tend to deviate in the same direction (both go above their means or below their means at the same time), then the covariance will be positive.

• If the opposite is true, the covariance will be negative.

• If X and Y are not strongly (linearly) related, the covariance will be near 0.

Computational formula for Covariance

$$\operatorname{Cov}(X, Y)=E[XY] -E[X]E[Y]$$
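As a quick check that the definition and the computational formula agree, here is a minimal sketch for a discrete pair. The joint pmf `joint_pmf` and the helper `expectation` are hypothetical, made up only for illustration; exact fractions are used so the two results can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X=x, Y=y), chosen only for illustration.
joint_pmf = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}


def expectation(g, pmf):
    """E[g(X, Y)] for a discrete joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * prob for (x, y), prob in pmf.items())


mu_x = expectation(lambda x, y: x, joint_pmf)  # E[X]
mu_y = expectation(lambda x, y: y, joint_pmf)  # E[Y]

# Definition: E[(X - mu_X)(Y - mu_Y)]
cov_definition = expectation(lambda x, y: (x - mu_x) * (y - mu_y), joint_pmf)

# Computational formula: E[XY] - E[X]E[Y]
cov_shortcut = expectation(lambda x, y: x * y, joint_pmf) - mu_x * mu_y

print(cov_definition, cov_shortcut)  # 1/16 1/16
```

The covariance is positive here because the pair (1, 1) carries more probability than independence would give it, so X and Y tend to be above their means together.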

Correlation Coefficient

The correlation coefficient of X and Y, denoted by Cor(X, Y), is also represented by the Greek letter ρ (rho):

$$\operatorname{Cor}(X, Y) = \rho_{X,Y}= \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

It represents a “scaled” covariance: dividing by the standard deviations removes the units of X and Y, so the correlation always lies between -1 and 1 (inclusive).
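The effect of the scaling can be sketched numerically. The example below assumes a small made-up joint pmf (`joint_pmf`, hypothetical) and a hypothetical helper `correlation`; it computes ρ, then recomputes it after changing the units of X and shifting and rescaling Y by positive factors. The covariance changes under such a rescaling, but the correlation should not.

```python
import math
from fractions import Fraction

# Hypothetical joint pmf P(X=x, Y=y), chosen only for illustration.
joint_pmf = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}


def expectation(g, pmf):
    """E[g(X, Y)] for a discrete joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * prob for (x, y), prob in pmf.items())


def correlation(pmf):
    """Cor(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y) for a discrete joint pmf."""
    mu_x = expectation(lambda x, y: x, pmf)
    mu_y = expectation(lambda x, y: y, pmf)
    cov = expectation(lambda x, y: x * y, pmf) - mu_x * mu_y
    var_x = expectation(lambda x, y: x * x, pmf) - mu_x ** 2
    var_y = expectation(lambda x, y: y * y, pmf) - mu_y ** 2
    return float(cov) / math.sqrt(float(var_x * var_y))


rho = correlation(joint_pmf)

# Change units: X -> 10 X and Y -> 3 Y + 2 (positive scale factors).
rescaled_pmf = {(10 * x, 3 * y + 2): prob for (x, y), prob in joint_pmf.items()}
rho_rescaled = correlation(rescaled_pmf)

print(rho, rho_rescaled)
```

For this pmf the exact value is 1/√15, roughly 0.26: a weak positive linear relationship, unchanged by the change of units.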

Transformations of Distributions

Discrete Distributions

Suppose that $X \sim \operatorname{bin}(n, p)$. What is the distribution of $Y = n-X$?

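$$f(x)=P(X=x)= \binom{n}{x}p^x(1-p)^{n-x} \cdot I_{\{0,1,\ldots,n\}}(x)$$

Just do it:

$$P(Y=y)=P(n-X=y)=P(X=n-y)$$
$$= \binom{n}{n-y}p^{n-y}(1-p)^{n-(n-y)} \cdot I_{\{0,1,\ldots,n\}}(n-y)$$
$$= \binom{n}{y}(1-p)^{y}p^{n-y} \cdot I_{\{0,1,\ldots,n\}}(y)$$

The last step uses $\binom{n}{n-y} = \binom{n}{y}$ and the fact that $n-y \in \{0,1,\ldots,n\}$ exactly when $y \in \{0,1,\ldots,n\}$. This is the binomial pmf with success probability $1-p$, so $Y \sim \operatorname{bin}(n, 1-p)$.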
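The result can be checked numerically. The sketch below compares $P(X = n-y)$ for $X \sim \operatorname{bin}(n, p)$ with the $\operatorname{bin}(n, 1-p)$ pmf at every $y$; the helper names `binom_pmf` and `max_pmf_gap` and the values of `n` and `p` are assumptions made for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ bin(n, p); zero outside the support {0, 1, ..., n}."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def max_pmf_gap(n, p):
    """Largest |P(X = n - y) - P(Y' = y)| over y, where Y' ~ bin(n, 1 - p)."""
    return max(
        abs(binom_pmf(n - y, n, p) - binom_pmf(y, n, 1 - p))
        for y in range(n + 1)
    )


# Illustrative parameter values; any n >= 0 and 0 <= p <= 1 should behave the same.
print(max_pmf_gap(10, 0.3))
```

The gap should be zero up to floating-point rounding, which matches the interpretation: if X counts successes in n trials, then n - X counts failures, and each failure has probability 1 - p.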

Invertible functions

In the most general sense, inverse functions are functions that “reverse” each other. For example, if $f$ takes $a$ to $b$, then the inverse, $f^{-1}$, must take $b$ to $a$. A function is invertible only if it is one-to-one: no two inputs share the same output, so each output is paired with exactly one input. That way, when the mapping is reversed, it will still be a function!
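On a finite set of inputs, invertibility can be checked directly by reversing the mapping and watching for collisions. The sketch below uses a hypothetical helper, `invert_on_domain`, and two illustrative maps on the support {0, 1, ..., n}: the transformation x ↦ n - x, which is one-to-one, and x ↦ (x - 2)², which is not.

```python
def invert_on_domain(g, domain):
    """Return the inverse mapping {g(x): x} if g is one-to-one on domain, else None."""
    inverse = {}
    for x in domain:
        y = g(x)
        if y in inverse:
            # Two different inputs share this output, so g cannot be reversed.
            return None
        inverse[y] = x
    return inverse


n = 5
support = range(n + 1)

# x -> n - x is one-to-one on the support; its inverse is y -> n - y.
flip_inverse = invert_on_domain(lambda x: n - x, support)

# x -> (x - 2)^2 sends both 1 and 3 to 1, so it has no inverse on this domain.
square_inverse = invert_on_domain(lambda x: (x - 2) ** 2, support)

print(flip_inverse)
print(square_inverse)
```

When the inverse exists, the pmf of Y = g(X) is just the pmf of X evaluated at the inverse image of y, which is the step P(n - X = y) = P(X = n - y) used for the binomial example.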

For X discrete or continuous, the cumulative distribution function (cdf) is denoted by F(x) and is defined by

$$F(x)= P(X \leq x)$$
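For a discrete random variable, the cdf is a running sum of the pmf. The sketch below uses the usual convention $F(x) = P(X \leq x)$ with a binomial distribution as the example; the helper names `binom_pmf` and `binom_cdf` and the parameter values are assumptions for illustration.

```python
from math import comb, floor


def binom_pmf(k, n, p):
    """P(X = k) for X ~ bin(n, p); zero outside the support {0, 1, ..., n}."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(x, n, p):
    """F(x) = P(X <= x): add up the pmf over all support points k <= x."""
    if x < 0:
        return 0.0
    upper = min(n, floor(x))  # largest support point not exceeding x
    return sum(binom_pmf(k, n, p) for k in range(upper + 1))


n, p = 10, 0.3  # illustrative values
for x in (-1, 0, 2, 2.5, 10):
    print(x, binom_cdf(x, n, p))
```

Note that F is defined for every real x, not just the support points: it is 0 below the support, 1 at and above n, and constant between consecutive integers, so `binom_cdf(2.5, n, p)` equals `binom_cdf(2, n, p)`.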