Glam Prestige Journal

$\begingroup$

Let X and Y be Bernoulli random variables. We don't assume independence or identical distribution, but we do assume that all 4 of the following probabilities are nonzero.

Let a := P[X = 1, Y = 1], b := P[X = 1, Y = 0], c := P[X = 0, Y = 1], and d := P[X = 0, Y = 0].

How do I obtain a formula for the correlation between the random variables X and Y in terms of a, b, c, and d?

$\endgroup$

4 Answers

$\begingroup$

Stefan Hansen's hint is a good one. Here is the complete derivation:

$${\rm E}[X]=a+b=p$$

$${\rm E}[Y]=a+c=q$$

\begin{align} \mathrm{Var}(X) &= {\rm E}[(X - {\rm E}[X])^2] \\ &= {\rm E}[(X - p)^2] \\ &= p(1-p)^2 + (1-p)(-p)^2 \\ &= p (1-2p+p^2) + p^2 - p^3 \\ &= p - 2p^2 + p^3+p^2-p^3 \\ &= p - p^2 \\ &= p(1-p) \end{align}

$$\sigma_X =\sqrt{\mathrm{Var}(X)} = \sqrt{p(1-p)} = \sqrt{(a+b)(1-(a+b))}$$

$$\sigma_Y =\sqrt{\mathrm{Var}(Y)} = \sqrt{q(1-q)} = \sqrt{(a+c)(1-(a+c))}$$

\begin{align} \mathrm{Cov}(X, Y) &= {\rm E}[XY] - {\rm E}[X]{\rm E}[Y] \\ &= a - pq \\ &= a - (a+b)(a+c) \end{align}

Finally, by substitution into the equation for $\rho_{XY}$:

\begin{align} \rho_{XY}&=\frac{\mathrm{Cov}(X,Y)}{\sigma_{X}\sigma_{Y}} \\ &=\frac{a - (a+b)(a+c)}{\sqrt{(a+b)(1-(a+b))}\sqrt{(a+c)(1-(a+c))}} \\ &=\frac{a - (a+b)(a+c)}{\sqrt{(a+b)(1-(a+b))(a+c)(1-(a+c))}} \end{align}
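As a numerical check of the final formula, here is a minimal Python sketch. The function name `bernoulli_correlation` and the example probabilities are illustrative assumptions, not part of the derivation.

```python
import math

def bernoulli_correlation(a, b, c, d):
    """Pearson correlation of X and Y from the joint probabilities.

    a = P[X=1, Y=1], b = P[X=1, Y=0], c = P[X=0, Y=1], d = P[X=0, Y=0].
    """
    if not math.isclose(a + b + c + d, 1.0):
        raise ValueError("joint probabilities must sum to 1")
    p = a + b          # E[X] = P[X = 1]
    q = a + c          # E[Y] = P[Y = 1]
    cov = a - p * q    # Cov(X, Y) = E[XY] - E[X] E[Y]
    return cov / math.sqrt(p * (1 - p) * q * (1 - q))

# Illustrative (assumed) joint probabilities.
a, b, c, d = 0.4, 0.1, 0.2, 0.3
rho = bernoulli_correlation(a, b, c, d)

# Since a + b + c + d = 1, the numerator a - (a+b)(a+c) equals a*d - b*c.
numerator_alt = a * d - b * c
```

For these values $p=0.5$, $q=0.6$, and the covariance is $0.4-0.3=0.1$, which matches $ad-bc=0.12-0.02$. The identity $a-(a+b)(a+c)=ad-bc$ holds in general because $1-a-b-c=d$.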

$\endgroup$ $\begingroup$

Hint: The correlation is defined in terms of $\mathrm{Cov}(X,Y)$, $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ which can be computed if we know the following quantities $$ {\rm E}[X],{\rm E}[X^2],{\rm E}[Y],{\rm E}[Y^2],{\rm E}[XY]. $$ These should be straightforward to find. For instance, $$ {\rm E}[X^2]={\rm E}[X]=P(X=1)=a+b. $$
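To make the hint concrete, the five expectations can be computed by summing over the four outcomes. This is only a sketch: the helper `expect` and the numbers in `pmf` are assumptions chosen for illustration.

```python
import math

# Joint pmf as {(x, y): probability}; the numbers are illustrative assumptions.
pmf = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.3}

def expect(f, pmf):
    """E[f(X, Y)] as a probability-weighted sum over the four outcomes."""
    return sum(prob * f(x, y) for (x, y), prob in pmf.items())

ex = expect(lambda x, y: x, pmf)        # E[X]
ex2 = expect(lambda x, y: x * x, pmf)   # E[X^2], equal to E[X] since X is 0 or 1
ey = expect(lambda x, y: y, pmf)        # E[Y]
ey2 = expect(lambda x, y: y * y, pmf)   # E[Y^2], equal to E[Y]
exy = expect(lambda x, y: x * y, pmf)   # E[XY] = P[X=1, Y=1]

var_x = ex2 - ex ** 2
var_y = ey2 - ey ** 2
cov = exy - ex * ey
rho = cov / math.sqrt(var_x * var_y)
```

The equality $E[X^2]=E[X]$ shows up directly because $x^2=x$ for $x\in\{0,1\}$.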

$\endgroup$ $\begingroup$

Presumably you're talking about the Pearson correlation coefficient. It's defined in terms of the covariance and the standard deviations. These in turn are defined in terms of expected values, which are defined in terms of probabilities. For example, $E[X] = 1 \cdot P(X=1) + 0 \cdot P(X=0) = a + b$.

$\endgroup$ $\begingroup$

The above answer can be adapted to the case where $\rho$ is a user-selected value and the two variables share a known mean and variance, $E(x)=E(y)=\mu$ and $V(x)=V(y)=\sigma^2$. In this case:

$E(x)=a+b=E(y)=a+c=\mu$, therefore $b=c$ and $(a+b)\cdot(a+c)=\mu^2$ (note $\mu<1$).

The values $a$, $b$, $c$, and $d$ are joint probabilities, so:

$a+b+c+d=1$ by definition, so that $d= 1-(a+b+c)$.

Using the variance of a Bernoulli random variable and setting $V(x)=V(y)=\sigma^2$, the equation below follows:

$V(x)=(a+b)\cdot[1-(a+b)]=V(y)=(a+c)\cdot[1-(a+c)]=\sigma^2=\mu\cdot(1-\mu)$

Then, using the equations for $\rho_{XY}$ and $\mathrm{Cov}(X,Y)$ derived above, $\rho$ can be written in terms of $a$ and $\mu$ alone:

$\rho= \frac{a - \mu^2}{\mu(1-\mu)}$

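Solving for $a$, we see that all four values are determined by $\rho$ and $\mu$ (for a Bernoulli variable, $\sigma^2$ is itself a function of $\mu$):

\begin{align} a &= \mu^2 + \rho\,\mu(1-\mu) \\ b &= \mu - a \\ c &= b \\ d &= 1-(a+b+c) \end{align}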
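The four formulas translate directly into a short function. This is a minimal sketch: the name `joint_probs`, the tolerance, and the feasibility check are assumptions added for illustration. The check is there because not every $\rho$ is attainable for a given $\mu$: a sufficiently negative $\rho$ makes $a$ or $d$ negative.

```python
def joint_probs(mu, rho, tol=1e-12):
    """Return (a, b, c, d) for two Bernoulli(mu) variables with correlation rho."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    var = mu * (1 - mu)        # Bernoulli variance
    a = mu ** 2 + rho * var    # P[x = 1, y = 1]
    b = mu - a                 # P[x = 1, y = 0]
    c = b                      # P[x = 0, y = 1], equal to b since the means match
    d = 1 - (a + b + c)        # P[x = 0, y = 0]
    # Added check (an assumption of this sketch): reject infeasible inputs.
    if min(a, b, c, d) < -tol:
        raise ValueError("rho is not attainable for this mu")
    return a, b, c, d

# Illustrative (assumed) inputs.
a, b, c, d = joint_probs(mu=0.3, rho=0.5)
```

With $\mu=0.3$ and $\rho=0.5$ this gives $a=0.09+0.5\cdot 0.21=0.195$, $b=c=0.105$, and $d=0.595$.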

Two correlated Bernoulli random variables can be simulated by first defining a discrete random variable $Z$ with four values: $3,2,1,0$ with the respective probabilities of $a,b,c,d$ corresponding to the four joint probability states. Then,

$x=1$ when $Z=3$ or $2$, $x=0$ otherwise

$y=1$ when $Z=3$ or $1$, $y=0$ otherwise

A Monte Carlo simulation with an Excel add-in then demonstrates that the $x$ and $y$ variables are correlated as specified by $\rho$ for the given $\mu$: the Excel CORREL function applied to $5000$ simulated samples verifies this. As a cross-check, when $\rho=0$ the formulas give $a=\mu^2$, $b=c=\mu(1-\mu)$, and $d=(1-\mu)^2$, which are the joint probabilities for independent $x$ and $y$.
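The same experiment can be sketched without Excel, using only the Python standard library. The function names, the seed, and the example probabilities (which correspond to $\mu=0.3$, $\rho=0.5$) are assumptions for illustration; this is not the add-in used above.

```python
import math
import random

def simulate(a, b, c, d, n, seed=0):
    """Draw n pairs (x, y) via the four-valued variable Z described above."""
    rng = random.Random(seed)
    # Z takes the values 3, 2, 1, 0 with probabilities a, b, c, d.
    zs = rng.choices([3, 2, 1, 0], weights=[a, b, c, d], k=n)
    xs = [1 if z in (3, 2) else 0 for z in zs]   # x = 1 when Z is 3 or 2
    ys = [1 if z in (3, 1) else 0 for z in zs]   # y = 1 when Z is 3 or 1
    return xs, ys

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(vx * vy)

# Joint probabilities for mu = 0.3 and rho = 0.5 (assumed example values).
xs, ys = simulate(0.195, 0.105, 0.105, 0.595, n=5000)
sample_rho = pearson(xs, ys)
sample_mean_x = sum(xs) / len(xs)
```

With $5000$ samples, `sample_rho` should land near the target $\rho$ and `sample_mean_x` near $\mu$, up to sampling noise. A larger `n` tightens the agreement.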

Thanks to all of the original contributors for the work that prompted the above derivation.

$\endgroup$
