I'm new to statistics and I'm trying to learn as much as I can right now. I thought this question should be asked here since it is related to mathematics. The problem is that, from the get-go, most statistics books use the sum of squares formula: $SS = \sum X^2 - \frac{(\sum X)^2}{n}.$
Where can I find a proof of this formula? I've tried to prove it myself starting from $SS = \sum (X - m_x)^2$, where $m_x$ is the mean of the population $X$, but to no avail.
$\endgroup$ 2 Answers
$\begingroup$The sample variance of data $X_1, X_2, \dots, X_n$ is $$S^2 = \frac{\sum_{i=1}^n(X_i - \bar X)^2}{n-1} = \frac{\text{SS}_x}{n-1},$$ where
$$\begin{aligned}\text{SS}_x &= \sum_i(X_i - \bar X)^2 = \sum_i(X_i^2 - 2\bar X X_i + \bar X^2) = \sum_i X_i^2 - 2\bar X\sum_i X_i + n\bar X^2\\ &= \sum_i X_i^2 - \frac2n\left(\sum_i X_i\right)^2 + \frac1n\left(\sum_i X_i\right)^2 = \sum_i X_i^2 - \frac1n\left(\sum_i X_i\right)^2\end{aligned}$$ because $\bar X = \frac1n\sum_i X_i.$
Similarly, $S^2 = \frac{\sum_i X_i^2 - n\bar X^2}{n-1}.$
Notes: The formula is of importance because, in a calculator or computer, one may keep track of three 'memories', for $n$, $\sum_iX_i,$ and $\sum_i X_i^2$ as data are entered one at a time, and then (when all data are present) apply the formula to find $S^2.$
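The three-memory bookkeeping described above can be sketched in Python. This is only an illustration: the function name `running_variance` and the sample data are made up for the example, and the standard-library `statistics.variance` is used as a cross-check.

```python
import statistics


def running_variance(data):
    """Sample variance S^2 from the running totals n, sum(x) and sum(x^2)."""
    n = 0           # memory 1: number of observations seen so far
    total = 0.0     # memory 2: running sum of x
    total_sq = 0.0  # memory 3: running sum of x^2
    for x in data:  # data are entered one at a time
        n += 1
        total += x
        total_sq += x * x
    if n < 2:
        raise ValueError("need at least two observations")
    ss = total_sq - total ** 2 / n  # SS = sum(x^2) - (sum x)^2 / n
    return ss / (n - 1)


sample = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical data, for illustration only
print(running_variance(sample))     # one-pass result from the three totals
print(statistics.variance(sample))  # reference value from the standard library
```

The two printed values should agree up to floating-point rounding. For data whose mean is very large compared with its spread, the subtraction in this one-pass formula can lose precision, so the sketch is meant to show the algebra rather than to serve as a robust routine.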
Also, the formula makes it possible to find the combined sample variance $S_c^2$ of two samples (of $x$'s and $y$'s) from $n_x, n_y, \bar X, \bar Y, S_x^2,$ and $S_y^2.$
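One way to carry out that combination, assuming "combined" means the sample variance of all $n_x + n_y$ observations treated as a single sample, is to recover each sample's sum and sum of squares from its summary statistics and then apply the formula once more. The function name `combined_variance` and the data below are hypothetical.

```python
import statistics


def combined_variance(n_x, mean_x, var_x, n_y, mean_y, var_y):
    """Sample variance of two samples merged into one, from summaries only."""
    # Recover sum(x) and sum(x^2) for each sample:
    #   sum(x)   = n * mean
    #   sum(x^2) = (n - 1) * S^2 + n * mean^2   (the SS formula rearranged)
    sum_x = n_x * mean_x
    sum_y = n_y * mean_y
    sumsq_x = (n_x - 1) * var_x + n_x * mean_x ** 2
    sumsq_y = (n_y - 1) * var_y + n_y * mean_y ** 2

    n = n_x + n_y
    ss = (sumsq_x + sumsq_y) - (sum_x + sum_y) ** 2 / n
    return ss / (n - 1)


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical sample of x's
ys = [10.0, 12.0, 14.0]    # hypothetical sample of y's
s2_c = combined_variance(
    len(xs), statistics.mean(xs), statistics.variance(xs),
    len(ys), statistics.mean(ys), statistics.variance(ys),
)
print(s2_c)                          # from the six summary numbers
print(statistics.variance(xs + ys))  # computed directly from the merged data
```

The two printed values should match up to rounding, which is the point of the remark: the raw data are not needed once $n$, the mean, and $S^2$ of each sample are known.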
$\endgroup$ $\begingroup$Let us begin with the first sum of $n$ terms, using $x_1 + x_2 + \dots + x_n = n\,m_x$: $$\begin{aligned}\sum (x - m_x)^2 &= (x_1 - m_x)^2 + (x_2 - m_x)^2 + \dots + (x_n - m_x)^2 \\ &= x_1^2 + x_2^2 + \dots + x_n^2 + n\,m_x^2 - 2\,m_x\,(x_1 + x_2 + \dots + x_n) \\ &= x_1^2 + x_2^2 + \dots + x_n^2 + n\,m_x^2 - 2\,m_x \cdot n\,m_x \\ &= x_1^2 + x_2^2 + \dots + x_n^2 - n\,m_x^2\end{aligned}$$ Similarly, applying the second formula for $n$ terms yields $$\begin{aligned}\sum x^2 - \frac{\left(\sum x\right)^2}{n} &= x_1^2 + x_2^2 + \dots + x_n^2 - \frac{(x_1 + x_2 + \dots + x_n)^2}{n} \\ &= x_1^2 + x_2^2 + \dots + x_n^2 - \frac{(n\,m_x)^2}{n} \\ &= x_1^2 + x_2^2 + \dots + x_n^2 - n\,m_x^2\end{aligned}$$ As seen above, both expressions reduce to the same result.
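As a quick numerical check of the identity, take the made-up data $1, 2, 3, 4$, so $n = 4$, $\sum x = 10$, $\sum x^2 = 30$ and $m_x = 2.5$. Both forms give the same value:
$$\begin{aligned}\sum (x - m_x)^2 &= (-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2 = 2.25 + 0.25 + 0.25 + 2.25 = 5, \\ \sum x^2 - \frac{\left(\sum x\right)^2}{n} &= 30 - \frac{10^2}{4} = 30 - 25 = 5.\end{aligned}$$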
$\endgroup$