I am currently relearning econometrics in more depth than before. One thing I am trying to make sense of is why the assumption $$E(u\mid x)=E(u)$$ needs to hold (where $u$ is the error term).
Here is how I have tried to reason through it, although I am not sure whether this reasoning is sound.
Let's say $u$ is correlated with some third variable $z$, which is also correlated with $x$. In that case, $$E(u\mid x) \not= E(u),$$ since for larger $x$ values the expectation of the error would shift up or down, being correlated with $x$ through $z$. If so, the line of best fit would end up with systematically larger or smaller expected errors as $x$ increases or decreases.
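This intuition can be checked with a small simulation. The setup below is hypothetical: a variable `z` drives both `x` and the error `u`, so `u` is correlated with `x` through `z`, and the OLS slope (computed as $\operatorname{cov}(x,y)/\operatorname{var}(x)$) comes out away from its true value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: z drives both x and the error u,
# so u is correlated with x "through" z and E(u|x) != E(u).
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)   # x correlated with z
u = 0.5 * z + rng.normal(size=n)   # error correlated with z, hence with x
y = 1.0 + 2.0 * x + u              # true slope is 2.0

# OLS slope estimate: cov(x, y) / var(x)
b_hat = np.cov(x, y)[0, 1] / np.var(x)

print(b_hat)  # noticeably above the true slope 2.0
```

Here the slope estimate is pulled upward by roughly $\operatorname{cov}(x,u)/\operatorname{var}(x)$, which is exactly the omitted-variable bias the question is gesturing at.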
Is this what the zero conditional mean assumption is trying to say, or is there a better reasoning that I'm not hitting on?
Thank you!
3 Answers
This assumption means that the error $u$ doesn't vary with $x$ in expectation. Often $\mathbb{E}u=0$, so this means that the error is always centered on your prediction.
This is weaker than independence, though, where $\mathbb{E} [f(u)|x]=\mathbb{E}[f(u)]$ for all (measurable) functions $f$.
In particular, if we take $f(u)=(u - \mathbb{E}[u|x])^2=(u-\mathbb{E}u)^2$, then under this assumption it is still possible that $\mathbb{E}[f(u)|x] = \operatorname{Var}(u|x)$ varies with $x$. In other words: heteroskedasticity.
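A quick simulation makes the distinction concrete. The setup is hypothetical: the error below satisfies $\mathbb{E}[u|x]=0$ for every $x$, yet its conditional variance grows with $x$, so zero conditional mean holds while independence fails.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical setup: E(u|x) = 0 for every x, but Var(u|x) = x^2.
x = rng.uniform(1.0, 5.0, size=n)
u = x * rng.normal(size=n)   # mean zero given x, variance x^2 given x

# Conditional means are ~0 in every x-range, but variances differ sharply.
low = u[x < 2.0]
high = u[x > 4.0]
print(low.mean(), high.mean())   # both close to 0
print(low.var(), high.var())     # high-x variance is much larger
```

So the data satisfy the zero conditional mean assumption while still being heteroskedastic, exactly as the answer states.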
Recall that you model the conditional expectation. Hence, if $\mathbb{E}[u|x]=g(x)$,
$$
\mathbb{E}[y|x] = \mathbb{E} [a + b x + u|x]=a+bx+g(x),
$$
then $g(x)$ is a part that you should model/approximate. If $g(x) = c$, i.e., a constant, then you can simply absorb it into the intercept, i.e., $y=(a+c)+bx+\epsilon$ with $\mathbb{E}[\epsilon|x]=0$; otherwise you must impose explicit structure on $g(x)$. Hence, the assumption
$$
\mathbb{E}[u|x]=\mathbb{E}[u]=0,
$$
means that, given $x$, discarding the disturbance $u$ leaves you with a model that is linear in the parameters. Your main interest is $\mathbb{E}[u|x]$, since you look at the model given $x$ and not just at the error term itself.
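The "absorb the constant into the intercept" point can be illustrated with a hypothetical simulation: when $\mathbb{E}[u|x]=c$ for a constant $c$, the OLS slope is still recovered correctly, and the intercept estimate converges to $a+c$ rather than $a$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical illustration: E(u|x) = c, a constant, for every x.
c = 0.7
x = rng.normal(size=n)
u = c + rng.normal(size=n)
y = 1.0 + 2.0 * x + u          # true a = 1.0, b = 2.0

b_hat = np.cov(x, y)[0, 1] / np.var(x)
a_hat = y.mean() - b_hat * x.mean()

print(b_hat)   # close to the true slope 2.0
print(a_hat)   # close to a + c = 1.7, not a = 1.0
```

This is why the normalization $\mathbb{E}[u]=0$ is harmless: any constant mean of the error is indistinguishable from the intercept.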
The assumption $E(u\mid x)=0$ is a sufficient condition for estimators like least squares to be unbiased. In general, such assumptions are made with an eye towards desirable properties of the estimators.
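The unbiasedness claim can be checked with a small Monte Carlo study (a hypothetical setup, not from the original answer): with $E(u\mid x)=0$, the OLS slope estimates average out to the true slope across repeated samples, even when the errors are heteroskedastic.

```python
import numpy as np

rng = np.random.default_rng(2)
true_b = 2.0
estimates = []

# Hypothetical Monte Carlo: E(u|x) = 0 but Var(u|x) = x^2,
# so errors are heteroskedastic yet OLS remains unbiased.
for _ in range(2000):
    x = rng.normal(size=500)
    u = np.abs(x) * rng.normal(size=500)
    y = 1.0 + true_b * x + u
    b_hat = np.cov(x, y)[0, 1] / np.var(x)
    estimates.append(b_hat)

print(np.mean(estimates))   # close to the true slope 2.0
```

Note that heteroskedasticity affects the *variance* of the estimator (and hence standard errors), not its unbiasedness; that distinction is exactly why $E(u\mid x)=0$ is singled out here.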