$\begingroup$

$\DeclareMathOperator\Cov{Cov}$Background of my Question

Let $Y$ be the response variable and $\mathbb{X}$ the explanatory variables. The ultimate goal of prediction is to find a function $f^{*}$ that minimizes $\mathbb{E}[(Y - f^{*}(\mathbb{X}))^2]$; it is well known that the solution is $f^{*}(\mathbb{X}) = \mathbb{E}[Y \mid \mathbb{X}]$. Let $\epsilon = Y - \mathbb{E}[Y \mid \mathbb{X}]$; then we have $$ Y = f^{*}(\mathbb{X}) + \epsilon $$ where $\mathbb{E}[\epsilon] = 0$ and $\Cov[\mathbb{X}, \epsilon] = 0$.
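As a quick numerical sanity check of this decomposition (a minimal sketch; the model $Y = \sin(X) + \text{noise}$ and all constants below are illustrative assumptions, not part of the setup):

```python
import numpy as np

# Monte Carlo sketch of the decomposition Y = f*(X) + eps with
# E[eps] = 0 and Cov(f*(X), eps) = 0.  The model Y = sin(X) + noise
# is an assumed example: with noise independent of X, E[Y|X] = sin(X).
rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(size=n)
eps = rng.normal(scale=0.5, size=n)   # independent of X
Y = np.sin(X) + eps

f_star = np.sin(X)                    # the true regression function
resid = Y - f_star

print(np.mean(resid))                 # close to 0
print(np.cov(f_star, resid)[0, 1])    # close to 0

# f* attains a smaller mean squared error than an arbitrary competitor g:
g = X
print(np.mean((Y - f_star) ** 2) <= np.mean((Y - g) ** 2))
```

The same check works for any other assumed regression function in place of $\sin$.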

So the goal of regression, in a sense, becomes finding a function $g \approx f^*$ from some hypothesis space (e.g., $g(\mathbb{X}) = \mathbb{X}\beta$ for linear regression).

One way of defining $g \approx f^*$ (one that persuades me it is a good approximation) is $$ \mathbb{P}[|f^{*}(\mathbb{X}) - g(\mathbb{X})| > \eta] < \delta $$ for small $\eta$ and $\delta$.

Question

Given that $g \approx f^{*}$ (in the sense defined above), if I can show that $\Cov[\widehat{Y}, \widehat{\epsilon}] \approx 0$ (where $\widehat{Y} = g(\mathbb{X})$ and $\widehat{\epsilon} = Y - g(\mathbb{X})$), then when I see a residual plot like this:

[Residual Plot]

I can persuade myself that it might be a signal that $g(\mathbb{X})$ is probably a good approximation.

So, my question is how to show that $g \approx f^* \Longrightarrow \Cov[\widehat{Y}, \widehat{\epsilon}] \approx 0$?

My Attempt

If I further assume that $\mathbb{X}$ and $\epsilon$ are independent (is $\Cov[\mathbb{X}, \epsilon] = 0$ sufficient?), then $\Cov[f^{*}(\mathbb{X}), \epsilon] = 0$, hence

\begin{equation} \begin{aligned} |\Cov[\widehat{Y}, \widehat{\epsilon}]| &= |\Cov[f^*(\mathbb{X}) + (g(\mathbb{X}) - f^*(\mathbb{X})), (f^*(\mathbb{X}) - g(\mathbb{X})) + \epsilon]| \\ &\leq |\Cov[f^{*}(\mathbb{X}), f^{*}(\mathbb{X}) - g(\mathbb{X})]| + |\Cov[g(\mathbb{X}) - f^{*}(\mathbb{X}), f^{*}(\mathbb{X}) - g(\mathbb{X})]| + |\Cov[g(\mathbb{X}) - f^{*}(\mathbb{X}), \epsilon]| \\ & \leq |\Cov[f^{*}(\mathbb{X}), f^{*}(\mathbb{X}) - g(\mathbb{X})]| + |\Cov[g(\mathbb{X}) - f^{*}(\mathbb{X}), \epsilon]| + \operatorname{Var}[f^*(\mathbb{X}) - g(\mathbb{X})] \end{aligned} \end{equation} Intuitively, since $f^{*}(\mathbb{X}) - g(\mathbb{X}) \approx 0$ with high probability, the three terms on the right may be small (depending on $\eta$ and $\delta$). But I don't know how to make this formal.

Edit

By @IosifPinelis's construction, there exists a pair $(Y, \mathbb{X})$ such that, for any given $\eta, \delta, M > 0$, there is a function $g_N$ with $$ \mathbb{P}[|f^{*}(\mathbb{X}) - g_N(\mathbb{X})| > \eta] < \delta $$ but $|\Cov[g_N(\mathbb{X}), \epsilon_N]| > M$ (where $\epsilon_N = Y - g_N(\mathbb{X})$). Therefore, this is not a proper definition for this problem.

My next question is: "Is $\mathbb{E}[(f^{*}(\mathbb{X}) - g(\mathbb{X}))^2] < \eta$ a proper definition of goodness of approximation?" That is, does $$ \mathbb{E}[(f^{*}(\mathbb{X}) - g(\mathbb{X}))^2] < \eta \Longrightarrow |\Cov[\widehat{Y}, \widehat{\epsilon}]| < \mathrm{SomeFunction}(\eta) $$ hold?

$\endgroup$
  • $\begingroup$ What is the question here? $\endgroup$ Commented Jan 5, 2023 at 22:52
  • $\begingroup$ @IosifPinelis I want to show $\widehat{Y}$ and $\widehat{\epsilon}$ are nearly uncorrelated, but I don't know how to do it. $\endgroup$ Commented Jan 5, 2023 at 22:55
  • $\begingroup$ @Iosif Thanks for answering. I've edited the post. I hope it makes my question clearer. $\endgroup$ Commented Jan 5, 2023 at 23:15

1 Answer

$\begingroup$

$\newcommand\ep\epsilon\newcommand{\de}{\delta}$Getting rid of the instances of $\approx$, one can state the question as follows:

Let $f(X):=E(Y|X)$, where $X$ and $Y$ are random variables (r.v.'s) ($X$ possibly a multivariate one) such that $EY^2<\infty$. Note that $E\ep=0$ and $Cov(f(X),\ep)=0$, where $\ep:=Y-f(X)$.

Suppose that for a sequence $(g_n)$ of Borel-measurable functions one has $g_n(X)\to f(X)$ in probability (as $n\to\infty$). Does it then follow that \begin{equation*} Cov(g_n(X),\ep_n)\to0, \tag{1}\label{1} \end{equation*} where $\ep_n:=Y-g_n(X)$?

The answer to this question is: of course not.

Indeed, let e.g. $X$ be a r.v. uniformly distributed on the interval $[-1,1]$, and let $Y:=X$, so that $f(X)=X$ and $\ep=0$.

Let \begin{equation*} g_n(X):=X\,1(|X|\ge1/n)+n^2 X\,1(|X|<1/n). \end{equation*} Then $g_n(X)\to X=f(X)$ in probability.

However, $Eg_n(X)=0$, $\ep_n=Y-g_n(X)=X-g_n(X)=(1-n^2)X\,1(|X|<1/n)$, and hence \begin{equation*} Cov(g_n(X),\ep_n)=Eg_n(X)\ep_n=n^2(1-n^2)\,EX^2\,1(|X|<1/n) \\ =(1-n^2)/(3n)\to-\infty\ne0. \end{equation*} So, \eqref{1} fails to hold. $\quad\Box$
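The divergence in this counterexample can be checked numerically; here is a minimal Monte Carlo sketch (sample size and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of the counterexample above:
# X ~ Uniform[-1, 1], Y = X (so f(X) = X and eps = 0), and
# g_n(X) = X on {|X| >= 1/n}, n^2 * X on {|X| < 1/n}.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=2_000_000)
Y = X

for n in [2, 5, 10, 20]:
    g_n = np.where(np.abs(X) >= 1 / n, X, n**2 * X)
    eps_n = Y - g_n
    # g_n -> f in probability: P(g_n != f) = P(|X| < 1/n) -> 0
    sample_cov = np.cov(g_n, eps_n)[0, 1]
    exact = (1 - n**2) / (3 * n)        # the closed form derived above
    print(n, round(sample_cov, 3), round(exact, 3))
```

The sample covariance tracks the closed form $(1-n^2)/(3n)$, which diverges like $-n/3$ even though $g_n(X)\to f(X)$ in probability.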


On the other hand, if $g_n(X)\to f(X)$ in $L^2$, then it is easy to see that \eqref{1} will hold.

Details on this: We have $\|g_n(X)-f(X)\|_2\to0$, where $\|Z\|_2:=\sqrt{EZ^2}$. We also have the identity \begin{equation*} g_n(X)\ep_n=g_n(X)(Y-g_n(X))=f(X)(Y-f(X))+Y(g_n(X)-f(X))+2f(X)(f(X)-g_n(X))-(g_n(X)-f(X))^2. \end{equation*} Taking here the expectations and recalling that $Ef(X)(Y-f(X))=Ef(X)\ep=Cov(f(X),\ep)=0$, we get \begin{equation*} Eg_n(X)\ep_n=EY(g_n(X)-f(X))+2Ef(X)(f(X)-g_n(X))-E(g_n(X)-f(X))^2. \tag{2}\label{2} \end{equation*}

By the Cauchy--Schwarz inequality and the condition $\|g_n(X)-f(X)\|_2\to0$, \begin{equation*} |EY(g_n(X)-f(X))|\le\|Y\|_2\,\|g_n(X)-f(X)\|_2\to0. \tag{3}\label{3} \end{equation*} Note also that \begin{equation} \|f(X)\|_2\le\|Y\|_2<\infty \tag{3.5}\label{3.5} \end{equation} since $f(X)$ is an orthogonal projection of $Y$ in $L^2$. So, \begin{equation*} |Ef(X)(f(X)-g_n(X))|\le\|f(X)\|_2\,\|g_n(X)-f(X)\|_2\to0. \tag{4}\label{4} \end{equation*} Also, \begin{equation*} E(g_n(X)-f(X))^2=\|g_n(X)-f(X)\|_2^2\to0. \tag{5}\label{5} \end{equation*} Collecting \eqref{2}--\eqref{5}, we get \begin{equation*} Eg_n(X)\ep_n\to0. \tag{6}\label{6} \end{equation*}

Next, $\ep_n=Y-g_n(X)=Y-f(X)+f(X)-g_n(X)=\ep+f(X)-g_n(X)$. Taking here the expectations and recalling that $E\ep=0$, we get \begin{equation} |E\ep_n|=|E(f(X)-g_n(X))|\le\|g_n(X)-f(X)\|_2\to0. \tag{7}\label{7} \end{equation} Further, $|Eg_n(X)-Ef(X)|\le E|g_n(X)-f(X)|\le\|g_n(X)-f(X)\|_2\to0$, again by the Cauchy--Schwarz inequality and the condition $\|g_n(X)-f(X)\|_2\to0$. So, \begin{equation} Eg_n(X)\to Ef(X), \tag{8}\label{8} \end{equation} and $|Ef(X)|\le\|f(X)\|_2<\infty$ by the Cauchy--Schwarz inequality and \eqref{3.5}. By \eqref{6}--\eqref{8}, \begin{equation} Cov(g_n(X),\ep_n)=Eg_n(X)\ep_n-Eg_n(X)\,E\ep_n\to0-Ef(X)\times 0=0. \end{equation}


In view of \eqref{3}, \eqref{3.5}, \eqref{4}, \eqref{5}, and \eqref{7}, one can also get an explicit bound on $|Cov(g_n(X),\ep_n)|$: \begin{equation} |Cov(g_n(X),\ep_n)|\le3y\de_n+2\de_n^2, \end{equation} where $y:=\|Y\|_2$ and $\de_n:=\|g_n(X)-f(X)\|_2$.
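This explicit bound can also be checked empirically. Below is a minimal sketch; the particular model ($f(x)=x^3$ with Gaussian noise) and the approximating sequence $g_n = f + X/n$ are illustrative assumptions chosen only so that $\de_n = \|g_n(X)-f(X)\|_2$ shrinks:

```python
import numpy as np

# Empirical check of |Cov(g_n(X), eps_n)| <= 3*y*d_n + 2*d_n^2,
# with y = ||Y||_2 and d_n = ||g_n(X) - f(X)||_2.  The model below
# (f(x) = x^3 plus independent Gaussian noise, g_n = f + X/n) is an
# assumed example, not part of the original answer.
rng = np.random.default_rng(2)
m = 1_000_000
X = rng.uniform(-1, 1, size=m)
Y = X**3 + rng.normal(scale=0.3, size=m)   # so f(X) = E[Y|X] = X^3
f = X**3

for n in [1, 2, 5, 10]:
    g_n = f + X / n                        # L2 distance to f shrinks like 1/n
    eps_n = Y - g_n
    d_n = np.sqrt(np.mean((g_n - f) ** 2))
    y = np.sqrt(np.mean(Y**2))
    cov = np.cov(g_n, eps_n)[0, 1]
    bound = 3 * y * d_n + 2 * d_n**2
    print(n, abs(cov) <= bound, round(abs(cov), 4), round(bound, 4))
```

In this example both $|\mathrm{Cov}(g_n(X),\ep_n)|$ and the bound $3y\de_n+2\de_n^2$ shrink as $n$ grows, consistent with the $L^2$ result.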

$\endgroup$
  • $\begingroup$ Thanks for your clear explanation. It seems that the critical step in showing the second result (convergence in $L^2$) is showing that $\mathbb{E}[(g_n(X) - f(X))^2] < \eta \Longrightarrow |Cov[g_n(X), \epsilon_n]| < $ some_function($\eta$). So, if I define the goodness of approximation as $\mathbb{E}[(g_n(X) - f(X))^2] < \eta$, I can get a positive result? $\endgroup$ Commented Jan 6, 2023 at 2:03
  • $\begingroup$ And, would you mind elaborating how this step could be done? It seems not that easy to me. $\endgroup$ Commented Jan 6, 2023 at 2:13
  • $\begingroup$ @Cheng-Yu : I have added details on the $L^2$ thing. $\endgroup$ Commented Jan 6, 2023 at 4:28
  • $\begingroup$ @Iosif Thank you very much! This answer is very helpful. I'm amazed by your problem-solving skills. It's a shame that I don't have enough reputation to upvote your answer. $\endgroup$ Commented Jan 6, 2023 at 6:12
  • $\begingroup$ @Cheng-Yu : Thank you for your appreciation. $\endgroup$ Commented Jan 6, 2023 at 13:21
