$\begingroup$

$\DeclareMathOperator\Cov{Cov}$Background of my Question

Let $Y$ be the response variable and $\mathbb{X}$ the explanatory variables. The ultimate goal of prediction is to find a function $f^{*}$ that minimizes $\mathbb{E}[(Y - f^{*}(\mathbb{X}))^2]$; it is well known that the solution is $f^{*}(\mathbb{X}) = \mathbb{E}[Y \mid \mathbb{X}]$. Let $\epsilon = Y - \mathbb{E}[Y \mid \mathbb{X}]$; then we have $$ Y = f^{*}(\mathbb{X}) + \epsilon $$ where $\mathbb{E}[\epsilon] = 0$ and $\Cov[\mathbb{X}, \epsilon] = 0$.
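As a quick numerical sanity check of this decomposition (a minimal sketch; the model $Y = \sin(X) + \text{noise}$ and all constants below are illustrative assumptions, not part of the setup):

```python
import numpy as np

# Monte Carlo sketch of the decomposition Y = f*(X) + eps with
# E[eps] = 0 and Cov(f*(X), eps) = 0.  The model Y = sin(X) + noise
# is an assumed example: with noise independent of X, E[Y|X] = sin(X).
rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(size=n)
eps = rng.normal(scale=0.5, size=n)   # independent of X
Y = np.sin(X) + eps

f_star = np.sin(X)                    # the true regression function
resid = Y - f_star

print(np.mean(resid))                 # close to 0
print(np.cov(f_star, resid)[0, 1])    # close to 0

# f* attains a smaller mean squared error than an arbitrary competitor g:
g = X
print(np.mean((Y - f_star) ** 2) <= np.mean((Y - g) ** 2))
```

The same check works for any other assumed regression function in place of $\sin$.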

So the goal of regression, in a sense, becomes finding a function $g \approx f^*$ from some hypothesis space (e.g., $g(\mathbb{X}) = \mathbb{X}\beta$ for linear regression).

One way of defining $g \approx f^*$ (one that persuades me it is a good approximation) is $$ \mathbb{P}[|f^{*}(\mathbb{X}) - g(\mathbb{X})| > \eta] < \delta $$ for small $\eta$ and $\delta$.

Question

Given that $g \approx f^{*}$ (in the sense defined above), if I can show that $\Cov[\widehat{Y}, \widehat{\epsilon}] \approx 0$ (where $\widehat{Y} = g(\mathbb{X})$ and $\widehat{\epsilon} = Y - g(\mathbb{X})$), then when I see a residual plot like this:

[Residual Plot]

I can persuade myself that it might be a signal that $g(\mathbb{X})$ is probably a good approximation.

So, my question is how to show that $g \approx f^* \Longrightarrow \Cov[\widehat{Y}, \widehat{\epsilon}] \approx 0$?

My Attempt

If I further assume that $\mathbb{X}$ and $\epsilon$ are independent (is $\Cov[\mathbb{X}, \epsilon] = 0$ sufficient?), then $\Cov[f^{*}(\mathbb{X}), \epsilon] = 0$, hence

\begin{equation} \begin{aligned} |\Cov[\widehat{Y}, \widehat{\epsilon}]| &= |\Cov[f^*(\mathbb{X}) + (g(\mathbb{X}) - f^*(\mathbb{X})), (f^*(\mathbb{X}) - g(\mathbb{X})) + \epsilon]| \\ &\leq |\Cov[f^{*}(\mathbb{X}), f^{*}(\mathbb{X}) - g(\mathbb{X})]| + |\Cov[g(\mathbb{X}) - f^{*}(\mathbb{X}), f^{*}(\mathbb{X}) - g(\mathbb{X})]| + |\Cov[g(\mathbb{X}) - f^{*}(\mathbb{X}), \epsilon]| \\ & \leq |\Cov[f^{*}(\mathbb{X}), f^{*}(\mathbb{X}) - g(\mathbb{X})]| + |\Cov[g(\mathbb{X}) - f^{*}(\mathbb{X}), \epsilon]| + \operatorname{Var}[f^*(\mathbb{X}) - g(\mathbb{X})] \end{aligned} \end{equation} Intuitively, since $f^{*}(\mathbb{X}) - g(\mathbb{X}) \approx 0$ with high probability, the three terms on the right may be small (depending on $\eta$ and $\delta$). But I don't know how to make this formal.

Edit

By @IosifPinelis's construction, there exists a pair $(Y, \mathbb{X})$ such that, for any given $\eta, \delta, M > 0$, there is a function $g_N$ with $$ \mathbb{P}[|f^{*}(\mathbb{X}) - g_N(\mathbb{X})| > \eta] < \delta $$ but $|\Cov[g_N(\mathbb{X}), \epsilon_N]| > M$ (where $\epsilon_N = Y - g_N(\mathbb{X})$). Therefore, this is not a proper definition for this problem.

My next question is: "Is $\mathbb{E}[(f^{*}(\mathbb{X}) - g(\mathbb{X}))^2] < \eta$ a proper definition of goodness of approximation?" That is, does $$ \mathbb{E}[(f^{*}(\mathbb{X}) - g(\mathbb{X}))^2] < \eta \Longrightarrow |\Cov[\widehat{Y}, \widehat{\epsilon}]| < \mathrm{SomeFunction}(\eta) $$ hold?

$\endgroup$
  • $\begingroup$ What is the question here? $\endgroup$ Commented Jan 5, 2023 at 22:52
  • $\begingroup$ @IosifPinelis I want to show $\widehat{Y}$ and $\widehat{\epsilon}$ are nearly uncorrelated, but I don't know how to do it. $\endgroup$ Commented Jan 5, 2023 at 22:55
  • $\begingroup$ @Iosif Thanks for answering. I've edited the post. I hope it makes my question clearer. $\endgroup$ Commented Jan 5, 2023 at 23:15

1 Answer

$\begingroup$

$\newcommand\ep\epsilon\newcommand{\de}{\delta}$Getting rid of the instances of $\approx$, one can state the question as follows:

Let $f(X):=E(Y|X)$, where $X$ and $Y$ are random variables (r.v.'s) ($X$ possibly a multivariate one) such that $EY^2<\infty$. Note that $E\ep=0$ and $Cov(f(X),\ep)=0$, where $\ep:=Y-f(X)$.

Suppose that for a sequence $(g_n)$ of Borel-measurable functions one has $g_n(X)\to f(X)$ in probability (as $n\to\infty$). Does it then follow that \begin{equation*} Cov(g_n(X),\ep_n)\to0, \tag{1}\label{1} \end{equation*} where $\ep_n:=Y-g_n(X)$?

The answer to this question is: of course not.

Indeed, let e.g. $X$ be a r.v. uniformly distributed on the interval $[-1,1]$, and let $Y:=X$, so that $f(X)=X$ and $\ep=0$.

Let \begin{equation*} g_n(X):=X\,1(|X|\ge1/n)+n^2 X\,1(|X|<1/n). \end{equation*} Then $g_n(X)\to X=f(X)$ in probability.

However, $Eg_n(X)=0$, $\ep_n=Y-g_n(X)=X-g_n(X)=(1-n^2)X\,1(|X|<1/n)$, and hence \begin{equation*} Cov(g_n(X),\ep_n)=Eg_n(X)\ep_n=n^2(1-n^2)\,EX^2\,1(|X|<1/n) \\ =(1-n^2)/(3n)\to-\infty\ne0. \end{equation*} So, \eqref{1} fails to hold. $\quad\Box$
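The divergence in this counterexample can be checked numerically; here is a minimal Monte Carlo sketch (sample size and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of the counterexample above:
# X ~ Uniform[-1, 1], Y = X (so f(X) = X and eps = 0), and
# g_n(X) = X on {|X| >= 1/n}, n^2 * X on {|X| < 1/n}.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=2_000_000)
Y = X

for n in [2, 5, 10, 20]:
    g_n = np.where(np.abs(X) >= 1 / n, X, n**2 * X)
    eps_n = Y - g_n
    # g_n -> f in probability: P(g_n != f) = P(|X| < 1/n) -> 0
    sample_cov = np.cov(g_n, eps_n)[0, 1]
    exact = (1 - n**2) / (3 * n)        # the closed form derived above
    print(n, round(sample_cov, 3), round(exact, 3))
```

The sample covariance tracks the closed form $(1-n^2)/(3n)$, which diverges like $-n/3$ even though $g_n(X)\to f(X)$ in probability.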


On the other hand, if $g_n(X)\to f(X)$ in $L^2$, then it is easy to see that \eqref{1} will hold.

Details on this: We have $\|g_n(X)-f(X)\|_2\to0$, where $\|Z\|_2:=\sqrt{EZ^2}$. We also have the identity \begin{equation*} g_n(X)\ep_n=g_n(X)(Y-g_n(X))=f(X)(Y-f(X))+Y(g_n(X)-f(X))+2f(X)(f(X)-g_n(X))-(g_n(X)-f(X))^2. \end{equation*} Taking here the expectations and recalling that $Ef(X)(Y-f(X))=Ef(X)\ep=Cov(f(X),\ep)=0$, we get \begin{equation*} Eg_n(X)\ep_n=EY(g_n(X)-f(X))+2Ef(X)(f(X)-g_n(X))-E(g_n(X)-f(X))^2. \tag{2}\label{2} \end{equation*}

By the Cauchy--Schwarz inequality and the condition $\|g_n(X)-f(X)\|_2\to0$, \begin{equation*} |EY(g_n(X)-f(X))|\le\|Y\|_2\,\|g_n(X)-f(X)\|_2\to0. \tag{3}\label{3} \end{equation*} Note also that \begin{equation} \|f(X)\|_2\le\|Y\|_2<\infty \tag{3.5}\label{3.5} \end{equation} since $f(X)$ is an orthogonal projection of $Y$ in $L^2$. So, \begin{equation*} |Ef(X)(f(X)-g_n(X))|\le\|f(X)\|_2\,\|g_n(X)-f(X)\|_2\to0. \tag{4}\label{4} \end{equation*} Also, \begin{equation*} E(g_n(X)-f(X))^2=\|g_n(X)-f(X)\|_2^2\to0. \tag{5}\label{5} \end{equation*} Collecting \eqref{2}--\eqref{5}, we get \begin{equation*} Eg_n(X)\ep_n\to0. \tag{6}\label{6} \end{equation*}

Next, $\ep_n=Y-g_n(X)=Y-f(X)+f(X)-g_n(X)=\ep+f(X)-g_n(X)$. Taking here the expectations and recalling that $E\ep=0$, we get \begin{equation} |E\ep_n|=|E(f(X)-g_n(X))|\le\|g_n(X)-f(X)\|_2\to0. \tag{7}\label{7} \end{equation} Further, $|Eg_n(X)-Ef(X)|\le E|g_n(X)-f(X)|\le\|g_n(X)-f(X)\|_2\to0$, again by the Cauchy--Schwarz inequality and the condition $\|g_n(X)-f(X)\|_2\to0$. So, \begin{equation} Eg_n(X)\to Ef(X), \tag{8}\label{8} \end{equation} and $|Ef(X)|\le\|f(X)\|_2<\infty$ by the Cauchy--Schwarz inequality and \eqref{3.5}. By \eqref{6}--\eqref{8}, \begin{equation} Cov(g_n(X),\ep_n)=Eg_n(X)\ep_n-Eg_n(X)\,E\ep_n\to0-Ef(X)\times 0=0. \end{equation}


In view of \eqref{3}, \eqref{3.5}, \eqref{4}, \eqref{5}, and \eqref{7}, one can also get an explicit bound on $|Cov(g_n(X),\ep_n)|$: \begin{equation} |Cov(g_n(X),\ep_n)|\le3y\de_n+2\de_n^2, \end{equation} where $y:=\|Y\|_2$ and $\de_n:=\|g_n(X)-f(X)\|_2$.
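This explicit bound can also be checked empirically. Below is a minimal sketch; the particular model ($f(x)=x^3$ with Gaussian noise) and the approximating sequence $g_n = f + X/n$ are illustrative assumptions chosen only so that $\de_n = \|g_n(X)-f(X)\|_2$ shrinks:

```python
import numpy as np

# Empirical check of |Cov(g_n(X), eps_n)| <= 3*y*d_n + 2*d_n^2,
# with y = ||Y||_2 and d_n = ||g_n(X) - f(X)||_2.  The model below
# (f(x) = x^3 plus independent Gaussian noise, g_n = f + X/n) is an
# assumed example, not part of the original answer.
rng = np.random.default_rng(2)
m = 1_000_000
X = rng.uniform(-1, 1, size=m)
Y = X**3 + rng.normal(scale=0.3, size=m)   # so f(X) = E[Y|X] = X^3
f = X**3

for n in [1, 2, 5, 10]:
    g_n = f + X / n                        # L2 distance to f shrinks like 1/n
    eps_n = Y - g_n
    d_n = np.sqrt(np.mean((g_n - f) ** 2))
    y = np.sqrt(np.mean(Y**2))
    cov = np.cov(g_n, eps_n)[0, 1]
    bound = 3 * y * d_n + 2 * d_n**2
    print(n, abs(cov) <= bound, round(abs(cov), 4), round(bound, 4))
```

In this example both $|\mathrm{Cov}(g_n(X),\ep_n)|$ and the bound $3y\de_n+2\de_n^2$ shrink as $n$ grows, consistent with the $L^2$ result.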

$\endgroup$
  • $\begingroup$ Thanks for your clear explanation. It seems that the critical step in showing the second result (convergence in $L^2$) is showing that $\mathbb{E}[(g_n(X) - f(X))^2] < \eta \Longrightarrow |Cov[g_n(X), \epsilon_n]| < $ some_function($\eta$). So, if I define the goodness of approximation as $\mathbb{E}[(g_n(X) - f(X))^2] < \eta$, I can get a positive result? $\endgroup$ Commented Jan 6, 2023 at 2:03
  • $\begingroup$ And, would you mind elaborating how this step could be done? It seems not that easy to me. $\endgroup$ Commented Jan 6, 2023 at 2:13
  • $\begingroup$ @Cheng-Yu : I have added details on the $L^2$ thing. $\endgroup$ Commented Jan 6, 2023 at 4:28
  • $\begingroup$ @Iosif Thank you very much! This answer is very helpful. I'm amazed by your problem-solving skills. It's a shame that I don't have enough reputation to upvote your answer. $\endgroup$ Commented Jan 6, 2023 at 6:12
  • $\begingroup$ @Cheng-Yu : Thank you for your appreciation. $\endgroup$ Commented Jan 6, 2023 at 13:21
