Let \begin{equation} k(x,y) = \sigma \exp\left(-\frac{(x-y)^2}{2\theta^2}\right)\end{equation} be a squared-exponential (Gaussian) kernel with $\sigma,\theta>0$. For a set of $N$ distinct points $x_1,\ldots, x_N \in \mathbb{R}$, consider the corresponding kernel matrix $\mathbf{K}$ with entries \begin{equation} K_{ij} = k(x_i,x_j). \end{equation} In the book by Rasmussen and Williams (http://www.gaussianprocess.org/gpml/chapters/RW.pdf, page 113), it is stated that the complexity penalty $\log \vert \mathbf{K}\vert$ of a Gaussian process model with kernel matrix $\mathbf{K}$ decreases with the lengthscale, i.e., \begin{equation} \frac{d\log \vert \mathbf{K} \vert}{d\theta} \leq 0. \end{equation} Even though this seems to be common knowledge among people who work with Gaussian processes, I am struggling to prove it and would like to know how.
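
For what it is worth, here is a quick numerical sanity check of the claim (not a proof); the inputs, $\sigma$, and the lengthscale grid below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=8)      # arbitrary distinct inputs
sigma = 1.5                              # arbitrary choice of sigma

def log_det_K(theta):
    """log|K| for the squared-exponential kernel with lengthscale theta."""
    K = sigma * np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * theta ** 2))
    return np.linalg.slogdet(K)[1]

thetas = np.linspace(0.1, 1.5, 30)
logdets = np.array([log_det_K(t) for t in thetas])
print(np.max(np.diff(logdets)))          # should be <= 0 if log|K| is nonincreasing in theta
```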

Edit: I have tried employing the following:

  • First, note that $\vert \mathbf{K} \vert = \prod_{n=1}^N \sigma_n(x_n)$, where $\sigma_n(x_n) = k(x_n,x_n) - \mathbf{k}_n^{\top}\mathbf{K}_n^{-1}\mathbf{k}_n$, with the elements of $\mathbf{k}_n \in \mathbb{R}^{n-1}$ and $\mathbf{K}_n \in \mathbb{R}^{(n-1)\times(n-1)}$ given by $k_{n,i}=k(x_n,x_i)$ and $K_{n,ij}= k(x_i,x_j)$ for $i,j = 1,\ldots,n-1$. That is, $\sigma_n(x_n)$ denotes the posterior GP variance at $x_n$ with respect to the first $n-1$ data points (for $n=1$, $\sigma_1(x_1)=k(x_1,x_1)$). This identity follows directly from the Schur determinant formula.
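
A quick numerical check of this factorization (again with arbitrary choices of the inputs, $\sigma$, and $\theta$):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=8)
sigma, theta = 1.5, 0.5                   # arbitrary choices

K = sigma * np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * theta ** 2))

# log|K| as the sum of log incremental conditional variances
# sigma_n(x_n) = k(x_n, x_n) - k_n^T K_n^{-1} k_n (Schur complements).
log_prod = 0.0
for n in range(len(x)):
    k_n = K[n, :n]                        # covariances to the first n points
    K_n = K[:n, :n]
    var_n = K[n, n] - (k_n @ np.linalg.solve(K_n, k_n) if n > 0 else 0.0)
    log_prod += np.log(var_n)

print(np.isclose(log_prod, np.linalg.slogdet(K)[1]))   # expect True
```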

Since this factorization gives $\log\vert\mathbf{K}\vert = \sum_{n=1}^N \log \sigma_n(x_n)$, the desired result would follow if every $\sigma_n(x_n)$ decreased monotonically with the lengthscale. To show this, I am trying to apply the following representation, which can be found, e.g., in https://arxiv.org/pdf/1704.00445.pdf:

  • $\sigma_n(x) = \sigma \langle k(x,\cdot), (\Phi_n^{\top} \Phi_n + \sigma I)^{-1} k(x,\cdot) \rangle_k$, where $\langle \cdot, \cdot \rangle_k$ denotes the inner product of the reproducing kernel Hilbert space (RKHS) $H_k$ with reproducing kernel $k(\cdot,\cdot)$, and $\Phi_n: H_k \rightarrow \mathbb{R}^n$ is the linear operator $\Phi_n = (k(x_1,\cdot),\ldots,k(x_n,\cdot))^{\top}$, so that $\Phi_n^{\top}\Phi_n$ maps $H_k$ to itself (the relevant case here is $x = x_n$).
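
Unless I am mistaken, by the operator version of the matrix inversion lemma this RKHS expression can be rewritten in the finite-dimensional form \begin{equation} \sigma \langle k(x,\cdot), (\Phi_n^{\top}\Phi_n + \sigma I)^{-1} k(x,\cdot) \rangle_k = k(x,x) - \mathbf{k}_{1:n}(x)^{\top}(\mathbf{K}_{1:n} + \sigma I)^{-1}\mathbf{k}_{1:n}(x), \end{equation} where $\mathbf{k}_{1:n}(x) \in \mathbb{R}^n$ has entries $k(x,x_i)$ and $\mathbf{K}_{1:n} \in \mathbb{R}^{n\times n}$ has entries $k(x_i,x_j)$ for $i,j=1,\ldots,n$, i.e., the usual GP posterior variance with noise variance $\sigma$. This is the form used in the numerical check further below.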

Now, it can easily be shown that, for two lengthscales $\theta < \tilde{\theta}$ with corresponding kernels $k_{\theta}(\cdot,\cdot)$ and $k_{\tilde\theta}(\cdot,\cdot)$, we have \begin{equation} \langle k_{\theta}(x,\cdot), (\Phi_{\theta,n}^{\top} \Phi_{\theta,n} + \sigma I) k_{\theta}(x,\cdot) \rangle_{k_{\theta}} \leq \langle k_{\tilde\theta}(x,\cdot), (\Phi_{\tilde\theta,n}^{\top} \Phi_{\tilde\theta,n} + \sigma I) k_{\tilde\theta}(x,\cdot) \rangle_{k_{\tilde\theta}} . \end{equation}
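
Indeed, unless I am overlooking something, expanding the quadratic form gives \begin{equation} \langle k_{\theta}(x,\cdot), (\Phi_{\theta,n}^{\top} \Phi_{\theta,n} + \sigma I) k_{\theta}(x,\cdot) \rangle_{k_{\theta}} = \sum_{i=1}^n k_{\theta}(x,x_i)^2 + \sigma\, k_{\theta}(x,x), \end{equation} and for the squared-exponential kernel each $k_{\theta}(x,x_i)$ is nondecreasing in the lengthscale, while $k_{\theta}(x,x)=\sigma$ does not depend on it.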

What I am now wondering is: does this also imply \begin{equation} \langle k_{\theta}(x,\cdot), (\Phi_{\theta,n}^{\top} \Phi_{\theta,n} + \sigma I)^{-1} k_{\theta}(x,\cdot) \rangle_{k_{\theta}} \geq \langle k_{\tilde\theta}(x,\cdot), (\Phi_{\tilde\theta,n}^{\top} \Phi_{\tilde\theta,n} + \sigma I)^{-1} k_{\tilde\theta}(x,\cdot) \rangle_{k_{\tilde\theta}} \,? \end{equation}
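
For what it is worth, here is a quick numerical probe of this conjectured inequality (again not a proof), using the finite-dimensional form of the quadratic form noted above; the conditioning points, the test point, $\sigma$, and the lengthscale grid are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
xs = rng.uniform(-3.0, 3.0, size=6)       # conditioning points (arbitrary)
x_star = 0.3                               # arbitrary test point
sigma = 1.5

def quad_form_inv(theta, x):
    """<k(x,.), (Phi_n^T Phi_n + sigma I)^{-1} k(x,.)>_k via the matrix form above."""
    k_vec = sigma * np.exp(-(x - xs) ** 2 / (2.0 * theta ** 2))
    K = sigma * np.exp(-(xs[:, None] - xs[None, :]) ** 2 / (2.0 * theta ** 2))
    post_var = sigma - k_vec @ np.linalg.solve(K + sigma * np.eye(len(xs)), k_vec)
    return post_var / sigma                # k(x,x) = sigma for this kernel

thetas = np.linspace(0.1, 2.0, 40)
vals = np.array([quad_form_inv(t, x_star) for t in thetas])
print(np.max(np.diff(vals)))   # <= 0 everywhere would be consistent with the conjecture
```

If the printed maximum difference is nonpositive, the quantity is nonincreasing in $\theta$ on this grid, which would be consistent with the conjectured inequality.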
