Linear convergence rate of proximal point algorithm

Question

For a maximally monotone operator $T : \mathbb{R}^n \to \mathcal{P}(\mathbb{R}^n)$, the proximal point algorithm with step size $c>0$, $$ x^{k+1} = (I + c T)^{-1} x^k, $$ converges linearly with rate $\kappa = \frac{1}{1 + c \sigma}$ if $T$ is strongly monotone with parameter $\sigma > 0$.
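As a quick sanity check of this rate, here is a minimal numerical sketch. It assumes a linear operator $T(x) = Ax$ with $A = \sigma I + J$ and $J$ skew-symmetric, so that $T$ is $\sigma$-strongly monotone but not a gradient; the function name `proximal_point` and the concrete numbers are illustrative choices only.

```python
import numpy as np

def proximal_point(A, c, x0, iters=20):
    """Iterate x^{k+1} = (I + c A)^{-1} x^k for the linear operator T(x) = A x."""
    M = np.eye(A.shape[0]) + c * A
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(iters):
        # Applying the resolvent amounts to solving (I + c A) x_next = x.
        xs.append(np.linalg.solve(M, xs[-1]))
    return xs

# Assumed test operator: <A x, x> = sigma * ||x||^2 because the skew part drops out.
sigma, c = 0.5, 2.0
A = np.array([[sigma, 1.0],
              [-1.0, sigma]])

xs = proximal_point(A, c, x0=[1.0, -2.0])
kappa = 1.0 / (1.0 + c * sigma)

# Per-step contraction factors towards the unique zero x* = 0 of T.
ratios = [np.linalg.norm(b) / np.linalg.norm(a) for a, b in zip(xs, xs[1:])]
print("kappa =", kappa, " worst observed ratio =", max(ratios))
```

In this example every observed ratio stays below $\kappa$, as the theory predicts; the skew part makes the actual contraction strictly faster than the worst-case rate.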

I'm interested in analyzing the linear convergence rate in the case of matrix-valued step sizes, i.e., for $C \succ 0$, $$ x^{k+1} = (I + C T)^{-1} x^k. $$ I could only manage to prove a bound depending on $\lambda_{\text{min}}(C)$, while in practice I observe numerically that the convergence rate depends on the whole spectrum of $C$.
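A minimal sketch of the kind of experiment that shows this effect, assuming a linear operator $T(x) = Ax$ with diagonal $A$ and two diagonal step-size matrices that share the same $\lambda_{\text{min}}$. The helper `asymptotic_ratio` and the bound $1/(1+\sigma\lambda_{\text{min}}(C))$ are illustrative assumptions about the form such an estimate takes.

```python
import numpy as np

def asymptotic_ratio(A, C, x0, iters=30):
    """Iterate x^{k+1} = (I + C A)^{-1} x^k and return the last ratio ||x^{k+1}|| / ||x^k||."""
    M = np.eye(A.shape[0]) + C @ A
    x = np.asarray(x0, dtype=float)
    ratio = None
    for _ in range(iters):
        x_next = np.linalg.solve(M, x)
        ratio = np.linalg.norm(x_next) / np.linalg.norm(x)
        x = x_next
    return ratio

# Assumed test problem: T(x) = A x, strongly monotone with sigma = lambda_min(A) = 1.
A = np.diag([1.0, 10.0])
sigma = np.linalg.eigvalsh(A).min()

# Two step-size matrices with the same smallest eigenvalue but different spectra.
C1 = np.diag([1.0, 1.0])
C2 = np.diag([10.0, 1.0])
x0 = np.ones(2)

for name, C in (("C1", C1), ("C2", C2)):
    bound = 1.0 / (1.0 + sigma * np.linalg.eigvalsh(C).min())
    print(name, "lambda_min bound:", bound, " observed ratio:", asymptotic_ratio(A, C, x0))
```

Both matrices have $\lambda_{\text{min}}(C) = 1$, so the bound is $0.5$ in both cases, yet the observed ratio is about $0.5$ for `C1` and about $1/11$ for `C2`, because `C2` places its large step on the weakly monotone direction of $A$.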

This seems like such a basic algorithm that I am surprised I could not find classical literature (e.g. by Rockafellar) on this topic.

Background: many proximal algorithms for solving problems of the form $$ \min_x \max_y~G(x) - F(y) + \langle Kx,y \rangle, $$ such as Douglas-Rachford, ADMM or Chambolle-Pock, fit the above setting of proximal point algorithms for a special choice of $C$. If $G$ and $F$ are both strongly convex, then $T$ is strongly monotone, and my goal is to connect the linear convergence rate to the choice of metric/step size.

Dirk · Accepted Answer · 2017-06-22 07:13:00Z


I am not aware of results on the linear rate of this variant of the proximal point method. Let me note that convergence is usually shown by the following observation: Since $C$ is a bijection, you may view the iteration $x^{k+1} = (I+CT)^{-1}x^k$ as a preconditioned proximal point method that solves the preconditioned inclusion $0\in CTx$.

Define $S = CT$ and observe that $S$ is a monotone operator on the same Hilbert space, but equipped with the inner product $\langle x,y\rangle_C = \langle C^{-1} x,y\rangle$ (which is the natural inner product for the preconditioned problem and requires $C$ to be symmetric positive definite). Hence, you immediately get convergence of the method and a rate, but with respect to the $C$-norm $\|x\|_C = \sqrt{\langle x,x\rangle_C}$ (also called the energy norm in the context of preconditioners).
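To make the resulting rate explicit, here is a short sketch of the computation, assuming $C$ is symmetric positive definite and $T$ is $\sigma$-strongly monotone with respect to the Euclidean inner product (the symbols $u$, $v$, $z$ and $x^*$ are introduced only for this derivation). For $u \in Tx$ and $v \in Ty$,

$$ \langle Cu - Cv, x - y\rangle_C = \langle u - v, x - y\rangle \ge \sigma \|x-y\|^2 \ge \sigma \lambda_{\text{min}}(C)\, \|x-y\|_C^2, $$

where the last step uses $\|z\|_C^2 = \langle C^{-1}z, z\rangle \le \|z\|^2 / \lambda_{\text{min}}(C)$. So $S$ is strongly monotone in the $C$-inner product with parameter $\sigma\lambda_{\text{min}}(C)$, and its resolvent contracts towards the zero $x^*$ of $T$:

$$ \|x^{k+1} - x^*\|_C \le \frac{1}{1 + \sigma \lambda_{\text{min}}(C)}\, \|x^k - x^*\|_C. $$

This worst-case estimate only sees $\lambda_{\text{min}}(C)$. In the special case of a linear $T(x) = Ax$ with $A$ symmetric positive definite, the contraction factor in the $C$-norm is exactly $1/(1 + \lambda_{\text{min}}(C^{1/2} A C^{1/2}))$, which does depend on how the whole spectrum of $C$ interacts with $A$; whether a comparably sharp statement holds for general $T$ is not something this argument settles.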



I was aware of this observation -- my goal is to "compare" preconditioners, i.e., argue that $C_1$ is a better preconditioner than $C_2$. Do you see any possibility for such an argument? I.e., why would one $C$-norm be better than another?

– yon, commented Jun 22, 2017 at 19:10
