Hello,
First of all, thank you very much for this great work!
I have a question regarding the loss and gradient computation in the optimizers (PositionOptimizer, VectorOptimizer, DexPilotOptimizer). I noticed that all three optimizers add 2 * norm_delta * (x - last_qpos) to the gradient, which corresponds to the derivative of the scalar regularizer norm_delta * ||x - last_qpos||^2. However, the scalar term norm_delta * ||x - last_qpos||^2 is never added to the value returned by the objective (the result is computed from the Huber distance term only). This makes the objective and gradient inconsistent, as in the sketch below.
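To make sure I am describing the pattern clearly, here is a minimal, self-contained sketch of what I mean. The names and the plain squared-error stand-in for the Huber term are illustrative only, not the library's actual internals; only the signature follows the usual nlopt objective(x, grad) convention:

```python
import numpy as np

def make_objective(target, last_qpos, norm_delta):
    """Illustrative sketch: the gradient carries the regularizer's
    derivative, but the returned scalar omits the regularizer's value."""

    def objective(x, grad):
        # Stand-in for the Huber distance term (plain squared error here,
        # just to keep the sketch runnable).
        residual = x - target
        loss = float(np.sum(residual ** 2))

        if grad.size > 0:
            # Gradient = d(loss)/dx + d(norm_delta * ||x - last_qpos||^2)/dx
            grad[:] = 2 * residual + 2 * norm_delta * (x - last_qpos)

        # The matching scalar norm_delta * ||x - last_qpos||^2 is NOT added
        # here, so the returned value and the gradient disagree. A consistent
        # version would be:
        #   return loss + norm_delta * float(np.sum((x - last_qpos) ** 2))
        return loss

    return objective
```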
I am wondering if I am missing something or if this is done on purpose, and if so, why?
Thank you very much!