Add tf.stop_gradient in tf.summary.histogram #5311
Conversation
Add tf.stop_gradient in tf.summary.histogram. The function is not differentiable, and building gradient graphs for its conds has been causing some issues. This should be a no-op with a tiny positive performance impact for most users, and it works around an error for those using XLA with multiple/persistent tf.GradientTapes (XlaDynamicUpdateSlice does not have a gradient defined; there is a separate TensorFlow bug about that op, and defining its gradient is the more satisfying medium-term solution here).
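For illustration, a minimal sketch of the idea (a hypothetical simplification, not the actual TensorBoard histogram plugin source; the helper name and bucketing details are assumptions): detaching the input with tf.stop_gradient before the bucketing logic means autodiff never traces through the internal tf.cond/slice ops.

```python
import tensorflow as tf

def _buckets(data, bucket_count=30):
    """Toy stand-in for the histogram bucketing logic (hypothetical;
    the real code lives in the TensorBoard histogram plugin)."""
    # The fix: summaries are not differentiable, so cut the gradient
    # path here. With the input detached, autodiff never builds a
    # gradient graph through the tf.cond / slicing ops below.
    data = tf.stop_gradient(tf.reshape(tf.cast(data, tf.float32), [-1]))
    lo = tf.reduce_min(data)
    hi = tf.reduce_max(data)
    # Like the real op, guard the degenerate all-equal case with a cond.
    return tf.cond(
        hi > lo,
        lambda: tf.histogram_fixed_width(data, [lo, hi], nbins=bucket_count),
        lambda: tf.fill([bucket_count], tf.size(data) // bucket_count),
    )
```

Since histogram counts are never part of a loss, detaching the data is behavior-preserving for training while making the summary invisible to tf.GradientTape.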
stephanwlee left a comment:
The change seems logical. Wondering: is this unique to histograms, or does it affect other Summary APIs as well? Also, would you mind taking a stab at a test? You mentioned that "we might be able to do a unit test with XLA+CPU in the TensorBoard repo"; if this is not true, I am okay with just moving forward as is.
Sure, added a test. It's quite weird, since summaries can't actually run with XLA CPU/GPU (no outside compilation support), but it does fail before the stop_gradient change. As far as I know the issue is limited to histograms; it's the combination of a particular slice op and tf.cond. But I haven't looked thoroughly. Like I said, the more satisfying solution is to define a gradient for the op; this is just to buy some time.
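For the shape of such a test, a hedged sketch (hypothetical; the actual test added in this PR and its harness details differ, and the jit_compile/tracing specifics here are assumptions). Since summaries can't execute under XLA on CPU/GPU, the sketch only traces the compiled function, which is where the gradient graph gets built:

```python
import tensorflow as tf

class HistogramStopGradientTest(tf.test.TestCase):
    def test_histogram_with_persistent_tape_under_jit(self):
        # Hypothetical regression-test sketch, not the PR's exact test.
        writer = tf.summary.create_file_writer(self.get_temp_dir())

        @tf.function(jit_compile=True)
        def step(x):
            with writer.as_default():
                with tf.GradientTape(persistent=True) as tape:
                    tape.watch(x)
                    tf.summary.histogram("x", x, step=0)
                    loss = tf.reduce_sum(x * x)
            return tape.gradient(loss, x)

        # Tracing builds the forward and gradient graphs without
        # executing under XLA; per the PR description, gradient
        # construction through the summary is what used to fail.
        step.get_concrete_function(tf.TensorSpec([3], tf.float32))

if __name__ == "__main__":
    tf.test.main()
```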
FWIW, we're trying to simplify this op for TPU-related reasons anyway (#2885), so that would presumably also address the issue if it removes the offending slice + cond combination.
Working around a user issue. It's intended as a temporary fix, but keeping it long-term doesn't have any downsides that I know of.
N/A; this just changes the implementation of a summary API.
Two folks (at Alphabet) have confirmed that, for now, this works around the issue they reported. It's a bit tricky to reproduce; we might be able to do a unit test with XLA+CPU in the TensorBoard repo, although the original report relates to TPU usage.
The op should have a gradient defined, and will eventually.