Backpropagation isn't working properly in the case 0**0. For example, in the code below we get a nan in the gradient that mathematically should not be there.
import torch

x = torch.tensor([0., 1.], requires_grad=True)
loss = sum(x ** 0)
loss.backward()
print(loss)
print(x.grad)

This prints:

tensor(2.)
tensor([nan, 0.])
This is probably because the derivative is computed as 0*(x^(-1)), which is an indeterminate 0/0 form at x = 0 and evaluates to nan in floating point (0 * inf). The gradient should be 0, since x^0 = 1 is constant. The formula (x^n)' = n*x^(n-1), which is probably what is used here, breaks down for n = 0 at x = 0.
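A quick check by hand reproduces the nan from the suspected expression:

import torch

# Evaluating n * x**(n-1) for n = 0 at x = 0 by hand:
x = torch.tensor(0.)
print(x ** -1)        # tensor(inf): 0**(-1) overflows to inf
print(0 * (x ** -1))  # tensor(nan): 0 * inf is nan in IEEE arithmetic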
This is a special case, but it can happen in practice.
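In case it's useful, a possible workaround on the user side is to special-case the exponent 0 so autograd never evaluates 0 * x**(-1). The safe_pow helper below is just a sketch of mine, not a PyTorch API:

import torch

def safe_pow(x, n):
    # Sketch of a workaround (not a PyTorch API): for n == 0, avoid x ** 0
    # entirely. Multiplying x by zero keeps it in the autograd graph while
    # giving a gradient of exactly 0, so no nan appears at x == 0.
    if n == 0:
        return x * 0 + 1
    return x ** n

x = torch.tensor([0., 1.], requires_grad=True)
loss = sum(safe_pow(x, 0))
loss.backward()
print(x.grad)  # tensor([0., 0.])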
Hope this helps.
Hv0nnus