a = torch.Tensor(16, 8, 32, 64)
a.view(-1, 32, 64) # works
a.transpose(-1, -2).view(-1, 64, 32) # doesn't: RuntimeError, the tensor is non-contiguous after transpose
a.view(-1, 32, 64).transpose(-1, -2) # works, but doesn't fit some interfaces, e.g. when view must be the last operation
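For reference, the usual workaround is an explicit copy before the view, at the cost of materializing the transposed data:

a.transpose(-1, -2).contiguous().view(-1, 64, 32) # works, but copies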
Such view calls are needed to implement a 4D+ bmm that treats all dimensions except the last two as batch dimensions (similar to the Linear module's behavior). Unless I move the transpose inside the bmm function (which would not match the existing interface), an extra contiguous call is needed; see the sketch below.
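For concreteness, a minimal sketch of such a bmm using the contiguous workaround (the name batched_matmul is mine, not an existing API):

import torch

def batched_matmul(x, y):
    # Treat all dimensions except the last two as batch dimensions,
    # similarly to how Linear treats its leading dimensions.
    # x: (*batch, n, m), y: (*batch, m, p) -> (*batch, n, p)
    batch_shape = x.shape[:-2]
    n, m = x.shape[-2:]
    p = y.shape[-1]
    # The contiguous() calls are the extra copies mentioned above:
    # without them, view() fails on transposed (non-contiguous) inputs.
    x_flat = x.contiguous().view(-1, n, m)
    y_flat = y.contiguous().view(-1, m, p)
    return torch.bmm(x_flat, y_flat).view(*batch_shape, n, p)

# e.g. batched_matmul(a, a.transpose(-1, -2)) has shape (16, 8, 32, 32)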
Does it make sense to support such a view call? On the one hand, it breaks the invariant that view never copies data (the result always shares storage with the input); on the other hand, having several batch dimensions may be a common situation.
Discussed earlier in #764.