Spectral norm, adaptive softmax, faster CPU ops, anomaly detection (NaNs, etc.), lots of bug fixes, and Python 3.7 and CUDA 9.2 support
Table of Contents
- Breaking Changes
- New Features
  - Neural Networks
    - Adaptive Softmax, Spectral Norm, etc.
  - Operators
    - torch.bincount, torch.as_tensor, ...
  - torch.distributions
    - Half Cauchy, Gamma Sampling, ...
  - Other
    - Automatic anomaly detection (detecting NaNs, etc.)
- Performance
  - Faster CPU ops in a wide variety of cases
- Other improvements
- Bug Fixes
- Documentation Improvements
Breaking Changes
- `torch.stft` has changed its signature to be consistent with librosa #9497 (a usage sketch follows this list)
  - Before: `stft(signal, frame_length, hop, fft_size=None, normalized=False, onesided=True, window=None, pad_end=0)`
  - After: `stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)`
  - `torch.stft` now uses FFT internally and is much faster.
- `torch.slice` is removed in favor of the tensor slicing notation #7924
- `torch.arange` now does dtype inference: any floating-point argument is inferred to be the default `dtype`; all integer arguments are inferred to be `int64`. #7016
- `torch.nn.functional.embedding_bag`'s old signature `embedding_bag(weight, input, ...)` is deprecated; `embedding_bag(input, weight, ...)` (consistent with `torch.nn.functional.embedding`) should be used instead
- `torch.nn.functional.sigmoid` and `torch.nn.functional.tanh` are deprecated in favor of `torch.sigmoid` and `torch.tanh` #8748
- Broadcast behavior changed in a (very rare) edge case: `[1] x [0]` now broadcasts to `[0]` (used to be `[1]`) #9209
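As referenced above, here is a minimal sketch of calling `torch.stft` with the new librosa-style signature. The signal length, window size, and hop values are illustrative, not taken from the release notes:

```python
import torch

# Hypothetical example values: 1 second of audio at 16 kHz, 25 ms window, 10 ms hop.
signal = torch.randn(16000)
window = torch.hann_window(400)

# New keyword names (n_fft, hop_length, win_length) replace the old
# frame_length/hop/fft_size arguments.
spec = torch.stft(signal, n_fft=400, hop_length=160, win_length=400, window=window)
```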
New Features
Neural Networks
- Adaptive Softmax `nn.AdaptiveLogSoftmaxWithLoss` #5287

  ```python
  >>> in_features = 1000
  >>> n_classes = 200
  >>> adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs=[20, 100, 150])
  >>> adaptive_softmax
  AdaptiveLogSoftmaxWithLoss(
    (head): Linear(in_features=1000, out_features=23, bias=False)
    (tail): ModuleList(
      (0): Sequential(
        (0): Linear(in_features=1000, out_features=250, bias=False)
        (1): Linear(in_features=250, out_features=80, bias=False)
      )
      (1): Sequential(
        (0): Linear(in_features=1000, out_features=62, bias=False)
        (1): Linear(in_features=62, out_features=50, bias=False)
      )
      (2): Sequential(
        (0): Linear(in_features=1000, out_features=15, bias=False)
        (1): Linear(in_features=15, out_features=50, bias=False)
      )
    )
  )
  >>> batch = 15
  >>> input = torch.randn(batch, in_features)
  >>> target = torch.randint(n_classes, (batch,), dtype=torch.long)
  >>> # get the log probabilities of target given input, and mean negative log probability loss
  >>> adaptive_softmax(input, target)
  ASMoutput(output=tensor([-6.8270, -7.9465, -7.3479, -6.8511, -7.5613, -7.1154, -2.9478, -6.9885,
          -7.7484, -7.9102, -7.1660, -8.2843, -7.7903, -8.4459, -7.2371],
         grad_fn=<ThAddBackward>), loss=tensor(7.2112, grad_fn=<MeanBackward1>))
  >>> # get the log probabilities of all targets given input as a (batch x n_classes) tensor
  >>> adaptive_softmax.log_prob(input)
  tensor([[-2.6533, -3.3957, -2.7069,  ..., -6.4749, -5.8867, -6.0611],
          [-3.4209, -3.2695, -2.9728,  ..., -7.6664, -7.5946, -7.9606],
          [-3.6789, -3.6317, -3.2098,  ..., -7.3722, -6.9006, -7.4314],
          ...,
          [-3.3150, -4.0957, -3.4335,  ..., -7.9572, -8.4603, -8.2080],
          [-3.8726, -3.7905, -4.3262,  ..., -8.0031, -7.8754, -8.7971],
          [-3.6082, -3.1969, -3.2719,  ..., -6.9769, -6.3158, -7.0805]],
         grad_fn=<CopySlices>)
  >>> # predict: get the class that maximizes the log probability for each input
  >>> adaptive_softmax.predict(input)
  tensor([ 8,  6,  6, 16, 14, 16, 16,  9,  4,  7,  5,  7,  8, 14,  3])
  ```
- Add spectral normalization `nn.utils.spectral_norm` #6929

  ```python
  >>> # Usage is similar to weight_norm
  >>> convT = nn.ConvTranspose2d(3, 64, kernel_size=3, padding=1)
  >>> # Can specify number of power iterations applied each time, or use default (1)
  >>> convT = nn.utils.spectral_norm(convT, n_power_iterations=2)
  >>>
  >>> # apply to every conv and conv transpose module in a model
  >>> def add_sn(m):
  ...     for name, c in m.named_children():
  ...         m.add_module(name, add_sn(c))
  ...     if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
  ...         return nn.utils.spectral_norm(m)
  ...     else:
  ...         return m
  >>> my_model = add_sn(my_model)
  ```
- `nn.ModuleDict` and `nn.ParameterDict` containers #8463 (see the sketch after this list)
- Add `nn.init.zeros_` and `nn.init.ones_` #7488
- Add sparse gradient option to pretrained embedding #7492
- Add max pooling support to `nn.EmbeddingBag` #5725
- Depthwise convolution support for MKLDNN #8782
- Add `nn.FeatureAlphaDropout` (featurewise Alpha Dropout layer) #9073
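As referenced above, a small sketch of the new dict-style containers; the module name and keys below are made up for illustration:

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        # nn.ModuleDict registers submodules under string keys
        self.activations = nn.ModuleDict({
            'relu': nn.ReLU(),
            'prelu': nn.PReLU(),
        })
        # nn.ParameterDict registers parameters under string keys
        self.scales = nn.ParameterDict({
            'small': nn.Parameter(torch.ones(1)),
            'large': nn.Parameter(torch.full((1,), 10.0)),
        })

    def forward(self, x, act='relu', scale='small'):
        # Both containers support dict-style indexing by key
        return self.activations[act](x) * self.scales[scale]
```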
Operators
- `torch.bincount` (count frequency of each value in an integral tensor) #6688

  ```python
  >>> input = torch.randint(0, 8, (5,), dtype=torch.int64)
  >>> weights = torch.linspace(0, 1, steps=5)
  >>> input, weights
  (tensor([4, 3, 6, 3, 4]), tensor([ 0.0000,  0.2500,  0.5000,  0.7500,  1.0000]))
  >>> torch.bincount(input)
  tensor([0, 0, 0, 2, 2, 0, 1])
  >>> input.bincount(weights)
  tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000])
  ```
- `torch.as_tensor` (similar to `torch.tensor` but never copies unless necessary) #7109

  ```python
  >>> tensor = torch.randn(3, device='cpu', dtype=torch.float32)
  >>> torch.as_tensor(tensor)                       # doesn't copy
  >>> torch.as_tensor(tensor, dtype=torch.float64)  # copies due to incompatible dtype
  >>> torch.as_tensor(tensor, device='cuda')        # copies due to incompatible device
  >>> array = np.array([3, 4.5])
  >>> torch.as_tensor(array)                 # doesn't copy, sharing memory with the numpy array
  >>> torch.as_tensor(array, device='cuda')  # copies due to incompatible device
  ```
- `torch.randperm` for CUDA tensors #7606
- `nn.HardShrink` for CUDA tensors #8117
- `torch.flip` (flips a tensor along specified dims) #7873 (see the sketch after this list)
- `torch.flatten` (flattens a contiguous range of dims) #8578 (also covered in the sketch below)
- `torch.pinverse` (computes SVD-based pseudo-inverse) #9052
- `torch.unique` for CUDA tensors #8899
- `torch.erfc` (complementary error function) https://github.com/pytorch/pytorch/pull/9366/files
- Support backward for target tensor in `torch.nn.functional.kl_div` #7839
- Add batched linear solver to `torch.gesv` #6100
- `torch.sum` now supports summing over multiple dimensions https://github.com/pytorch/pytorch/pull/6152/files
- `torch.diagonal` and `torch.diagflat` to take arbitrary diagonals with numpy semantics #6718
- `tensor.any` and `tensor.all` on `ByteTensor` can now accept `dim` and `keepdim` arguments #4627
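As noted above, a quick sketch of `torch.flip` and `torch.flatten`; the tensor shapes are arbitrary:

```python
import torch

x = torch.arange(8).reshape(2, 2, 2)

# Reverse the tensor along dims 0 and 1
x_flipped = torch.flip(x, dims=[0, 1])

# Flatten dims 1 through the end, giving shape (2, 4)
x_flat = torch.flatten(x, start_dim=1)
```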
Distributions
- Half Cauchy and Half Normal #8411 (see the sketch after this list)
- Gamma sampling for CUDA tensors #6855
- Allow vectorized counts in Binomial Distribution #6720
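A minimal sketch of the new half distributions; the scale values are arbitrary:

```python
import torch
from torch.distributions import HalfCauchy, HalfNormal

# Both distributions are supported on the non-negative reals.
half_cauchy = HalfCauchy(scale=torch.tensor([1.0]))
half_normal = HalfNormal(scale=torch.tensor([2.0]))

samples = half_cauchy.sample((5,))                 # shape (5, 1), all >= 0
log_p = half_normal.log_prob(torch.tensor([0.5]))  # log density at 0.5
```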
Misc
- Autograd automatic anomaly detection for `NaN` and errors occurring in backward. Two functions, `detect_anomaly` and `set_detect_anomaly`, are provided for this. #7677 (see the sketch after this list)
- Support `reversed(torch.Tensor)` #9216
- Support `hash(torch.device)` #9246
- Support `gzip` in `torch.load` #6490
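A minimal sketch of the anomaly-detection API mentioned above; the computation inside the context is only illustrative:

```python
import torch
from torch import autograd

x = torch.randn(4, requires_grad=True)

# As a context manager: backward passes run inside it check the produced
# gradients for NaN and, on failure, also print the traceback of the forward
# operation at fault.
with autograd.detect_anomaly():
    y = (x * x).sum()
    y.backward()

# Or enable the same checks globally:
autograd.set_detect_anomaly(True)
```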
Performance
- Accelerate `bernoulli` random number generation on CPU #7171
- Enable cuFFT plan caching (80% speed-up in certain cases) #8344
- Fix unnecessary copying in `bernoulli_` #8682
- Fix unnecessary copying in `broadcast` #8222
- Speed up multidim `sum` (2x~6x speed-up in certain cases) #8992
- Vectorize CPU `sigmoid` (>3x speed-up in most cases) #8612
- Optimize CPU `nn.LeakyReLU` and `nn.PReLU` (2x speed-up) #9206
- Vectorize `softmax` and `logsoftmax` (4.5x speed-up on single core and 1.8x on 10 threads) #7375
- Speed up `nn.init.sparse` (10-20x speed-up) #6899
Improvements
Tensor printing
- Tensor printing now includes `requires_grad` and `grad_fn` information #8211
- Improve number formatting in tensor print #7632
- Fix scale when printing some tensors #7189
- Speed up printing of large tensors #6876
Neural Networks
- `NaN` is now propagated through many activation functions #8033
- Add `non_blocking` option to `nn.Module.to` #7312
- Loss modules now allow target to require gradient #8460
- Add `pos_weight` argument to `nn.BCEWithLogitsLoss` #6856
- Support `grad_clip` for parameters on different devices #9302
- Remove the requirement that input sequences to `pad_sequence` have to be sorted #7928
- `stride` argument for `max_unpool1d`, `max_unpool2d`, `max_unpool3d` now defaults to `kernel_size` #7388
- Allow calling grad mode context managers (e.g., `torch.no_grad`, `torch.enable_grad`) as decorators #7737 (see the sketch after this list)
- `torch.optim.lr_scheduler._LRScheduler`'s `__getstate__` now includes optimizer info #7757
- Add support for accepting `Tensor` as input in `clip_grad_*` functions #7769
- Return `NaN` in `max_pool`/`adaptive_max_pool` for `NaN` inputs #7670
- `nn.EmbeddingBag` can now handle empty bags in all modes #7389
- `torch.optim.lr_scheduler.ReduceLROnPlateau` is now serializable #7201
- Allow only tensors of floating point dtype to require gradients #7034 and #7185
- Allow resetting of BatchNorm running stats and cumulative moving average #5766
- Set the gradient of LP-Pooling to zero if the sum of all input elements to the power of p is zero #6766
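As referenced in the list above, a short sketch of using a grad-mode context manager as a decorator; the function and argument names are made up:

```python
import torch

@torch.no_grad()
def evaluate(model, inputs):
    # Gradients are disabled for everything executed inside this function,
    # exactly as if the body were wrapped in `with torch.no_grad():`.
    return model(inputs)
```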
Operators
- Add ellipses ('...') and diagonals (e.g. 'ii->i') to `torch.einsum` #7173 (see the sketch after this list)
- Add `to` method for `PackedSequence` #7319
- Add support for `__floordiv__` and `__rdiv__` for integral tensors #7245
- `torch.clamp` now has subgradient 1 at min and max #7049
- `torch.arange` now uses NumPy-style type inference #7016
- Support infinity norm properly in `torch.norm` and `torch.renorm` #6969
- Allow passing an output tensor via `out=` keyword argument in `torch.dot` and `torch.matmul` #6961
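As referenced above, a short sketch of the new `torch.einsum` features; the shapes are arbitrary:

```python
import torch

a = torch.randn(3, 3)
# Repeated indices take a diagonal: 'ii->i' extracts the main diagonal.
diag = torch.einsum('ii->i', (a,))

x = torch.randn(5, 2, 3)
y = torch.randn(5, 3, 4)
# '...' broadcasts over leading (batch) dimensions: a batched matrix product.
batched_mm = torch.einsum('...ij,...jk->...ik', (x, y))
```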
Distributions
- Always enable grad when calculating `lazy_property` #7708
Sparse Tensor
Data Parallel
- Allow modules that return scalars in `nn.DataParallel` #7973
- Allow `nn.parallel.parallel_apply` to take in a list/tuple of tensors #8047
Misc
- `torch.Size` can now accept PyTorch scalars #5676
- Move `torch.utils.data.dataset.random_split` to `torch.utils.data.random_split`, and `torch.utils.data.dataset.Subset` to `torch.utils.data.Subset` #7816 (see the sketch after this list)
- Add serialization for `torch.device` #7713
- Allow `copy.deepcopy` of `torch.(int/float/...)*` dtype objects #7699
- `torch.load` can now take a `torch.device` as map location #7339
Bug Fixes
- Fix `nn.BCELoss` sometimes returning negative results #8147
- Fix `tensor._indices` on scalar sparse tensor giving wrong result #8197
- Fix backward of `tensor.as_strided` not working properly when input has overlapping memory #8721
- Fix `x.pow(0)` gradient when x contains 0 #8945
- Fix CUDA `torch.svd` and `torch.eig` returning wrong results in certain cases #9082
- Fix `nn.MSELoss` having low precision #9287
- Fix segmentation fault when calling `torch.Tensor.grad_fn` #9292
- Fix `torch.topk` returning wrong results when input isn't contiguous #9441
- Fix segfault in convolution on CPU with large `inputs`/`dilation` #9274
- Fix `avg_pool2/3d` `count_include_pad` having default value `False` (should be `True`) #8645
- Fix `nn.EmbeddingBag`'s `max_norm` option #7959
- Fix returning scalar input in Python autograd function #7934
- Fix THCUNN `SpatialDepthwiseConvolution` assuming contiguity #7952
- Fix bug in seeding random module in `DataLoader` #7886
- Don't modify variables in-place for `torch.einsum` #7765
- Make return uniform in LBFGS step #7586
- The return value of `uniform.cdf()` is now clamped to `[0..1]` #7538
- Fix advanced indexing with negative indices #7345
- `CUDAGenerator` will not initialize on the current device anymore, which will avoid unnecessary memory allocation on `GPU:0` #7392
- Fix `tensor.type(dtype)` not preserving device #7474
- Batch sampler should return the same results when used alone or in dataloader with `num_workers` > 0 #7265
- Fix broadcasting error in LogNormal, TransformedDistribution #7269
- Fix `torch.max` and `torch.min` on CUDA in presence of `NaN` #7052
- Fix `torch.tensor` device-type calculation when used with CUDA #6995
- Fix a missing `'='` in `nn.LPPoolNd` repr function #9629
Documentation
- Expose and document `torch.autograd.gradcheck` and `torch.autograd.gradgradcheck` #8166
- Document `tensor.scatter_add_` #9630
- Document variants of `torch.add` and `tensor.add_`, e.g. `tensor.add(value=1, other) -> Tensor` #9027
- Document `torch.logsumexp` #8428
- Document `torch.sparse_coo_tensor` #8152
- Document `torch.utils.data.dataset.random_split` #7676
- Document `torch.nn.GroupNorm` #7086
- A lot of other documentation improvements including RNNs, `ConvTransposeNd`, `Fold`/`Unfold`, `Embedding`/`EmbeddingBag`, loss functions, etc.