PyTorch Tutorial
-NTU Machine Learning Course-
Lyman Lin 林裕訓
Nov. 03, 2017
lymanblue[at]gmail.com
What is PyTorch?
• Developed by Facebook
– Python first
– Dynamic Neural Network
– This tutorial is for PyTorch 0.2.0
• Endorsed by Director of AI at Tesla
Installation
• PyTorch Web: http://pytorch.org/
Packages of PyTorch
• torch: a Tensor library like Numpy, with strong GPU support
• torch.autograd: a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
• torch.nn: a neural networks library deeply integrated with autograd, designed for maximum flexibility
• torch.optim: an optimization package to be used with torch.nn, with standard optimization methods such as SGD, RMSprop, L-BFGS, Adam, etc.
• torch.multiprocessing: Python multiprocessing, but with magical memory sharing of torch Tensors across processes; useful for data loading and Hogwild training
• torch.utils: DataLoader, Trainer and other utility functions for convenience
• torch.legacy (.nn/.optim): legacy code ported over from Torch for backward-compatibility reasons
This Tutorial
Outline
• Neural Network in Brief
• Concepts of PyTorch
• Multi-GPU Processing
• RNN
• Transfer Learning
• Comparison with TensorFlow
Neural Network in Brief
• Supervised Learning
– Learning a function f such that f(x) = y
– Training data: pairs of data and labels, e.g. (X1, Y1), (X2, Y2), …
– Goal: learn f(·) so that f(x) = y
Neural Network in Brief
• Training iterates over the dataset in batches
– N = dataset size / batch size; one pass over all N batches is 1 epoch
• Forward process: from data to the predicted label (Label')
• The loss compares the predicted label (Label') with the ground-truth label
• Backward process: the optimizer uses the gradients to update the parameters (Wi -> Wi+1)
• Inside the neural network: a chain of layers with weights W; data flows forward through them, gradients flow backward
• Data in the neural network:
– Tensor (n-dim array)
– Gradient of functions
Concepts of PyTorch
• Modules of PyTorch
(Training-loop figure: Data -> Neural Network -> Label', compared with Label by the Loss; the Optimizer performs the backward update Wi -> Wi+1)
Data:
- Tensor
- Variable (for Gradient)
Function:
- NN Modules
- Optimizer
- Loss Function
- Multi-Processing
Concepts of PyTorch
• Modules of PyTorch • Similar to Numpy
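A minimal sketch (not from the original slides) of Numpy-like tensor creation:

import torch

x = torch.Tensor(5, 3)      # uninitialized 5x3 tensor
y = torch.rand(5, 3)        # uniform random values in [0, 1)
z = torch.zeros(5, 3)
print(x.size())             # torch.Size([5, 3]), analogous to numpy's shape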
Concepts of PyTorch
• Modules of PyTorch • Operations
– z = x + y
– torch.add(x, y, out=z)
– y.add_(x) # in-place
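The same operations as a runnable snippet (a sketch, assuming PyTorch 0.2-style APIs):

import torch

x = torch.rand(5, 3)
y = torch.rand(5, 3)
z = x + y                    # operator form
torch.add(x, y, out=z)       # functional form, writing the result into z
y.add_(x)                    # in-place: methods ending in _ mutate the tensor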
Concepts of PyTorch
• Modules of PyTorch • Numpy Bridge
• To Numpy
– a = torch.ones(5)
– b = a.numpy()
• To Tensor
– a = numpy.ones(5)
– b = torch.from_numpy(a)
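A short sketch; note that the converted array and tensor share the same memory:

import torch
import numpy

a = torch.ones(5)
b = a.numpy()                # Tensor -> numpy array (shares memory with a)
a.add_(1)
print(b)                     # [2. 2. 2. 2. 2.]: b changed too

c = numpy.ones(5)
d = torch.from_numpy(c)      # numpy array -> Tensor (also shares memory)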
Concepts of PyTorch
• Modules of PyTorch • CUDA Tensors
• Move to GPU
– x = x.cuda()
– y = y.cuda()
– x+y
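A guarded sketch (the GPU part only runs when CUDA is available):

import torch

x = torch.rand(5, 3)
y = torch.rand(5, 3)
if torch.cuda.is_available():
    x = x.cuda()             # move the tensors to the GPU
    y = y.cuda()
    z = x + y                # the addition is executed on the GPU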
Concepts of PyTorch
• Modules of PyTorch • Variable
– .data: the Tensor data
– .grad: the gradient for the current backward process
– .grad_fn (creator): how the Variable was created, handled by PyTorch automatically
Concepts of PyTorch
• Modules of PyTorch • Variable
• x = Variable(torch.ones(2, 2), requires_grad=True)
• print(x)
• y = x + 2
• z = y * y * 3
• out = z.mean()
• out.backward()
• print(x.grad)
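The same example written out as a runnable snippet, with the expected gradient added as a comment:

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()        # out = mean(3 * (x + 2)^2) = 27
out.backward()        # compute d(out)/dx
print(x.grad)         # every entry is 6 * (x + 2) / 4 = 4.5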
• http://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#define-the-network
• Define modules (must have); build the network (must have)
• Network: x -> conv1 -> relu -> pooling -> conv2 -> relu -> pooling -> flatten -> fc1 -> relu -> fc2 -> relu -> fc3
• Tensors are [Batch N, Channel, H, W]; per-sample shapes [Channel, H, W]:
– conv1: 1x32x32 -> 6x28x28
– pooling: 6x28x28 -> 6x14x14
– conv2: 6x14x14 -> 16x10x10
– pooling: 16x10x10 -> 16x5x5
– flatten the Tensor: 16x5x5 -> 400
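A sketch of the network above, following the linked tutorial (the fully-connected sizes 120, 84 and 10 come from that tutorial):

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Define modules (must have)
        self.conv1 = nn.Conv2d(1, 6, 5)    # 1x32x32 -> 6x28x28
        self.conv2 = nn.Conv2d(6, 16, 5)   # 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Build network (must have)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 6x28x28 -> 6x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 16x10x10 -> 16x5x5
        x = x.view(x.size(0), -1)                   # flatten to [N, 400]
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
out = net(Variable(torch.randn(1, 1, 32, 32)))       # forward pass on one 1x32x32 image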
Concepts of PyTorch
• Modules of PyTorch • NN Modules (torch.nn)
– Modules built on Variable
– Gradient handled by PyTorch
• Common Modules
– Convolution layers
– Linear layers
– Pooling layers
– Dropout layers
– Etc…
NN Modules
• Convolution Layer
– N: batch size, C: channels
– torch.nn.Conv1d: input [N, C, W] # kernel moves in 1D
– torch.nn.Conv2d: input [N, C, H, W] # kernel moves in 2D
– torch.nn.Conv3d: input [N, C, D, H, W] # kernel moves in 3D
– Example: torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
• How Conv2d works (*: convolution)
– The input is Cin x Hin x Win; each of the Cout kernels is Cin x k x k
– Convolving the 1st kernel with the input gives one Hout x Wout map; the Cout-th kernel gives the Cout-th map; stacking them gives a Cout x Hout x Wout output
– Hyperparameters: kernel size k, stride s (moving step size), padding p, dilation d (e.g. k=3, s=1, p=1, d=1 keeps H and W unchanged)
– Number of parameters: Cout x Cin x k x k weights (plus Cout biases)
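A small sketch illustrating the output shape and parameter count (the layer sizes are just examples):

import torch
import torch.nn as nn
from torch.autograd import Variable

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
x = Variable(torch.randn(8, 3, 32, 32))             # [N, Cin, Hin, Win]
y = conv(x)
print(y.size())                                      # [8, 16, 32, 32]: k=3, s=1, p=1 keeps H, W
print(sum(p.data.numel() for p in conv.parameters()))  # 16*3*3*3 weights + 16 biases = 448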
NN Modules
• Linear Layer
– torch.nn.Linear(in_features=3, out_features=5)
– y=Ax+b
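A minimal usage sketch (the batch size of 4 is arbitrary):

import torch
import torch.nn as nn
from torch.autograd import Variable

fc = nn.Linear(in_features=3, out_features=5)
x = Variable(torch.randn(4, 3))    # [batch, in_features]
y = fc(x)                          # [4, 5]; each row is computed as Ax + b
print(y.size())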
NN Modules
• Dropout Layer
– torch.nn.Dropout(p)
– Randomly zeroes elements of the input with probability p (during training)
– Outputs are scaled by 1/(1-p) so the expected value stays the same
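A minimal sketch of the training/eval behaviour:

import torch
import torch.nn as nn
from torch.autograd import Variable

drop = nn.Dropout(p=0.5)
x = Variable(torch.ones(2, 4))
print(drop(x))       # ~half the entries are zeroed, the rest are scaled to 1/(1-0.5) = 2.0
drop.eval()          # at evaluation time dropout is disabled
print(drop(x))       # all ones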
NN Modules
• Pooling Layer
– torch.nn.AvgPool2d(kernel_size=2, stride=2, padding=0)
– torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
– e.g. kernel size k=2, stride s=2 (moving step size), padding p=0 halves H and W
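A minimal shape sketch:

import torch
import torch.nn as nn
from torch.autograd import Variable

pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
x = Variable(torch.randn(1, 6, 28, 28))
print(pool(x).size())     # [1, 6, 14, 14]: H and W are halved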
Concepts of PyTorch
• Modules of PyTorch • Optimizer (torch.optim)
– SGD
– Adagrad
– Adam
– RMSprop
– …
– 9 Optimizers (PyTorch 0.2)
• Loss (torch.nn)
– L1Loss
– MSELoss
– CrossEntropyLoss
– …
– 18 Loss Functions (PyTorch 0.2)
http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-optim
What we build: a two-layer fully-connected network
• D_in=1000 -> H=100 -> D_out=100, producing y_pred
• Define modules (must have); build the network (must have)
• Construct our model, then the optimizer and loss function
• Don't update y (y are labels here)
• Training loop: reset gradient -> forward -> loss -> backward -> update step
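A sketch of the training loop, closely following the linked example (D_out=100 is taken from the slide; the official example uses 10):

import torch
from torch.autograd import Variable

N, D_in, H, D_out = 64, 1000, 100, 100

x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)   # don't update y (labels)

# construct our model, loss function and optimizer
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for t in range(500):
    y_pred = model(x)             # forward
    loss = loss_fn(y_pred, y)
    optimizer.zero_grad()         # reset gradient
    loss.backward()               # backward
    optimizer.step()              # update step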
Concepts of PyTorch
• Modules of PyTorch • Basic Method
– torch.nn.DataParallel
– Recommended by PyTorch
• Advanced Methods
– torch.multiprocessing
– Hogwild (async)
Multi-GPU Processing
• torch.nn.DataParallel
– gpu_id = '6,7'
– os.environ['CUDA_VISIBLE_DEVICES'] = gpu_id
– net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
– output = net(input_var)
• Important Notes:
– device_ids must start from 0
– batch_size must be divisible by the number of GPUs used
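A sketch assuming two visible GPUs; the model here is a placeholder for your own nn.Module:

import os
import torch
import torch.nn as nn
from torch.autograd import Variable

os.environ['CUDA_VISIBLE_DEVICES'] = '6,7'        # physical GPUs 6 and 7 become devices 0 and 1
model = nn.Linear(10, 2).cuda()                   # placeholder model
net = torch.nn.DataParallel(model, device_ids=[0, 1])
input_var = Variable(torch.randn(32, 10).cuda())  # batch size 32 is divisible by the 2 GPUs
output = net(input_var)                           # the batch is split across the visible GPUs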
Saving Models
• First Approach (Recommended by PyTorch)
• # save only the model parameters
• torch.save(the_model.state_dict(), PATH)
• # load only the model parameters
• the_model = TheModelClass(*args, **kwargs)
• the_model.load_state_dict(torch.load(PATH))
• Second Approach
• torch.save(the_model, PATH) # save the entire model
• the_model = torch.load(PATH) # load the entire model
http://pytorch.org/docs/master/notes/serialization.html#recommended-approach-for-saving-a-model
Recurrent Neural Network (RNN)
• http://pytorch.org/tutorials/beginner/former_torchies/nn_tutorial.html#example-2-recurrent-net
• self.i2h takes the concatenation of the input and the hidden state: input_size = 50 + 20 = 70
• Each step maps (input, hidden) to a new hidden state and an output
• The same module (i.e. the same parameters) is reused at every time step
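A sketch following the linked example (data_size=50 and hidden_size=20 as on the slide; output_size=10 is assumed):

import torch
import torch.nn as nn
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, data_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        input_size = data_size + hidden_size           # 50 + 20 = 70
        self.i2h = nn.Linear(input_size, hidden_size)  # shared across all time steps
        self.h2o = nn.Linear(hidden_size, output_size)

    def forward(self, data, last_hidden):
        input = torch.cat((data, last_hidden), 1)      # concatenate input and hidden state
        hidden = self.i2h(input)
        output = self.h2o(hidden)
        return hidden, output

rnn = RNN(50, 20, 10)
batch = Variable(torch.randn(3, 50))
hidden = Variable(torch.zeros(3, 20))
for t in range(5):                                     # unroll: same parameters at every step
    hidden, output = rnn(batch, hidden)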
Transfer Learning
• Freeze the parameters of original model
– requires_grad = False
• Then add your own modules
http://pytorch.org/docs/master/notes/autograd.html#excluding-subgraphs-from-backward
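A sketch closely following the linked note (the 100-class output layer is just an example):

import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False               # freeze the parameters of the original model
model.fc = nn.Linear(512, 100)                # add your own module; its new parameters require grad by default
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)  # optimize only the new layer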
Comparison with TensorFlow
Properties (TensorFlow vs. PyTorch):
• Graph: TensorFlow static (dynamic via TensorFlow Fold); PyTorch dynamic
• Ramp-up time: PyTorch wins
• Graph creation and debugging: PyTorch wins
• Feature coverage: TensorFlow wins; PyTorch is catching up quickly
• Documentation: tie
• Serialization: TensorFlow wins (supports other languages)
• Deployment: TensorFlow wins (cloud & mobile)
• Data loading: PyTorch wins
• Device management: TensorFlow wins (PyTorch needs explicit .cuda())
• Custom extensions: PyTorch wins
Summarized from https://awni.github.io/pytorch-tensorflow/
Remind: Platform & Final Project
Thank You~!

PyTorch Tutorial for NTU Machine Learning Course 2017
