Stefan Seegerer, hi@stefanseegerer.de Matthias Zürl, matthias.zuerl@fau.de CC-BY-SA Last updated: 10/2021
PyTorch CHEAT SHEET
General
PyTorch is an open source machine learning framework. It uses torch.Tensor – multi-dimensional matrices – for its computations. A core feature of neural networks in PyTorch is the autograd package, which provides automatic derivative calculations for all operations on tensors.
There are several ways to define a neural network in PyTorch, e.g. with nn.Sequential (a), as a class (b), or using a combination of both.
import torch                        Root package
import torch.nn as nn               Neural networks
import torch.nn.functional as F     Collection of layers, activations & more
from torchvision import datasets, models, transforms
                                    Popular image datasets, architectures & transforms
torch.Tensor(L)        Create tensor from list L
torch.randn(*size)     Create random tensor
tnsr.view(a, b, ...)   Reshape tensor to size (a, b, ...)
requires_grad=True     Tracks computation history for derivative calculations
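A minimal sketch of these tensor basics in use (the values and shapes are illustrative assumptions, not from the original sheet):

import torch

t = torch.randn(2, 3)                    # random 2x3 tensor
t2 = t.view(3, 2)                        # reshape to 3x2
x = torch.Tensor([1.0, 2.0, 3.0])        # tensor from a list
w = torch.randn(3, requires_grad=True)   # track operations for autograd
y = (w * x).sum()
y.backward()                             # compute dy/dw
print(w.grad)                            # equals x, since y = sum(w * x)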
Define model

(a) Using nn.Sequential:

model = nn.Sequential(
    nn.Conv2d( , , ),
    nn.MaxPool2d( ),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear( , )
)

(b) As a class:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d( , , )
        self.pool = nn.MaxPool2d( )
        self.fc = nn.Linear( , )

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = x.view(-1, )
        x = self.fc(x)
        return x

model = Net()
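A filled-in sketch of variant (b); the layer sizes (1 input channel, 8 filters, 3×3 kernel, 28×28 input images, 10 classes) are illustrative assumptions, not part of the original sheet:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(1, 8, 3)        # 1x28x28 -> 8x26x26
        self.pool = nn.MaxPool2d(2)           # 8x26x26 -> 8x13x13
        self.fc = nn.Linear(8 * 13 * 13, 10)  # 1352 features -> 10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = x.view(-1, 8 * 13 * 13)
        x = self.fc(x)
        return x

model = Net()
out = model(torch.randn(4, 1, 28, 28))        # batch of 4 random images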
Save/Load model
torch.save(model, 'PATH')      Save model
model = torch.load('PATH')     Load model
It is common practice to save only the model parameters (model.state_dict()), not the whole model.
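A sketch of the state_dict() approach mentioned above, assuming the Net class from the Define model section ('model.pt' is an illustrative path):

torch.save(model.state_dict(), 'model.pt')     # save only the parameters

model = Net()                                  # re-create the architecture first
model.load_state_dict(torch.load('model.pt'))
model.eval()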
GPU Training
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
If a GPU with CUDA support is available, computations are sent to
the GPU with ID 0 using model.to(device) or
inputs, labels = data[0].to(device), data[1].to(device).
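A short sketch of moving the model and a batch to the selected device; train_loader is an assumed DataLoader (see Load data below):

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)          # move model parameters to the GPU/CPU

for data in train_loader:
    inputs, labels = data[0].to(device), data[1].to(device)
    outputs = model(inputs)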
Activation functions
nn.ReLU() or F.relu()          Output between 0 and ∞, most frequently used activation function
nn.Sigmoid() or F.sigmoid()    Output between 0 and 1, often used for predicting probabilities
nn.Tanh() or F.tanh()          Output between -1 and 1, often used for classification with two classes
Common activation functions include ReLU, Sigmoid, and Tanh, but there are other activation functions as well.
Evaluate model
model.eval()       Activates evaluation mode; some layers behave differently
torch.no_grad()    Prevents tracking history, reduces memory usage, speeds up calculations
The evaluation examines whether the model provides satisfactory results on previously withheld data. Depending on the objective, different metrics are used, such as accuracy, precision, recall, F1, or BLEU.
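A sketch of an evaluation loop computing accuracy; test_loader is an assumed DataLoader over the withheld data:

model.eval()
correct, total = 0, 0
with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)              # predicted class per sample
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print(f'Accuracy: {correct / total:.2%}')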
Train model

LOSS FUNCTIONS
PyTorch already offers a bunch of different loss functions, e.g.:
nn.L1Loss              Mean absolute error
nn.MSELoss             Mean squared error (L2Loss)
nn.CrossEntropyLoss    Cross entropy, e.g. for single-label classification or unbalanced training sets
nn.BCELoss             Binary cross entropy, e.g. for multi-label classification or autoencoders

OPTIMIZATION (torch.optim)
Optimization algorithms are used to update weights and dynamically adapt the learning rate with gradient descent, e.g.:
optim.SGD              Stochastic gradient descent
optim.Adam             Adaptive moment estimation
optim.Adagrad          Adaptive gradient
optim.RMSprop          Root mean square propagation
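A sketch of a basic training loop combining a loss function and an optimizer; train_loader, the learning rate, and the number of epochs are illustrative assumptions:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(2):
    for inputs, labels in train_loader:
        optimizer.zero_grad()              # reset gradients from the previous step
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()                    # compute gradients via autograd
        optimizer.step()                   # update the weights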
Load data
A dataset is represented by a class that inherits from Dataset (it resembles a list of tuples of the form (features, label)). DataLoader allows loading a dataset without caring about its structure. Usually the dataset is split into training (e.g. 80%) and test data (e.g. 20%).
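A sketch using a torchvision dataset with DataLoader; the MNIST choice, the root path 'data', and the batch size are illustrative assumptions:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST('data', train=True, download=True,
                           transform=transforms.ToTensor())
test_set = datasets.MNIST('data', train=False,
                          transform=transforms.ToTensor())

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32)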
Layers
nn.Linear(m, n): Fully connected layer (or dense layer) from m to n neurons
nn.ConvXd(m, n, s): X-dimensional convolutional layer from m to n channels with kernel size s; X ∈ {1, 2, 3}
nn.MaxPoolXd(s): X-dimensional pooling layer with kernel size s; X ∈ {1, 2, 3}
nn.BatchNormXd(n): Normalizes an X-dimensional input batch with n features; X ∈ {1, 2, 3}
nn.RNN/LSTM/GRU: Recurrent networks connect neurons of one layer with neurons of the same or a previous layer
nn.Dropout(p=0.5): Randomly sets input elements to zero during training to prevent overfitting
nn.Flatten(): Flattens a contiguous range of dimensions into a tensor
nn.Embedding(m, n): Lookup table to map a dictionary of size m to embedding vectors of size n
torch.nn offers a bunch of other building blocks. A list of state-of-the-art architectures can be found at https://paperswithcode.com/sota.
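A sketch combining several of the layers above in nn.Sequential; the channel counts, kernel sizes, and the assumption of 1×28×28 input images are illustrative:

model = nn.Sequential(
    nn.Conv2d(1, 8, 3),         # 1x28x28 -> 8x26x26
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.MaxPool2d(2),            # 8x26x26 -> 8x13x13
    nn.Flatten(),               # -> 8*13*13 = 1352 features
    nn.Dropout(p=0.5),
    nn.Linear(8 * 13 * 13, 10)
)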
Workflow: 1. Load data  2. Define model  3. Train model  4. Evaluate model
nn.ReLU() creates an nn.Module, e.g. to be used in Sequential models. F.relu() is just a call of the ReLU function, e.g. to be used in the forward method.
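A small sketch of the two forms side by side, assuming the imports from the top of this sheet (the values are illustrative):

act = nn.ReLU()                 # module form, e.g. inside nn.Sequential
x = torch.randn(5)
print(act(x))                   # same result as ...
print(F.relu(x))                # ... the functional form used in forward()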
