Image Segmentation with Deep Learning
Antonio Rueda-Toicen and Imran Kocabiyik
Berlin Computer Vision Group
December 2020
https://www.meetup.com/Berlin-Computer-Vision-Group/
Agenda
● Image segmentation
    ■ Semantic segmentation
        ● Fully convolutional networks, U-net
    ■ Instance segmentation
        ● Mask R-CNN
    ■ Panoptic segmentation
        ● Feature Pyramid Networks
○ Public datasets
    ■ COCO
    ■ Google Open Images
○ Implementations: Detectron2, Fast.ai
Classification, detection, and segmentation
Classification refers to image-wide labels
Detection refers to localization of bounding boxes with labels
Segmentation refers to pixel-wise localization of the labels
Goals of supervised image segmentation
Given an input image, we wish to obtain:
1. A class label associated with each individual pixel in the image. This is also called pixel-wise localization.
2. The probability score associated with each class label
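Both goals usually come from the same network output: one score (logit) per class for every pixel. A softmax over the class axis turns the scores into probabilities, the argmax gives the pixel's label, and the winning probability is its score. A minimal pure-Python sketch, assuming the logits are stored as a nested list indexed `[class][row][col]` (a hypothetical layout chosen for illustration):

```python
import math

def pixelwise_softmax(logits):
    """Turn per-pixel class scores into (label map, probability map).

    logits: nested list indexed as logits[class][row][col]
    (a hypothetical layout chosen for this illustration).
    """
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    labels = [[0] * width for _ in range(height)]
    scores = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            z = [logits[k][r][c] for k in range(num_classes)]
            m = max(z)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in z]
            total = sum(exps)
            probs = [e / total for e in exps]
            best = max(range(num_classes), key=lambda k: probs[k])
            labels[r][c] = best          # class label of this pixel
            scores[r][c] = probs[best]   # probability score of that label
    return labels, scores
```

For a 1×2 image with two classes, `pixelwise_softmax([[[2.0, 0.0]], [[0.0, 3.0]]])` labels the first pixel as class 0 and the second as class 1.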
Applications of image segmentation
http://withoutbg.com/
https://www.segmentive.ai/
Segmentation as pixel-wise localization
Instance segmentation requires object detection
Panoptic segmentation
https://arxiv.org/pdf/1801.00868.pdf
Explore it in the Detectron2 inference notebook
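A panoptic map gives every pixel a class and, for countable ‘things’ (people, cars), an instance id; amorphous ‘stuff’ (sky, road) only gets a class. The panoptic segmentation paper linked above builds a baseline by combining the outputs of an instance model and a semantic model with a heuristic along these lines. The sketch below is a simplified pure-Python version of that idea, not Detectron2's implementation: instances are pasted from most to least confident without overwriting, and leftover pixels take ‘stuff’ labels from the semantic map. The `(score, category, mask)` input layout is a hypothetical choice for illustration.

```python
def merge_panoptic(instances, semantic, stuff_classes, void=-1):
    """Combine instance masks ('things') and a semantic map ('stuff').

    instances: list of (score, category, mask), mask a 0/1 nested list.
    semantic: nested list of per-pixel class ids from a semantic model.
    stuff_classes: set of class ids that count as 'stuff'.
    Returns a map of (category, instance_id) pairs, or `void`.
    """
    h, w = len(semantic), len(semantic[0])
    out = [[void] * w for _ in range(h)]
    # 1. Things: paste instances from most to least confident;
    #    a pixel already claimed by a better instance is not overwritten.
    ranked = sorted(instances, key=lambda inst: inst[0], reverse=True)
    for inst_id, (_, category, mask) in enumerate(ranked, start=1):
        for r in range(h):
            for c in range(w):
                if mask[r][c] and out[r][c] == void:
                    out[r][c] = (category, inst_id)
    # 2. Stuff: fill the remaining pixels from the semantic map.
    for r in range(h):
        for c in range(w):
            if out[r][c] == void and semantic[r][c] in stuff_classes:
                out[r][c] = (semantic[r][c], 0)  # stuff has no instance id
    return out
```

Pixels that are neither claimed by an instance nor labeled as stuff stay `void`.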
“Fully Convolutional” networks draw segmentation masks
All layers in the network are convolutional: unlike most classifiers, there is no fully connected (aka “dense”) layer, so each output pixel is computed from the local information of its pixel neighborhood
What is a convolution filter?
https://setosa.io/ev/image-kernels/
Convolution of 3x3 and stride = 1 without padding
Effect: the output loses one pixel on each border, so each dimension shrinks by 2 (an n x n input gives an (n-2) x (n-2) output)
Convolution of 3x3 and stride = 1 with zero padding of 1 pixel
Effect: the output preserves the original image size
Convolution of 3x3 and stride = 2 with zero padding
Effect: the output is downsampled to about half its width and half its height
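For an n×n input, a k×k kernel, zero padding p and stride s, the output side length is ⌊(n + 2p − k)/s⌋ + 1. With k = 3 this gives n − 2 without padding, n with p = 1, and about n/2 with p = 1, s = 2, matching the three cases above. A minimal pure-Python sketch (the function name is illustrative; deep learning frameworks vectorize this and, like this sketch, do not flip the kernel, so strictly it is a cross-correlation):

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square k x k kernel over a 2D image with zero padding and stride."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    # Zero-pad the image by `padding` pixels on every border.
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    # Output size: floor((n + 2p - k) / s) + 1 in each dimension.
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            r0, c0 = i * stride, j * stride  # top-left corner of the window
            out[i][j] = sum(
                kernel[a][b] * padded[r0 + a][c0 + b]
                for a in range(k) for b in range(k)
            )
    return out
```

For a 6×6 input and a 3×3 kernel this produces a 4×4 output with `padding=0`, 6×6 with `padding=1`, and 3×3 with `padding=1, stride=2`.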
U-net for semantic segmentation
All layers in the network are convolutional, with no fully connected (aka “dense”) layer as in most classifiers. This fully convolutional architecture lets us label images pixel by pixel while preserving their local information
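The original U-net paper uses unpadded 3×3 convolutions, so every convolution trims one pixel from each border and the output mask is smaller than the input: a 572×572 input yields a 388×388 output. The skip connections copy encoder feature maps across the “U”, center-crop them, and concatenate them with the upsampled decoder maps, which is how fine local detail reaches the pixel-wise output. The sketch below only traces spatial sizes under that layout (depth 4, two convolutions per level); the helper name is hypothetical, and many later implementations pad their convolutions so the output keeps the input size.

```python
def unet_sizes(size, depth=4):
    """Trace the spatial size through the original U-net layout:
    two unpadded 3x3 convs per level (each removes 2 pixels),
    2x2 max pooling (halves), 2x2 up-convolution (doubles)."""
    skips = []
    for _ in range(depth):            # contracting path
        size -= 4                     # two 3x3 convs without padding
        skips.append(size)            # feature map sent over the skip connection
        size //= 2                    # 2x2 max pool, stride 2
    size -= 4                         # bottleneck convs
    crops = []
    for skip in reversed(skips):      # expanding path
        size *= 2                     # 2x2 up-convolution
        crops.append((skip, size))    # encoder map is center-cropped to `size`
        size -= 4                     # two 3x3 convs after concatenation
    return size, crops
```

`unet_sizes(572)` returns 388 as the output size, and its last crop shows the 568×568 encoder map being cropped to 392×392 before concatenation.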
Image pyramids
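An image pyramid stores the same image at successively halved resolutions, so that objects of different sizes can be handled at the scale that suits them. Classic Gaussian pyramids blur before subsampling; for brevity, the pure-Python sketch below (hypothetical helper names) averages non-overlapping 2×2 blocks instead, a simple box-filter variant of the same idea.

```python
def downsample_2x(image):
    """Halve each dimension by averaging non-overlapping 2x2 blocks
    (an odd trailing row or column is dropped)."""
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, len(image[0]) - 1, 2)
        ]
        for r in range(0, len(image) - 1, 2)
    ]

def image_pyramid(image, min_size=1):
    """Return [full resolution, 1/2, 1/4, ...] until a side would drop below min_size."""
    levels = [image]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample_2x(levels[-1]))
    return levels
```

An 8×8 image produces levels of size 8, 4, 2 and 1.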
Image Pyramids in Feature Pyramid Networks (FPNs)
Convolutional networks implement “pyramids”
The deeper we go into the network, the smaller the x, y dimensions of the feature maps and the more semantic information each position encodes
ResNets
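A ResNet is built from residual blocks: a few stacked layers learn a residual mapping $\mathcal{F}$ that is added to the block's input through an identity shortcut, with the ReLU applied after the addition. When the shortcut has to change the number of channels or the resolution, the ResNet paper uses a projection $W_s$ (in practice a 1×1 convolution). The two forms, in the paper's notation:

```latex
\begin{aligned}
\mathbf{y} &= \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x} && \text{(identity shortcut)} \\
\mathbf{y} &= \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s\,\mathbf{x} && \text{(projection shortcut)}
\end{aligned}
```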
Nearest neighbor interpolation
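In the top-down pathway of a feature pyramid network, the coarser map is upsampled by a factor of 2 with nearest-neighbor interpolation before being added to the finer lateral map. A minimal pure-Python sketch of that upsampling on a single-channel map (the function name is illustrative):

```python
def upsample_nearest(fmap, factor=2):
    """Nearest-neighbor upsampling: each output pixel copies the input pixel
    it falls into, so every value becomes a factor x factor block."""
    return [
        [fmap[r // factor][c // factor] for c in range(len(fmap[0]) * factor)]
        for r in range(len(fmap) * factor)
    ]
```

No new values are invented: `[[1, 2], [3, 4]]` becomes a 4×4 map made of four constant 2×2 blocks.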
ResNets in feature pyramid networks
1x1 convolution
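A 1×1 convolution looks at one (x, y) position at a time and takes a weighted sum across channels, which amounts to a small fully connected layer applied at every pixel. The spatial size is unchanged and only the number of channels changes; in FPN the lateral connections use it to project backbone maps with different channel counts to a common depth (256 channels in the FPN paper). A pure-Python sketch, assuming a hypothetical channel-first nested-list layout:

```python
def conv1x1(fmap, weights, bias=None):
    """1x1 convolution on a feature map indexed fmap[channel][row][col].

    weights[o][i] mixes input channel i into output channel o; the same
    weights are applied independently at every (row, col) position.
    """
    in_ch, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out_ch = len(weights)
    bias = bias if bias is not None else [0.0] * out_ch
    return [
        [
            [bias[o] + sum(weights[o][i] * fmap[i][r][c] for i in range(in_ch))
             for c in range(w)]
            for r in range(h)
        ]
        for o in range(out_ch)
    ]
```

With `weights=[[1, 1, 1]]`, a 3-channel map is reduced to a single channel holding the per-pixel sum of the inputs.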
Feature Pyramid Networks
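The pieces above combine into the FPN top-down pathway. In the notation chosen here (following the FPN paper's description), $C_2, \dots, C_5$ are the ResNet stage outputs at strides 4, 8, 16 and 32, the 1×1 lateral convolutions bring them to a common depth, $\mathrm{Up}_{2\times}$ is nearest-neighbor upsampling, and a 3×3 convolution on each merged map reduces upsampling artifacts:

```latex
\begin{aligned}
M_5 &= \mathrm{Conv}_{1\times1}(C_5) \\
M_l &= \mathrm{Up}_{2\times}(M_{l+1}) + \mathrm{Conv}_{1\times1}(C_l), \quad l = 4, 3, 2 \\
P_l &= \mathrm{Conv}_{3\times3}(M_l), \quad l = 2, \dots, 5
\end{aligned}
```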
Mask R-CNN
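Mask R-CNN extends the Faster R-CNN detector with a mask branch: for each detected box (region of interest) it predicts a small fixed-size mask per class (28×28 in the paper's FPN variant), which at inference time is resized to the box and binarized at a threshold of 0.5. This is why instance segmentation needs detection first. A minimal sketch of that pasting step, assuming integer box coordinates and using nearest-neighbor resizing for simplicity (implementations usually interpolate bilinearly); the function name is hypothetical:

```python
def paste_mask(mask_probs, box, image_h, image_w, threshold=0.5):
    """Place a small per-box mask back into the full image.

    mask_probs: m x m nested list of foreground probabilities.
    box: (x0, y0, x1, y1) in integer pixel coordinates, x1/y1 exclusive.
    """
    m = len(mask_probs)
    x0, y0, x1, y1 = box
    box_w, box_h = x1 - x0, y1 - y0
    full = [[0] * image_w for _ in range(image_h)]
    for y in range(max(y0, 0), min(y1, image_h)):
        for x in range(max(x0, 0), min(x1, image_w)):
            # Map the image pixel to the nearest cell of the m x m grid.
            gy = min((y - y0) * m // box_h, m - 1)
            gx = min((x - x0) * m // box_w, m - 1)
            full[y][x] = 1 if mask_probs[gy][gx] >= threshold else 0
    return full
```

Pixels outside the box stay 0, so each detection contributes a binary mask covering only its own box.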
The COCO dataset
http://cocodataset.org/#explore
The Google Open Images Dataset
https://storage.googleapis.com/openimages/web/index.html
https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=segmentation&r=false&c=%2Fm%2F03g8mr
https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=detection&c=%2Fm%2F04rmv
Detectron2
Model Zoo: MODEL_ZOO.md in the facebookresearch/detectron2 GitHub repository
Inference (Colab notebook)
Training (Colab notebook)
Generating validation set plots
Panoptic segmentation with a feature pyramid network on a ResNet-50 backbone (Panoptic FPN, R-50)
Detectron2 config files
https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md
Model output format
https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
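According to the documentation linked above, a panoptic model's output contains `"panoptic_seg"`, a tuple `(pred, segments_info)`: `pred` is an H×W tensor holding a segment id per pixel, and each dict in `segments_info` describes one segment with keys including `"id"`, `"isthing"` and `"category_id"`. The hypothetical helper below works on plain nested lists (a tensor can be converted with `.tolist()`) and splits segments into things and stuff with their pixel areas; the example data is made up for illustration, not real model output.

```python
from collections import Counter

def summarize_panoptic(pred, segments_info):
    """Count pixels per segment of a panoptic prediction.

    pred: H x W nested list of segment ids.
    segments_info: list of dicts with "id", "isthing" and "category_id".
    Returns (things, stuff) as lists of (category_id, pixel_area).
    """
    area = Counter(seg_id for row in pred for seg_id in row)
    things, stuff = [], []
    for seg in segments_info:
        entry = (seg["category_id"], area[seg["id"]])
        (things if seg["isthing"] else stuff).append(entry)
    return things, stuff
```

Pixels whose id does not appear in `segments_info` (id 0 in the toy data) are simply not counted.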
Objective
Example Case: Image Matting
Using a U-net
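Matting differs from segmentation in that the target is a soft alpha matte rather than a hard 0/1 mask. Each observed pixel is modeled as a blend of a foreground color F and a background color B, I = αF + (1 − α)B, so once a model has predicted α, replacing the background is a per-pixel blend. A pure-Python sketch of that compositing step, assuming RGB tuples in nested lists (a hypothetical layout); passing the original photo as `foreground` only approximates F in semi-transparent regions such as hair, since the old background still bleeds into those pixels.

```python
def composite(foreground, alpha, background):
    """Blend per pixel with the compositing equation I = a*F + (1 - a)*B.

    foreground, background: H x W nested lists of (r, g, b) tuples;
    alpha: H x W nested list of values in [0, 1] (1 = fully foreground).
    """
    return [
        [
            tuple(a * f + (1 - a) * b for f, b in zip(fg_px, bg_px))
            for fg_px, a, bg_px in zip(fg_row, a_row, bg_row)
        ]
        for fg_row, a_row, bg_row in zip(foreground, alpha, background)
    ]
```

With α = 1 the foreground pixel is kept as is; with α = 0.5 it is an even mix with the new background.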
Matting algorithm:
Using trimap or instance segments?
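An instance mask is a hard 0/1 decision, while a trimap marks pixels as sure foreground, sure background, or unknown, leaving the matting step to resolve only the uncertain band. One common way to obtain a trimap automatically (an assumption here, not necessarily what was done for these slides) is to erode and dilate a segmentation mask. A pure-Python sketch with a square neighborhood and a hypothetical function name:

```python
def trimap_from_mask(mask, radius=1):
    """Build a trimap from a binary mask: 1.0 = sure foreground,
    0.0 = sure background, 0.5 = unknown band around the boundary.

    A pixel is sure foreground if every pixel within `radius` is foreground
    (erosion) and sure background if none is (complement of dilation).
    Windows are clipped at the image border.
    """
    h, w = len(mask), len(mask[0])
    trimap = [[0.5] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                mask[rr][cc]
                for rr in range(max(r - radius, 0), min(r + radius + 1, h))
                for cc in range(max(c - radius, 0), min(c + radius + 1, w))
            ]
            if all(window):
                trimap[r][c] = 1.0
            elif not any(window):
                trimap[r][c] = 0.0
    return trimap
```

A larger `radius` widens the unknown band, which helps with fuzzy boundaries at the cost of more work for the matting model.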
Results
Photo: Ayo Ogunseinde
https://unsplash.com/photos/THIs-cpyebg
Photo: Eugen Proskouriakov
https://unsplash.com/photos/C-gvAA8q3Tc
Photo: Mathieu Renier
https://unsplash.com/photos/4WBvCqeMaDE
Photo: Gulyás Bianka
https://unsplash.com/photos/3WOh54znPGU
For more examples:
withoutbg.com
Which things should be kept in this picture?
Kid, ball, 2 dogs, 9 people?
Photo: Treddy Chen
https://unsplash.com/photos/UdQWvefOXJk
Issue: When there is more than one person in the image...
Review questions
- How do we compute the confusion matrix for a segmentation mask? How do we
compute it for a bounding box?
- Can we use the Intersection over Union equation to evaluate the quality of a
segmentation mask?
- What’s the recall of a classifier that only outputs ‘1’ (positive class)?
- What’s the precision of a classifier that outputs a single true positive, with all its
other predictions being equal to ‘0’ (negative class)?
- Why does precision typically go down when recall increases (for example, when the
decision threshold is lowered)?
- Does the F1 measure weigh precision and recall equally?
- What’s the appeal of using Detectron2? Do we need to write a PyTorch model to
use it for inference or training?
- How does panoptic segmentation combine instance and semantic
segmentation? Which method produces the ‘stuff’? Which method produces
the ‘things’?
- Is semantic segmentation more computationally costly than instance
segmentation? Why?
- Is panoptic segmentation more computationally costly than instance
segmentation? Why?
Google Colab Notebooks
● U-net in FastAI 2
● Mask R-CNN and Panoptic Segmentation with Detectron2
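Several of the review questions come down to counting pixels. Treating every pixel of a binary mask as one prediction gives a confusion matrix (TP, FP, FN, TN), from which precision = TP/(TP + FP), recall = TP/(TP + FN), F1 = 2PR/(P + R) and IoU = TP/(TP + FP + FN) follow. A pure-Python sketch, assuming predicted and ground-truth masks are same-size 0/1 nested lists; returning 0 for undefined ratios is a convention chosen here.

```python
def mask_metrics(pred, target):
    """Pixel-wise confusion counts and derived scores for binary masks."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        return num / den if den else 0.0  # convention: 0 when undefined

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "iou": ratio(tp, tp + fp + fn),
    }
```

For bounding boxes, the same IoU formula is usually applied to box areas (intersection area over union area) instead of pixel counts of arbitrary masks.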
References
● Stanford’s cs231n lecture on Object Detection and Segmentation
● PyImageSearch tutorial on Mask R-CNN
