Image segmentation with deep learning

Antonio Rueda-Toicen and Imran Kocabiyik
Berlin Computer Vision Group
December 2020

https://www.meetup.com/Berlin-Computer-Vision-Group/

Agenda
● Image segmentation
■ Semantic segmentation
● Fully convolutional networks, U-net
■ Instance segmentation
● Mask R-CNN
■ Panoptic segmentation
● Feature Pyramid Networks
○ Public datasets
■ COCO
■ Google Open Images
○ Implementations: Detectron2, Fast.ai

Classification, detection, and segmentation
Classification refers to image-wide labels
Detection refers to localization of bounding boxes with labels
Segmentation refers to pixel-wise localization of the labels
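The three outputs can be compared on a toy example. This is a minimal sketch in plain Python; the 4x5 mask, the "cat" label, and the helper `mask_to_box` are made up for illustration and do not come from any library.

```python
# Toy binary mask for a single object (1 = object pixel, 0 = background).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


segmentation = mask                       # pixel-wise localization
detection = ("cat", mask_to_box(mask))    # label plus bounding box
classification = "cat" if detection[1] else "background"  # image-wide label
```

Each step down the list discards spatial detail: the box keeps only the extent of the mask, and the image-wide label keeps only the class.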

Goals of supervised image segmentation
Given an input image we wish to obtain:
1. A class label for each individual pixel in the image. This is also called pixel-wise localization.
2. The probability score associated with each class label
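One common way to obtain both outputs is to apply a softmax over the per-class scores of every pixel and keep the most probable class. The sketch below assumes the network emits one raw score per class per pixel; `label_pixels` and the toy scores are hypothetical names and values used only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_pixels(score_map):
    """score_map[y][x] is a list of raw scores, one per class.

    Returns (labels, probs): the winning class index per pixel and
    the probability assigned to that class.
    """
    labels, probs = [], []
    for row in score_map:
        label_row, prob_row = [], []
        for scores in row:
            p = softmax(scores)
            k = max(range(len(p)), key=p.__getitem__)
            label_row.append(k)
            prob_row.append(p[k])
        labels.append(label_row)
        probs.append(prob_row)
    return labels, probs


# A 1x2 "image" with two classes per pixel.
scores = [[[2.0, 0.0], [0.0, 3.0]]]
labels, probs = label_pixels(scores)
```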

Applications of image segmentation

http://withoutbg.com/

https://www.segmentive.ai/

Segmentation as pixel-wise localization

Instance segmentation requires object detection

Panoptic segmentation
https://arxiv.org/pdf/1801.00868.pdf
Explore it in the detectron2 inference notebook
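A panoptic result assigns every pixel a class and, for countable objects, an instance id. The sketch below shows one simple merging heuristic under stated assumptions: instance masks take precedence over the semantic map, and overlaps between instances go to the higher-scoring one. It is not the Detectron2 implementation, and `merge_panoptic` and the toy inputs are hypothetical.

```python
def merge_panoptic(semantic, instances):
    """Merge a semantic map with instance masks.

    semantic: 2D list of class names (used for uncountable regions).
    instances: list of (class_name, score, mask), mask a 2D list of 0/1.
    Returns a 2D list of (class_name, instance_id); instance_id is None
    where no instance claims the pixel.
    """
    h, w = len(semantic), len(semantic[0])
    out = [[(semantic[y][x], None) for x in range(w)] for y in range(h)]
    claimed = [[False] * w for _ in range(h)]
    # Higher-scoring instances claim their pixels first.
    ranked = sorted(enumerate(instances), key=lambda item: -item[1][1])
    for instance_id, (name, _score, mask) in ranked:
        for y in range(h):
            for x in range(w):
                if mask[y][x] and not claimed[y][x]:
                    out[y][x] = (name, instance_id)
                    claimed[y][x] = True
    return out


semantic = [["sky", "sky"], ["grass", "grass"]]
instances = [
    ("dog", 0.9, [[0, 0], [1, 1]]),
    ("person", 0.8, [[0, 1], [0, 1]]),
]
panoptic = merge_panoptic(semantic, instances)
```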

“Fully Convolutional” networks draw segmentation masks
All layers in the network are convolutional. There is no fully connected (aka “dense”) layer as in most classifiers, so each output pixel is predicted from the local information in its pixel neighborhood.

What is a convolution filter?
https://setosa.io/ev/image-kernels/
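A filter slides over the image and, at each position, sums the element-wise products of the kernel and the patch underneath it. The sketch below is a plain-Python illustration without padding or stride; like deep learning frameworks, it does not flip the kernel (strictly speaking this is cross-correlation). The image and kernel values are made up.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# A 4x4 image with a vertical edge between the dark and bright halves.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A simple vertical-edge detector.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
edges = convolve2d(image, kernel)
```

The 4x4 input yields a 2x2 output, and every output value is positive because each 3x3 window straddles the edge.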

Convolution of 3x3 and stride = 1 without padding
Effect: the output loses one pixel on each border, i.e. two pixels per dimension

Convolution of 3x3 and stride = 1 with zero padding
Effect: the output preserves original image size

Convolution of 3x3 and stride = 2 with zero padding
Effect: the output is downsampled to about half its size
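The three cases above follow from the standard output-size formula, out = floor((n + 2p - k) / s) + 1, for input size n, kernel size k, padding p, and stride s. A small sketch (the function name `conv_output_size` is hypothetical):

```python
def conv_output_size(n, kernel=3, stride=1, padding=0):
    """Output length along one dimension of a convolution."""
    return (n + 2 * padding - kernel) // stride + 1


no_padding = conv_output_size(8, kernel=3, stride=1, padding=0)    # 8 -> 6
zero_padding = conv_output_size(8, kernel=3, stride=1, padding=1)  # 8 -> 8
strided = conv_output_size(8, kernel=3, stride=2, padding=1)       # 8 -> 4
```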

U-net for semantic segmentation
Like other fully convolutional networks, U-net has no fully connected (aka “dense”) layer. We need this architecture to label images pixel by pixel while preserving their local information.

Image Pyramids in Feature Pyramid Networks (FPNs)

Convolutional networks implement “pyramids”
The deeper we go into the network, the more semantic information is packed into feature maps with smaller x, y dimensions
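To make the shrinking concrete, the sketch below lists the feature-map sizes at strides 4, 8, 16, and 32, the strides of the ResNet stages that a feature pyramid network typically builds on. The helper `pyramid_shapes` is hypothetical, and rounding up for sizes that do not divide evenly is an assumption.

```python
import math


def pyramid_shapes(height, width, strides=(4, 8, 16, 32)):
    """Spatial size of the feature map at each pyramid level."""
    return [(math.ceil(height / s), math.ceil(width / s)) for s in strides]


shapes = pyramid_shapes(224, 224)
```

For a 224x224 input, the levels shrink from 56x56 down to 7x7.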

Nearest neighbor interpolation
Resnets in feature pyramid networks
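Nearest neighbor interpolation upsamples a coarse feature map by repeating each value, so that it can be combined with the finer map from the level below. A minimal plain-Python sketch on a single-channel map (`upsample_nearest` is a hypothetical helper):

```python
def upsample_nearest(feature_map, factor=2):
    """Upsample a 2D list by repeating each value `factor` times per axis."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [feature_map[y // factor][x // factor] for x in range(w * factor)]
        for y in range(h * factor)
    ]


coarse = [[1, 2], [3, 4]]
fine = upsample_nearest(coarse)
```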

1x1 convolution
Resnets in feature pyramid networks
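A 1x1 convolution mixes the channels of each pixel independently of its neighbors, which is how a feature pyramid network brings backbone feature maps to a common channel depth. The sketch below assumes a channels-last layout; `conv1x1` and the toy weights are hypothetical.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map[y][x] is a list of C_in channel values.
    weights is a C_out x C_in matrix (one row per output channel).
    """
    return [
        [
            [sum(w * v for w, v in zip(row_w, pixel)) for row_w in weights]
            for pixel in row
        ]
        for row in feature_map
    ]


# A 1x2 feature map with 3 channels, reduced to 2 channels.
features = [[[1, 2, 3], [4, 5, 6]]]
weights = [
    [1, 0, 0],      # output channel 0 copies input channel 0
    [0.5, 0.5, 0],  # output channel 1 averages input channels 0 and 1
]
reduced = conv1x1(features, weights)
```

The spatial size is unchanged; only the number of channels per pixel changes.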

The COCO dataset
http://cocodataset.org/#explore

The Google Open Images Dataset

https://storage.googleapis.com/openimages/web/index.html

https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=segmentation&r=false&c=%2Fm%2F03g8mr

https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=detection&c=%2Fm%2F04rmv


Detectron2
detectron2/MODEL_ZOO.md at master · facebookresearch/detectron2 · GitHub
Inference (Colab notebook)
Training (Colab notebook)
Generating validation set plots

Panoptic segmentation with feature pyramid network (FPN-50)

https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md

Model output format
https://detectron2.readthedocs.io/tutorials/models.html#model-output-format

Objective
Example Case: Image Matting

⊕
Using a U-net
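In matting, the predicted mask is a soft alpha matte rather than a hard label map. Under the usual compositing model, each output pixel is alpha * foreground + (1 - alpha) * background. The sketch below uses grayscale values; `composite` and the toy inputs are hypothetical and are not the speakers' code.

```python
def composite(foreground, background, alpha):
    """Blend two grayscale images with a per-pixel alpha matte in [0, 1]."""
    return [
        [a * f + (1 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


foreground = [[200, 200]]
background = [[0, 100]]
alpha = [[1.0, 0.5]]  # fully opaque pixel, then a half-transparent edge pixel
result = composite(foreground, background, alpha)
```

Fractional alpha values along hair or soft edges are what distinguish a matte from a binary segmentation mask.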

Matting algorithm:

instance segment
Using trimap or instance segments?
⊕ or ⊕ ?

Results
Photo: Ayo Ogunseinde
https://unsplash.com/photos/THIs-cpyebg

Results
Photo: Eugen Proskouriakov
https://unsplash.com/photos/C-gvAA8q3Tc

Results
Photo: Mathieu Renier
https://unsplash.com/photos/4WBvCqeMaDE

Results
Photo: Gulyás Bianka
https://unsplash.com/photos/3WOh54znPGU

For more examples:
withoutbg.com

Which things should be kept in this picture?
Kid, ball, 2 dogs, 9 people?
Photo: Treddy Chen
https://unsplash.com/photos/UdQWvefOXJk

Issue: When there is more than one person in the image...

Review questions
- How do we compute the confusion matrix for a segmentation mask? How do we
compute it for a bounding box?
- Can we use the Intersection over Union equation to evaluate the quality of a
segmentation mask?
- What’s the recall of a classifier that only outputs ‘1’ (positive class)?
- What’s the precision of a classifier that outputs a single true positive, with all its
other predictions being equal to ‘0’ (negative class)?
- Why does precision go down when recall increases?
- Does the F1 measure weigh precision and recall equally?
- What’s the appeal of using Detectron2? Do we need to write a PyTorch model to
use it for inference or training?
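As a starting point for the metric questions, the sketch below counts the confusion matrix entries for a pair of binary masks and derives IoU, precision, recall, and F1 from them. The function names and the toy masks are hypothetical; returning 0.0 when a denominator is zero is a convention chosen here, not a standard.

```python
def confusion_counts(pred, truth):
    """Count (tp, fp, fn, tn) pixel by pixel for two binary masks."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def mask_metrics(pred, truth):
    """IoU, precision, recall, and F1 of a predicted binary mask."""
    tp, fp, fn, _tn = confusion_counts(pred, truth)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}


pred = [[1, 1, 0], [0, 0, 0]]
truth = [[0, 1, 1], [0, 0, 0]]
metrics = mask_metrics(pred, truth)
```

The same counts apply to a bounding box if the box is rasterized into a mask first.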

Google Colab Notebooks
● U-net in FastAI 2
● Mask R-CNN and Panoptic Segmentation with Detectron2

Review questions
- How does panoptic segmentation combine instance and semantic
segmentation? Which method produces the ‘stuff’? Which method produces
the ‘things’?
- Is semantic segmentation more computationally costly than instance
segmentation? Why?
- Is panoptic segmentation more computationally costly than instance
segmentation? Why?

References
● Stanford’s cs231n lecture on Object Detection and Segmentation
● PyImageSearch tutorial on Mask R-CNN
