Dmytro Fishman (dmytrofishman@gmail.com)
Intro to Deep Learning
Tartu
BioInformatics and Information Technology research group
https://biit.cs.ut.ee/
Research group areas: Scientific Software, Statistical Analysis, Biological Imaging, OMICS, Teaching & Training, Personalised Medicine, Data Management, Machine Learning.
Microscopy imaging (example figure).
Other applications of learning on biological images: retinopathy detection, classification of skin cancer.
Slides are @ https://bit.ly/2LgrKRd
Machine Learning is traditionally split into three flavours:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Deep Learning cuts across all three of them.
What is deep learning?
An adaptive non-linear mapping from one space to another (Space 1 to Space 2). For example, an input can be mapped to the digit 3, to the class SETOSA, or to the text "Hello world!".
In practice: DL = Artificial Neural Networks with many layers.
4 pixel image
Task: categorise 4 pixel images as Solid, Vertical, Diagonal, or Horizontal.
Simple hand-written rules won't do the trick.
Each pixel holds a value on a scale from -1.0 to 1.0 (-1.0, -0.5, 0, 0.5, 1.0).
The four pixel values (-0.5, 0, 1.0, -1.0) form the receptive field of a neuron; together they make up the input layer.
The neuron first takes the weighted sum of its inputs.
With all weights set to 1.0:
-0.5 x 1.0 + 0 x 1.0 + 1.0 x 1.0 + -1.0 x 1.0 = -0.5
With weights 0.2, -0.3, 0.0, -0.8:
-0.5 x 0.2 + 0 x -0.3 + 1.0 x 0.0 + -1.0 x -0.8 = 0.7
The result (0.7) is then passed through an activation function.
Sigmoid activation function:
sig(x) = 1 / (1 + e^(-x))
For the weighted sum 0.7 from before, sig(0.7) = 0.66.
Larger inputs give larger outputs, e.g. sig(2) = 0.88, and for large x the value gets ever closer to 1 (0.99, ...).
No matter how big your x can grow, your sig(x) will always be between 0 and 1.
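The sigmoid can be checked numerically in a few lines of Python (a minimal sketch; the function name sig follows the slides):

```python
import math

def sig(x):
    """Sigmoid activation: squashes any real x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sig(0.7))  # the weighted sum from the slides, ~0.67
print(sig(2.0))  # ~0.88, as on the sigmoid plot
print(sig(8.0))  # very close to, but still below, 1
```

However large the input, the output stays strictly between 0 and 1, which is the boundedness property the slide emphasises.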
Weighted sum + activation function = neuron.
The weighted sum (0.7) is the input of the neuron; the activation value (0.66) is the output of the neuron.
In diagrams, the weighted sum usually is not visualised explicitly.
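The whole neuron, weighted sum followed by the sigmoid, can be sketched like this (the helper names are ours, the numbers are the slide's):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    """Weighted sum of the inputs, passed through the activation function."""
    s = sum(i * w for i, w in zip(inputs, weights))  # input of the neuron
    return sig(s)                                    # output of the neuron

pixels = [-0.5, 0.0, 1.0, -1.0]                 # the 4 pixel image
out = neuron(pixels, [0.2, -0.3, 0.0, -0.8])    # weighted sum = 0.7
print(out)  # sigmoid of 0.7
```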
Let's make more neurons, connected to the input layer by different weights (e.g. 1.0, 0.0, -1.0).
Connecting several neurons to the input layer gives more complex receptive fields, and adding one more layer makes the receptive fields even more complex.
Rectified Linear Unit (ReLU) activation function:
R(x) = max(0, x)
If the number is positive, keep it. Otherwise the output is zero.
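ReLU is even simpler to write down than the sigmoid, a one-liner in Python:

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, zero out the rest."""
    return max(0.0, x)

print(relu(5.0))   # positive input passes through unchanged
print(relu(-3.0))  # negative input is clipped to zero
```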
Adding one more layer, and then an output layer with one neuron per category: Solid, Vertical, Diagonal, Horizontal.
Example: an input image with pixel values -1.0, -1.0, 1.0, 1.0.
The first hidden layer computes the weighted sums 0.0, 0.0, 2.0, 2.0, which the sigmoid squashes to 0.5, 0.5, 0.88, 0.88.
The second hidden layer combines these into the sums 0.0, 1.0, 1.76, 0.0, giving the activations 0.5, 0.73, 0.85, 0.5.
Propagating these through positive and negative weights to the output layer yields the class scores 0.73, 0.5, 0.5, 0.85 for Solid, Vertical, Diagonal, Horizontal: the Horizontal neuron scores highest (0.85), matching the input image.
Consider this 2D example: a scatter of points in the plane (x from -0.75 to 1.00, y from -0.8 to 0.6). We will follow one point, (0.1, 0.05), through a small network.
We feed the point's coordinates, 0.05 and 0.1, as input.
Hidden layer: two neurons, connected to the inputs by the weights 0.15 (w1), 0.20 (w2), 0.25 (w3), 0.30 (w4).
Output layer: two neurons, connected to the hidden layer by the weights 0.40 (w5), 0.45 (w6), 0.50 (w7), 0.55 (w8).
For this point the truth (expected output) is 0.0 for the first output neuron and 1.0 for the second; the network's answers and their errors are still unknown (?).
Adding one more type of neurons: bias neurons b, with b1 = 0.35 feeding the hidden layer and b2 = 0.6 feeding the output layer.
What is the role of bias in Neural Networks?
Without a bias the neuron computes sig(w1 * x). Varying the weight (w1 = 0.5, 1.0, 2.0) only changes the steepness of the sigmoid curve.
With a bias the neuron computes sig(w1 * x + b1). Fixing w1 = 1.0 and varying the bias (b1 = -4.0, 0.0, 4.0) shifts the whole curve left or right.
Bias helps to shift the resulting curve:
sig(w * x + b) = 1 / (1 + e^(-(wx + b)))
Source: http://www.uta.fi/sis/tie/neuro/index/Neurocomputing2.pdf
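The shifting effect of the bias can also be seen numerically. A small sketch (names are ours) that evaluates the neuron at x = 0 for the three bias values from the slide:

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

w1 = 1.0
for b1 in [-4.0, 0.0, 4.0]:
    # At x = 0 the output moves from ~0.02 to 0.5 to ~0.98 as b1 grows:
    # the whole curve has shifted, not just changed steepness.
    print(b1, sig(w1 * 0.0 + b1))
```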
Feed-forward path
Let's calculate scores for each class, layer by layer.
Input of the first neuron in the hidden layer:
input(h1) = x * w1 + y * w2 + b1 = 0.05 * 0.15 + 0.1 * 0.2 + 0.35 = 0.3775
And its output:
out(h1) = sig(input(h1)) = sig(0.3775) = 0.5933
Input and output of the second neuron in the hidden layer:
input(h2) = x * w3 + y * w4 + b1 = 0.05 * 0.25 + 0.1 * 0.3 + 0.35 = 0.3925
out(h2) = sig(0.3925) = 0.5969
Input of the first neuron in the output layer:
input(o1) = out(h1) * w5 + out(h2) * w6 + b2 = 0.5933 * 0.4 + 0.5969 * 0.45 + 0.6 = 1.105
out(o1) = sig(1.105) = 0.7511
The second output neuron is computed the same way with w7, w8 and b2:
out(o2) = 0.7732
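The whole feed-forward path fits in a few lines of Python (variable names are ours; the weights, biases and inputs are the slide's):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

x, y = 0.05, 0.1                            # point coordinates as input
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30    # input -> hidden weights
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55    # hidden -> output weights
b1, b2 = 0.35, 0.6                          # bias neurons

out_h1 = sig(x * w1 + y * w2 + b1)          # sig(0.3775) = 0.5933
out_h2 = sig(x * w3 + y * w4 + b1)          # sig(0.3925) = 0.5969
out_o1 = sig(out_h1 * w5 + out_h2 * w6 + b2)  # ~0.751
out_o2 = sig(out_h1 * w7 + out_h2 * w8 + b2)  # ~0.773
print(out_o1, out_o2)
```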
How good are these predictions? In order to find out, we should compare them to the expected outcomes (the truth values 0.0 and 1.0), using the squared error:
Eo1 = 1/2 (truth - out(o1))^2 = 1/2 (0 - 0.751)^2 = 0.282
Eo2 = 1/2 (truth - out(o2))^2 = 1/2 (1 - 0.773)^2 = 0.025
Overall the error is:
Etotal = Eo1 + Eo2 = 0.282 + 0.025 = 0.307
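The error computation, continuing from the forward pass (a sketch with our own variable names; truth values 0.0 and 1.0 as on the slide):

```python
out_o1, out_o2 = 0.7511, 0.7732        # network answers from the forward pass
truth_o1, truth_o2 = 0.0, 1.0          # expected outcomes

E_o1 = 0.5 * (truth_o1 - out_o1) ** 2  # ~0.282
E_o2 = 0.5 * (truth_o2 - out_o2) ** 2  # ~0.026
E_total = E_o1 + E_o2                  # ~0.308
print(E_total)
```

The slide rounds the per-output errors to 0.282 and 0.025 before summing, which gives its Etotal of 0.307.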
Our goal is to reduce the total error (Etotal)!
The only thing we can do is change the weights.
How does w5 influence the total error?
We can change w5 randomly (increase or decrease it) and see what happens to Etotal:
w5 = 0.45: out(o1) = 0.757, Etotal = 0.311
w5 = 0.40: out(o1) = 0.751, Etotal = 0.307
w5 = 0.35: out(o1) = 0.745, Etotal = 0.303
w5 = 0.30: out(o1) = 0.740, Etotal = 0.299
A lower weight gives a lower Etotal, a higher weight a higher Etotal.
The problem is that every time we change a weight we need to recalculate all the neuron values again.
Also, tweaking each weight separately would be a nightmare.
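The "nudge w5 and re-run everything" experiment above is exactly a numerical (finite-difference) derivative. A sketch with our own helper names, reusing the slide's weights:

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def e_total(w5):
    """Full forward pass + total error, viewed as a function of w5 alone."""
    x, y, b1, b2 = 0.05, 0.1, 0.35, 0.6
    out_h1 = sig(x * 0.15 + y * 0.20 + b1)
    out_h2 = sig(x * 0.25 + y * 0.30 + b1)
    out_o1 = sig(out_h1 * w5 + out_h2 * 0.45 + b2)
    out_o2 = sig(out_h1 * 0.50 + out_h2 * 0.55 + b2)
    return 0.5 * (0.0 - out_o1) ** 2 + 0.5 * (1.0 - out_o2) ** 2

eps = 1e-5
grad_w5 = (e_total(0.40 + eps) - e_total(0.40 - eps)) / (2 * eps)
print(grad_w5)  # positive: increasing w5 increases the error
```

Each such estimate costs two full forward passes per weight, which is the inefficiency the slides complain about; backpropagation produces the same number analytically in a single backward sweep over all weights at once.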
[Plot: Etotal as a function of a single weight, with weight − 1, weight, and weight + 1 marked on the axis.]

We want a way to efficiently update all our weights so that the total error decreases most substantially. For that we want to compute the gradient. The gradient ∂E/∂w shows the direction of the greatest increase of the function, so we step in the opposite direction, −∂E/∂w.

This looks OK when there is just one weight (w5) to take care of. It starts to look a lot scarier when we try to optimise all the weights (w1 … w8) at once. Still, the gradient can be computed efficiently using the backpropagation algorithm.
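Stepping against the gradient can be sketched in a few lines of Python. This is a toy example: the loss E(w) = (w − 3)² and its hand-derived gradient are hypothetical stand-ins, not the network's actual error.

```python
# Toy gradient descent in one dimension.
# E(w) = (w - 3)^2 has its minimum at w = 3; its gradient is dE/dw = 2 * (w - 3).
def E(w):
    return (w - 3.0) ** 2

def dE_dw(w):
    return 2.0 * (w - 3.0)

w = 0.0        # initial weight
eta = 0.1      # step size
for _ in range(100):
    w -= eta * dE_dw(w)   # step in the direction -dE/dw

# after the loop, w has moved very close to the minimum at 3.0
```

Each step shrinks the distance to the minimum by a constant factor, which is exactly the behaviour the plot above illustrates for one weight.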
Backpropagation algorithm

[Slide: the example network again. Input (0.1, 0.05); hidden layer with w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, b1 = 0.35 gives out(h1) = 0.5933, out(h2) = 0.5969; output layer with w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55, b2 = 0.60 gives out(o1) = 0.7511, out(o2) = 0.7732. Answers 0.751 and 0.773 vs truths 0.0 and 1.0: errors 0.282 and 0.025, Etotal = 0.307.]

How does w5 influence the total error?
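The forward pass on the slide can be reproduced directly. This is a sketch; the pairing of w1 with 0.05 and w2 with 0.10 is chosen so the numbers match the slide.

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

# inputs and targets from the slide
i1, i2 = 0.05, 0.10          # the point (0.1, 0.05)
t1, t2 = 0.0, 1.0            # truths for o1 and o2
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60

out_h1 = sig(w1 * i1 + w2 * i2 + b1)          # ~0.5933
out_h2 = sig(w3 * i1 + w4 * i2 + b1)          # ~0.5969
out_o1 = sig(w5 * out_h1 + w6 * out_h2 + b2)  # ~0.751
out_o2 = sig(w7 * out_h1 + w8 * out_h2 + b2)  # ~0.773

E_o1 = 0.5 * (t1 - out_o1) ** 2   # ~0.282
E_o2 = 0.5 * (t2 - out_o2) ** 2   # ~0.026
E_total = E_o1 + E_o2             # ~0.308 (0.307 on the slide, after rounding)
```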
We unfold the question with the chain rule, one factor at a time:

∂Etotal/∂w5 = ∂Etotal/∂Eo1 * ∂Eo1/∂outo1 * ∂outo1/∂ino1 * ∂ino1/∂w5

How does the error on out(o1) influence the total error? (∂Etotal/∂Eo1)
How does out(o1) influence the error on out(o1)? (∂Eo1/∂outo1)
How does input(o1) influence out(o1)? (∂outo1/∂ino1)
How does w5 influence input(o1)? (∂ino1/∂w5)
First factor: the total error is just the sum of the per-output errors, Etotal = Eo1 + Eo2, so

∂Etotal/∂Eo1 = 1

∂Etotal/∂w5 = 1 * ∂Eo1/∂outo1 * ∂outo1/∂ino1 * ∂ino1/∂w5
Second factor: the error on o1 is the squared error

Eo1 = 1/2 (truth − outo1)²

∂Eo1/∂outo1 = 2 * 1/2 * (truth − outo1) * (−1) = outo1 − truth = 0.751 − 0 = 0.751

∂Etotal/∂w5 = 1 * 0.751 * ∂outo1/∂ino1 * ∂ino1/∂w5
Third factor: out(o1) is the sigmoid of its input, sig(x) = 1 / (1 + e^(−x)), and the sigmoid's derivative is sig(x) * (1 − sig(x)):

∂outo1/∂ino1 = outo1 * (1 − outo1) = 0.751 * (1 − 0.751) = 0.186

∂Etotal/∂w5 = 1 * 0.751 * 0.186 * ∂ino1/∂w5
Fourth factor: the input to o1 is

ino1 = outh1 * w5 + outh2 * w6 + b2

∂ino1/∂w5 = outh1 = 0.5933

Putting the four factors together:

∂Etotal/∂w5 = 1 * 0.751 * 0.186 * 0.5933 ≈ 0.083
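The four factors multiply out to the same number in code (a sketch reusing the slide's values):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

out_h1, out_h2 = 0.5933, 0.5969
w5, w6, b2 = 0.40, 0.45, 0.60
truth = 0.0

out_o1 = sig(out_h1 * w5 + out_h2 * w6 + b2)   # ~0.751

dEtotal_dEo1 = 1.0                  # Etotal = Eo1 + Eo2
dEo1_dout = out_o1 - truth          # ~0.751
dout_din = out_o1 * (1.0 - out_o1)  # sigmoid derivative, ~0.186
din_dw5 = out_h1                    # 0.5933

grad_w5 = dEtotal_dEo1 * dEo1_dout * dout_din * din_dw5   # ~0.083
```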
If the value of the gradient is positive, then increasing w5 will increase the total error. So we update w5 in the opposite direction:

w5new = w5old − ∂Etotal/∂w5

[Plot: Etotal over w5; if we update by too much, we can overshoot the optimum.]

Thus we make a somewhat smaller step, scaled by the step size (learning rate) η:

w5new = w5old − η * ∂Etotal/∂w5 = 0.4 − 0.5 * 0.083 = 0.3585 ≈ 0.359

[Slide: the network re-run with w5 = 0.359: the answer on o1 drops to 0.745, its error to 0.278, and Etotal to 0.303.]

After updating all the remaining weights the same way, the total error drops to 0.131. Repeating the procedure 1000 times decreases it to 0.001.
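The whole procedure, gradients for all eight weights plus the repeated update, fits in a short loop. This is a sketch with η = 0.5; the hidden-layer gradients follow the same chain rule, extended one layer back (biases are left fixed, as on the slides).

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10   # the input point
t1, t2 = 0.0, 1.0     # truths
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60
eta = 0.5

errors = []
for _ in range(1000):
    # forward pass
    out_h1 = sig(w1 * i1 + w2 * i2 + b1)
    out_h2 = sig(w3 * i1 + w4 * i2 + b1)
    out_o1 = sig(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sig(w7 * out_h1 + w8 * out_h2 + b2)
    errors.append(0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2)

    # backward pass: delta = dE / d(input of unit)
    d_o1 = (out_o1 - t1) * out_o1 * (1 - out_o1)
    d_o2 = (out_o2 - t2) * out_o2 * (1 - out_o2)
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

    # gradient step for every weight
    w5 -= eta * d_o1 * out_h1; w6 -= eta * d_o1 * out_h2
    w7 -= eta * d_o2 * out_h1; w8 -= eta * d_o2 * out_h2
    w1 -= eta * d_h1 * i1;     w2 -= eta * d_h1 * i2
    w3 -= eta * d_h2 * i1;     w4 -= eta * d_h2 * i2

# the total error shrinks from ~0.31 towards zero
```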
Why should we understand backpropagation?
https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b

Vanishing gradients on sigmoids

[Plot: sig(x) = 1 / (1 + e^(−x)) and its derivative sig′(x) for x from −8 to 8. The derivative is practically zero at the tails.]

Recall the chain rule:

∂Etotal/∂w5 = ∂Etotal/∂Eo1 * ∂Eo1/∂outo1 * ∂outo1/∂ino1 * ∂ino1/∂w5

So if x is too high (e.g. > 8) or too low (e.g. < −8), then ∂outo1/∂ino1 ≈ 0 and thus ∂Etotal/∂w5 ≈ 0: the gradient vanishes and the weight stops learning.

Dying ReLUs

[Plot: the ReLU max(0, x) and its derivative max′(0, x). The derivative is zero for all negative inputs, so a unit stuck there receives no gradient at all.]
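Both failure modes are easy to check numerically (a minimal sketch):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def sig_prime(x):
    s = sig(x)
    return s * (1.0 - s)   # sigmoid derivative, peaks at 0.25 for x = 0

def relu_prime(x):
    return 1.0 if x > 0 else 0.0   # zero for every negative input

print(sig_prime(0.0))    # 0.25
print(sig_prime(8.0))    # ~0.000335: the gradient has all but vanished
print(relu_prime(-3.0))  # 0.0: a "dead" ReLU passes no gradient at all
```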
Now let's practice

[Plot: the point (0.1, 0.05), then a scatter of 2-D points with x from −0.75 to 1.00 and y from −0.8 to 0.6, for the network to classify.]

Training Neural Networks (part I):
https://goo.gl/cX2dGd
http://www.emergentmind.com/neural-network

Training Neural Networks (part II):
http://playground.tensorflow.org/
Resources:
Brandon Rohrer's youtube video: How Convolutional Neural Networks work (https://youtu.be/FmpDIaiMIeA)
Brandon Rohrer's youtube video: How Deep Neural Networks Work (https://youtu.be/ILsA4nyG7I0)
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition (github.io version): http://cs231n.github.io/
Matt Mazur: A Step by Step Backpropagation Example (https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)
Andrej Karpathy's blog post: Yes you should understand backprop (https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b)
Raul Vicente's lecture: From brain to Deep Learning and back (https://www.uttv.ee/naita?id=23585)
Mayank Agarwal's blog: Back Propagation in Convolutional Neural Networks — Intuition and Code (https://becominghuman.ai/back-propagation-in-convolutional-neural-networks-intuition-and-code-714ef1c38199)
Mordor riddle
http://fouryears.eu/2009/11/21/machine-learning-in-mordor/
https://biit.cs.ut.ee/
Thank you!
Introduction to Deep Learning