Dmytro Fishman (dmytrofishman@gmail.com)
Intro to Deep Learning
Tartu
BioInformatics and Information Technology research group
https://biit.cs.ut.ee/
Research group areas: Scientific Software, Statistical Analysis, Biological Imaging, OMICS, Teaching & Training, Personalised Medicine, Data Management, Machine Learning.
Microscopy imaging (example figure).
Other applications of learning on biological images: retinopathy detection, classification of skin cancer.
Slides are @ https://bit.ly/2LgrKRd
Machine Learning is traditionally split into three flavours:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Deep Learning cuts across all three of them.
What is deep learning?
An adaptive non-linear mapping from one space to another (Space 1 to Space 2). For example, an input can be mapped to the digit 3, to the class SETOSA, or to the text "Hello world!".
In practice: DL = Artificial Neural Networks with many layers.
4 pixel image
Task: categorise 4 pixel images as Solid, Vertical, Diagonal, or Horizontal.
Simple hand-written rules won't do the trick.
Each pixel holds a value on a scale from -1.0 to 1.0 (-1.0, -0.5, 0, 0.5, 1.0).
The four pixel values (-0.5, 0, 1.0, -1.0) form the receptive field of a neuron; together they make up the input layer.
The neuron first takes the weighted sum of its inputs.
With all weights set to 1.0:
-0.5 x 1.0 + 0 x 1.0 + 1.0 x 1.0 + -1.0 x 1.0 = -0.5
With weights 0.2, -0.3, 0.0, -0.8:
-0.5 x 0.2 + 0 x -0.3 + 1.0 x 0.0 + -1.0 x -0.8 = 0.7
The result (0.7) is then passed through an activation function.
Sigmoid activation function:
sig(x) = 1 / (1 + e^(-x))
For the weighted sum 0.7 from before, sig(0.7) = 0.66.
Larger inputs give larger outputs, e.g. sig(2) = 0.88, and for large x the value gets ever closer to 1 (0.99, ...).
No matter how big your x can grow, your sig(x) will always be between 0 and 1.
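The sigmoid can be checked numerically in a few lines of Python (a minimal sketch; the function name sig follows the slides):

```python
import math

def sig(x):
    """Sigmoid activation: squashes any real x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sig(0.7))  # the weighted sum from the slides, ~0.67
print(sig(2.0))  # ~0.88, as on the sigmoid plot
print(sig(8.0))  # very close to, but still below, 1
```

However large the input, the output stays strictly between 0 and 1, which is the boundedness property the slide emphasises.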
Weighted sum + activation function = neuron.
The weighted sum (0.7) is the input of the neuron; the activation value (0.66) is the output of the neuron.
In diagrams, the weighted sum usually is not visualised explicitly.
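The whole neuron, weighted sum followed by the sigmoid, can be sketched like this (the helper names are ours, the numbers are the slide's):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    """Weighted sum of the inputs, passed through the activation function."""
    s = sum(i * w for i, w in zip(inputs, weights))  # input of the neuron
    return sig(s)                                    # output of the neuron

pixels = [-0.5, 0.0, 1.0, -1.0]                 # the 4 pixel image
out = neuron(pixels, [0.2, -0.3, 0.0, -0.8])    # weighted sum = 0.7
print(out)  # sigmoid of 0.7
```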
Let's make more neurons, connected to the input layer by different weights (e.g. 1.0, 0.0, -1.0).
Connecting several neurons to the input layer gives more complex receptive fields, and adding one more layer makes the receptive fields even more complex.
Rectified Linear Unit (ReLU) activation function:
R(x) = max(0, x)
If the number is positive, keep it. Otherwise the output is zero.
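ReLU is even simpler to write down than the sigmoid, a one-liner in Python:

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, zero out the rest."""
    return max(0.0, x)

print(relu(5.0))   # positive input passes through unchanged
print(relu(-3.0))  # negative input is clipped to zero
```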
Adding one more layer, and then an output layer with one neuron per category: Solid, Vertical, Diagonal, Horizontal.
Example: an input image with pixel values -1.0, -1.0, 1.0, 1.0.
The first hidden layer computes the weighted sums 0.0, 0.0, 2.0, 2.0, which the sigmoid squashes to 0.5, 0.5, 0.88, 0.88.
The second hidden layer combines these into the sums 0.0, 1.0, 1.76, 0.0, giving the activations 0.5, 0.73, 0.85, 0.5.
Propagating these through positive and negative weights to the output layer yields the class scores 0.73, 0.5, 0.5, 0.85 for Solid, Vertical, Diagonal, Horizontal: the Horizontal neuron scores highest (0.85), matching the input image.
Consider this 2D example: a scatter of points in the plane (x from -0.75 to 1.00, y from -0.8 to 0.6). We will follow one point, (0.1, 0.05), through a small network.
We feed the point's coordinates, 0.05 and 0.1, as input.
Hidden layer: two neurons, connected to the inputs by the weights 0.15 (w1), 0.20 (w2), 0.25 (w3), 0.30 (w4).
Output layer: two neurons, connected to the hidden layer by the weights 0.40 (w5), 0.45 (w6), 0.50 (w7), 0.55 (w8).
For this point the truth (expected output) is 0.0 for the first output neuron and 1.0 for the second; the network's answers and their errors are still unknown (?).
Adding one more type of neurons: bias neurons b, with b1 = 0.35 feeding the hidden layer and b2 = 0.6 feeding the output layer.
What is the role of bias in Neural Networks?
Without a bias the neuron computes sig(w1 * x). Varying the weight (w1 = 0.5, 1.0, 2.0) only changes the steepness of the sigmoid curve.
With a bias the neuron computes sig(w1 * x + b1). Fixing w1 = 1.0 and varying the bias (b1 = -4.0, 0.0, 4.0) shifts the whole curve left or right.
Bias helps to shift the resulting curve:
sig(w * x + b) = 1 / (1 + e^(-(wx + b)))
Source: http://www.uta.fi/sis/tie/neuro/index/Neurocomputing2.pdf
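The shifting effect of the bias can also be seen numerically. A small sketch (names are ours) that evaluates the neuron at x = 0 for the three bias values from the slide:

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

w1 = 1.0
for b1 in [-4.0, 0.0, 4.0]:
    # At x = 0 the output moves from ~0.02 to 0.5 to ~0.98 as b1 grows:
    # the whole curve has shifted, not just changed steepness.
    print(b1, sig(w1 * 0.0 + b1))
```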
Feed-forward path
Let's calculate scores for each class, layer by layer.
Input of the first neuron in the hidden layer:
input(h1) = x * w1 + y * w2 + b1 = 0.05 * 0.15 + 0.1 * 0.2 + 0.35 = 0.3775
And its output:
out(h1) = sig(input(h1)) = sig(0.3775) = 0.5933
Input and output of the second neuron in the hidden layer:
input(h2) = x * w3 + y * w4 + b1 = 0.05 * 0.25 + 0.1 * 0.3 + 0.35 = 0.3925
out(h2) = sig(0.3925) = 0.5969
Input of the first neuron in the output layer:
input(o1) = out(h1) * w5 + out(h2) * w6 + b2 = 0.5933 * 0.4 + 0.5969 * 0.45 + 0.6 = 1.105
out(o1) = sig(1.105) = 0.7511
The second output neuron is computed the same way with w7, w8 and b2:
out(o2) = 0.7732
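The whole feed-forward path fits in a few lines of Python (variable names are ours; the weights, biases and inputs are the slide's):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

x, y = 0.05, 0.1                            # point coordinates as input
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30    # input -> hidden weights
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55    # hidden -> output weights
b1, b2 = 0.35, 0.6                          # bias neurons

out_h1 = sig(x * w1 + y * w2 + b1)          # sig(0.3775) = 0.5933
out_h2 = sig(x * w3 + y * w4 + b1)          # sig(0.3925) = 0.5969
out_o1 = sig(out_h1 * w5 + out_h2 * w6 + b2)  # ~0.751
out_o2 = sig(out_h1 * w7 + out_h2 * w8 + b2)  # ~0.773
print(out_o1, out_o2)
```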
How good are these predictions? In order to find out, we should compare them to the expected outcomes (the truth values 0.0 and 1.0), using the squared error:
Eo1 = 1/2 (truth - out(o1))^2 = 1/2 (0 - 0.751)^2 = 0.282
Eo2 = 1/2 (truth - out(o2))^2 = 1/2 (1 - 0.773)^2 = 0.025
Overall the error is:
Etotal = Eo1 + Eo2 = 0.282 + 0.025 = 0.307
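The error computation, continuing from the forward pass (a sketch with our own variable names; truth values 0.0 and 1.0 as on the slide):

```python
out_o1, out_o2 = 0.7511, 0.7732        # network answers from the forward pass
truth_o1, truth_o2 = 0.0, 1.0          # expected outcomes

E_o1 = 0.5 * (truth_o1 - out_o1) ** 2  # ~0.282
E_o2 = 0.5 * (truth_o2 - out_o2) ** 2  # ~0.026
E_total = E_o1 + E_o2                  # ~0.308
print(E_total)
```

The slide rounds the per-output errors to 0.282 and 0.025 before summing, which gives its Etotal of 0.307.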
Our goal is to reduce the total error (Etotal)!
The only thing we can do is change the weights.
How does w5 influence the total error?
We can change w5 randomly (increase or decrease it) and see what happens to Etotal:
w5 = 0.45: out(o1) = 0.757, Etotal = 0.311
w5 = 0.40: out(o1) = 0.751, Etotal = 0.307
w5 = 0.35: out(o1) = 0.745, Etotal = 0.303
w5 = 0.30: out(o1) = 0.740, Etotal = 0.299
A lower weight gives a lower Etotal, a higher weight a higher Etotal.
The problem is that every time we change a weight we need to recalculate all the neuron values again.
Also, tweaking each weight separately would be a nightmare.
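The "nudge w5 and re-run everything" experiment above is exactly a numerical (finite-difference) derivative. A sketch with our own helper names, reusing the slide's weights:

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def e_total(w5):
    """Full forward pass + total error, viewed as a function of w5 alone."""
    x, y, b1, b2 = 0.05, 0.1, 0.35, 0.6
    out_h1 = sig(x * 0.15 + y * 0.20 + b1)
    out_h2 = sig(x * 0.25 + y * 0.30 + b1)
    out_o1 = sig(out_h1 * w5 + out_h2 * 0.45 + b2)
    out_o2 = sig(out_h1 * 0.50 + out_h2 * 0.55 + b2)
    return 0.5 * (0.0 - out_o1) ** 2 + 0.5 * (1.0 - out_o2) ** 2

eps = 1e-5
grad_w5 = (e_total(0.40 + eps) - e_total(0.40 - eps)) / (2 * eps)
print(grad_w5)  # positive: increasing w5 increases the error
```

Each such estimate costs two full forward passes per weight, which is the inefficiency the slides complain about; backpropagation produces the same number analytically in a single backward sweep over all weights at once.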
[Plot: Etotal as a function of a single weight, with weight − 1, weight, and weight + 1 marked on the axis.]

We want a way to efficiently update all our weights so that the total error decreases most substantially. For that we want to compute the gradient. The gradient ∂E/∂w shows the direction of the greatest increase of the function, so we step in the opposite direction, −∂E/∂w.

This looks OK when there is just one weight (w5) to take care of. It starts to look a lot scarier when we try to optimise all the weights (w1 … w8) at once. Still, the gradient can be computed efficiently using the backpropagation algorithm.
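Stepping against the gradient can be sketched in a few lines of Python. This is a toy example: the loss E(w) = (w − 3)² and its hand-derived gradient are hypothetical stand-ins, not the network's actual error.

```python
# Toy gradient descent in one dimension.
# E(w) = (w - 3)^2 has its minimum at w = 3; its gradient is dE/dw = 2 * (w - 3).
def E(w):
    return (w - 3.0) ** 2

def dE_dw(w):
    return 2.0 * (w - 3.0)

w = 0.0        # initial weight
eta = 0.1      # step size
for _ in range(100):
    w -= eta * dE_dw(w)   # step in the direction -dE/dw

# after the loop, w has moved very close to the minimum at 3.0
```

Each step shrinks the distance to the minimum by a constant factor, which is exactly the behaviour the plot above illustrates for one weight.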
Backpropagation algorithm

[Slide: the example network again. Input (0.1, 0.05); hidden layer with w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, b1 = 0.35 gives out(h1) = 0.5933, out(h2) = 0.5969; output layer with w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55, b2 = 0.60 gives out(o1) = 0.7511, out(o2) = 0.7732. Answers 0.751 and 0.773 vs truths 0.0 and 1.0: errors 0.282 and 0.025, Etotal = 0.307.]

How does w5 influence the total error?
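The forward pass on the slide can be reproduced directly. This is a sketch; the pairing of w1 with 0.05 and w2 with 0.10 is chosen so the numbers match the slide.

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

# inputs and targets from the slide
i1, i2 = 0.05, 0.10          # the point (0.1, 0.05)
t1, t2 = 0.0, 1.0            # truths for o1 and o2
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60

out_h1 = sig(w1 * i1 + w2 * i2 + b1)          # ~0.5933
out_h2 = sig(w3 * i1 + w4 * i2 + b1)          # ~0.5969
out_o1 = sig(w5 * out_h1 + w6 * out_h2 + b2)  # ~0.751
out_o2 = sig(w7 * out_h1 + w8 * out_h2 + b2)  # ~0.773

E_o1 = 0.5 * (t1 - out_o1) ** 2   # ~0.282
E_o2 = 0.5 * (t2 - out_o2) ** 2   # ~0.026
E_total = E_o1 + E_o2             # ~0.308 (0.307 on the slide, after rounding)
```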
We unfold the question with the chain rule, one factor at a time:

∂Etotal/∂w5 = ∂Etotal/∂Eo1 * ∂Eo1/∂outo1 * ∂outo1/∂ino1 * ∂ino1/∂w5

How does the error on out(o1) influence the total error? (∂Etotal/∂Eo1)
How does out(o1) influence the error on out(o1)? (∂Eo1/∂outo1)
How does input(o1) influence out(o1)? (∂outo1/∂ino1)
How does w5 influence input(o1)? (∂ino1/∂w5)
First factor: the total error is just the sum of the per-output errors, Etotal = Eo1 + Eo2, so

∂Etotal/∂Eo1 = 1

∂Etotal/∂w5 = 1 * ∂Eo1/∂outo1 * ∂outo1/∂ino1 * ∂ino1/∂w5
Second factor: the error on o1 is the squared error

Eo1 = 1/2 (truth − outo1)²

∂Eo1/∂outo1 = 2 * 1/2 * (truth − outo1) * (−1) = outo1 − truth = 0.751 − 0 = 0.751

∂Etotal/∂w5 = 1 * 0.751 * ∂outo1/∂ino1 * ∂ino1/∂w5
Third factor: out(o1) is the sigmoid of its input, sig(x) = 1 / (1 + e^(−x)), and the sigmoid's derivative is sig(x) * (1 − sig(x)):

∂outo1/∂ino1 = outo1 * (1 − outo1) = 0.751 * (1 − 0.751) = 0.186

∂Etotal/∂w5 = 1 * 0.751 * 0.186 * ∂ino1/∂w5
Fourth factor: the input to o1 is

ino1 = outh1 * w5 + outh2 * w6 + b2

∂ino1/∂w5 = outh1 = 0.5933

Putting the four factors together:

∂Etotal/∂w5 = 1 * 0.751 * 0.186 * 0.5933 ≈ 0.083
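The four factors multiply out to the same number in code (a sketch reusing the slide's values):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

out_h1, out_h2 = 0.5933, 0.5969
w5, w6, b2 = 0.40, 0.45, 0.60
truth = 0.0

out_o1 = sig(out_h1 * w5 + out_h2 * w6 + b2)   # ~0.751

dEtotal_dEo1 = 1.0                  # Etotal = Eo1 + Eo2
dEo1_dout = out_o1 - truth          # ~0.751
dout_din = out_o1 * (1.0 - out_o1)  # sigmoid derivative, ~0.186
din_dw5 = out_h1                    # 0.5933

grad_w5 = dEtotal_dEo1 * dEo1_dout * dout_din * din_dw5   # ~0.083
```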
If the value of the gradient is positive, then increasing w5 will increase the total error. So we update w5 in the opposite direction:

w5new = w5old − ∂Etotal/∂w5

[Plot: Etotal over w5; if we update by too much, we can overshoot the optimum.]

Thus we make a somewhat smaller step, scaled by the step size (learning rate) η:

w5new = w5old − η * ∂Etotal/∂w5 = 0.4 − 0.5 * 0.083 = 0.3585 ≈ 0.359

[Slide: the network re-run with w5 = 0.359: the answer on o1 drops to 0.745, its error to 0.278, and Etotal to 0.303.]

After updating all the remaining weights the same way, the total error drops to 0.131. Repeating the procedure 1000 times decreases it to 0.001.
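The whole procedure, gradients for all eight weights plus the repeated update, fits in a short loop. This is a sketch with η = 0.5; the hidden-layer gradients follow the same chain rule, extended one layer back (biases are left fixed, as on the slides).

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10   # the input point
t1, t2 = 0.0, 1.0     # truths
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60
eta = 0.5

errors = []
for _ in range(1000):
    # forward pass
    out_h1 = sig(w1 * i1 + w2 * i2 + b1)
    out_h2 = sig(w3 * i1 + w4 * i2 + b1)
    out_o1 = sig(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sig(w7 * out_h1 + w8 * out_h2 + b2)
    errors.append(0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2)

    # backward pass: delta = dE / d(input of unit)
    d_o1 = (out_o1 - t1) * out_o1 * (1 - out_o1)
    d_o2 = (out_o2 - t2) * out_o2 * (1 - out_o2)
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

    # gradient step for every weight
    w5 -= eta * d_o1 * out_h1; w6 -= eta * d_o1 * out_h2
    w7 -= eta * d_o2 * out_h1; w8 -= eta * d_o2 * out_h2
    w1 -= eta * d_h1 * i1;     w2 -= eta * d_h1 * i2
    w3 -= eta * d_h2 * i1;     w4 -= eta * d_h2 * i2

# the total error shrinks from ~0.31 towards zero
```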
Why should we understand backpropagation?
https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b

Vanishing gradients on sigmoids

[Plot: sig(x) = 1 / (1 + e^(−x)) and its derivative sig′(x) for x from −8 to 8. The derivative is practically zero at the tails.]

Recall the chain rule:

∂Etotal/∂w5 = ∂Etotal/∂Eo1 * ∂Eo1/∂outo1 * ∂outo1/∂ino1 * ∂ino1/∂w5

So if x is too high (e.g. > 8) or too low (e.g. < −8), then ∂outo1/∂ino1 ≈ 0 and thus ∂Etotal/∂w5 ≈ 0: the gradient vanishes and the weight stops learning.

Dying ReLUs

[Plot: the ReLU max(0, x) and its derivative max′(0, x). The derivative is zero for all negative inputs, so a unit stuck there receives no gradient at all.]
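Both failure modes are easy to check numerically (a minimal sketch):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def sig_prime(x):
    s = sig(x)
    return s * (1.0 - s)   # sigmoid derivative, peaks at 0.25 for x = 0

def relu_prime(x):
    return 1.0 if x > 0 else 0.0   # zero for every negative input

print(sig_prime(0.0))    # 0.25
print(sig_prime(8.0))    # ~0.000335: the gradient has all but vanished
print(relu_prime(-3.0))  # 0.0: a "dead" ReLU passes no gradient at all
```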
Now let's practice

[Plot: the point (0.1, 0.05), then a scatter of 2-D points with x from −0.75 to 1.00 and y from −0.8 to 0.6, for the network to classify.]

Training Neural Networks (part I):
https://goo.gl/cX2dGd
http://www.emergentmind.com/neural-network

Training Neural Networks (part II):
http://playground.tensorflow.org/
Resources:
Brandon Rohrer's youtube video: How Convolutional Neural Networks work (https://youtu.be/FmpDIaiMIeA)
Brandon Rohrer's youtube video: How Deep Neural Networks Work (https://youtu.be/ILsA4nyG7I0)
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition (github.io version): http://cs231n.github.io/
Matt Mazur: A Step by Step Backpropagation Example (https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)
Andrej Karpathy's blog post: Yes you should understand backprop (https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b)
Raul Vicente's lecture: From brain to Deep Learning and back (https://www.uttv.ee/naita?id=23585)
Mayank Agarwal's blog: Back Propagation in Convolutional Neural Networks — Intuition and Code (https://becominghuman.ai/back-propagation-in-convolutional-neural-networks-intuition-and-code-714ef1c38199)
Mordor riddle
http://fouryears.eu/2009/11/21/machine-learning-in-mordor/
https://biit.cs.ut.ee/
Thank you!
Introduction to Deep Learning