
Commit 8352a31

Author: Yoshua Bengio
Commit message: English & math corrections
1 parent: d76e9b4

File tree: 1 file changed (+12, -8 lines)

doc/gettingstarted.txt

Lines changed: 12 additions & 8 deletions
@@ -27,7 +27,10 @@ MNIST Dataset
 
 The `MNIST <http://yann.lecun.com/exdb/mnist>`_ dataset consists of handwritten
 digit images and it is divided in 60 000 examples for the training set and
-10 000 examples for testing. All examples have been size-normalized and
+10 000 examples for testing. In many papers as well as in this tutorial, the
+official training set of 60 000 is divided into an actual training set of 50 000
+examples and 10 000 validation examples (for selecting hyper-parameters like
+learning rate and size of the model). All digit images have been size-normalized and
 centered in a fixed size image of 28 x 28 pixels. In the original dataset
 each pixel of the image is represented by a value between 0 and 255, where
 0 is black, 255 is white and anything in between is a different shade of grey.
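Not part of the commit, but the 50 000 / 10 000 train/validation split that the added lines describe can be sketched in plain NumPy. The arrays below are random stand-ins for the real MNIST data (the tutorial loads a pickled dataset instead), so only the slicing and the [0, 255] → [0, 1] rescaling are the point here.

```python
import numpy as np

# Illustrative stand-ins for the 60 000 MNIST training images
# (28 x 28 pixels flattened to 784 values) and their digit labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(60_000, 784))
labels = rng.integers(0, 10, size=60_000)

# Split the official training set into an actual training set of
# 50 000 examples and 10 000 validation examples, as the diff describes.
train_x, valid_x = images[:50_000], images[50_000:]
train_y, valid_y = labels[:50_000], labels[50_000:]

# Rescale pixel values from [0, 255] to [0, 1] grey levels.
train_x = train_x / 255.0
valid_x = valid_x / 255.0
```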
@@ -150,7 +153,7 @@ List of Symbols and acronyms
 
 * :math:`D`: number of input dimensions.
 * :math:`D_h^{(i)}`: number of hidden units in the :math:`i`-th layer.
-* :math:`f_{\theta}(x)`, :math:`f(x)`: prediction function of a model :math:`P(Y|x,\theta)`, defined as :math:`argmax_k P(Y=k|x,\theta)`.
+* :math:`f_{\theta}(x)`, :math:`f(x)`: classification function associated with a model :math:`P(Y|x,\theta)`, defined as :math:`argmax_k P(Y=k|x,\theta)`.
   Note that we will often drop the :math:`\theta` subscript.
 * L: number of labels.
 * :math:`\mathcal{L}(\theta, \cal{D})`: log-likelihood :math:`\cal{D}`
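As an aside (not in the commit): the classification function :math:`f_{\theta}(x) = argmax_k P(Y=k|x,\theta)` from the reworded symbol entry can be illustrated with a linear softmax model. `W` and `b` here are hypothetical parameters standing in for :math:`\theta`; the tutorial's models are richer, and only the argmax idea is shown.

```python
import numpy as np

def softmax(z):
    """Turn unnormalized scores into class probabilities P(Y=k|x, theta)."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # stabilized exponentials
    return e / e.sum(axis=-1, keepdims=True)

def f(x, W, b):
    """Classification function: argmax_k P(Y=k|x, theta) for a toy
    linear softmax model with hypothetical parameters W and b."""
    p_y_given_x = softmax(x @ W + b)
    return np.argmax(p_y_given_x, axis=-1)

# Tiny usage example: two inputs of dimension D = 3, L = 4 labels.
W = np.zeros((3, 4))
W[0, 2] = 5.0                 # feature 0 votes strongly for label 2
b = np.zeros(4)
x = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
preds = f(x, W, b)
```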
@@ -270,7 +273,7 @@ as:
 
 The NLL of our classifier is a differentiable surrogate for the zero-one loss,
 and we use the gradient of this function over our training data as a
-supervised learning signal for deep learning.
+supervised learning signal for deep learning of a classifier.
 
 This can be computed using the following line of code :
 
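The hunk truncates before the tutorial's actual Theano line, so as an illustration only, the mean negative log-likelihood it refers to can be written in NumPy like this (the function name and array layout are assumptions, not the tutorial's code):

```python
import numpy as np

def negative_log_likelihood(p_y_given_x, y):
    """Mean NLL: -(1/N) * sum_i log P(Y = y_i | x_i, theta).

    p_y_given_x : (N, L) array of class probabilities, one row per example.
    y           : (N,) array of correct labels.
    """
    n = y.shape[0]
    # For each row i, pick the probability assigned to the true label y_i.
    return -np.mean(np.log(p_y_given_x[np.arange(n), y]))

# Usage: two examples, three labels; the model is confident and correct.
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
y = np.array([0, 1])
loss = negative_log_likelihood(p, y)   # -(log 0.7 + log 0.8) / 2
```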
@@ -297,7 +300,7 @@ What is ordinary gradient descent? it is a simple
 algorithm in which we repeatedly make small steps downward on an error
 surface defined by a loss function of some parameters.
 For the purpose of ordinary gradient descent we consider that the training
-data is rolled into the loss function. Then the pseducode of this
+data is rolled into the loss function. Then the pseudocode of this
 algorithm can be described as :
 
 .. code-block:: python
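The pseudocode itself is cut off by the hunk. For reference, a minimal runnable version of ordinary gradient descent on a toy quadratic loss looks like this; `gradient` stands in for the tutorial's loss gradient over the whole training set (the toy loss is an assumption for illustration).

```python
# Ordinary gradient descent on the toy loss (theta - 3)^2,
# whose gradient is 2 * (theta - 3) and whose minimizer is 3.0.
def gradient(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0                          # initial parameters
learning_rate = 0.1
for _ in range(200):                 # repeatedly step downhill
    theta = theta - learning_rate * gradient(theta)
# theta is now very close to the minimizer 3.0
```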
@@ -355,7 +358,7 @@ estimator, that time would be better spent on additional gradient steps.
 An optimal :math:`B` is model-, dataset-, and hardware-dependent, and can be
 anywhere from 1 to maybe several hundreds. In the tutorial we set it to 20,
 but this choice is almost arbitrary (though harmless). All code-blocks
-above show psuedocode of how the algorithm looks like. Implementing such
+above show pseudocode of how the algorithm looks like. Implementing such
 algorithm in Theano can be done as follows :
 
 .. code-block:: python
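The Theano implementation the hunk points at is not included here. As a hedged sketch of the same idea in plain NumPy, the loop below runs minibatch SGD with :math:`B = 20` on a toy least-squares problem; the data, model, and learning rate are all assumptions chosen only to make the batch loop concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, B = 200, 5, 20                     # examples, input dims, batch size B
X = rng.normal(size=(N, D))
true_w = rng.normal(size=D)
y = X @ true_w                           # noiseless linear targets

w = np.zeros(D)                          # parameters theta, initialized at 0
learning_rate = 0.05
for epoch in range(100):
    for start in range(0, N, B):         # sweep over minibatches of size B
        xb, yb = X[start:start + B], y[start:start + B]
        err = xb @ w - yb
        grad = 2.0 * xb.T @ err / B      # gradient of the batch mean squared error
        w -= learning_rate * grad
```

The outer loop is an epoch (one pass over the data); each inner step uses only :math:`B` examples to estimate the gradient, which is the trade-off the surrounding text discusses.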
@@ -417,9 +420,9 @@ or, in our case
 
 .. math::
 
-    E(\theta, \mathcal{D}) = NLL(\theta, \mathcal{D}) + \lambda||\theta||_p
+    E(\theta, \mathcal{D}) = NLL(\theta, \mathcal{D}) + \lambda||\theta||_p^p
 
-with
+where
 
 .. math::
 
@@ -444,7 +447,8 @@ data.
 
 Note that the fact that a solution is "simple" does not mean that it will
 generalize well. Empirically, it was found that performing such regularization
-in the context of neural networks helps with generalization.
+in the context of neural networks helps with generalization, especially
+on small datasets.
 The code block below shows how to compute the loss in python when it
 contains both a L1 regularization term weighted by :math:`\lambda_1` and
 L2 regularization term weighted by :math:`\lambda_2`
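The tutorial's own code block for this loss is outside the hunk. As a sketch under assumed names (`regularized_loss`, a precomputed scalar `nll`), the combined objective with an L1 term weighted by :math:`\lambda_1` and a squared-L2 term weighted by :math:`\lambda_2` is:

```python
import numpy as np

def regularized_loss(nll, theta, lambda_1, lambda_2):
    """E(theta, D) = NLL + lambda_1 * ||theta||_1 + lambda_2 * ||theta||_2^2."""
    l1 = np.sum(np.abs(theta))        # sum_j |theta_j|
    l2_sq = np.sum(theta ** 2)        # sum_j theta_j^2
    return nll + lambda_1 * l1 + lambda_2 * l2_sq

# Usage with a hypothetical NLL value and small parameter vector.
loss = regularized_loss(nll=0.29,
                        theta=np.array([0.5, -2.0, 1.0]),
                        lambda_1=0.001, lambda_2=0.0001)
```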
