<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Defining a Simple Model using Expressions
</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<meta name="description" content="Deep.Net machine learning framework"/>
<meta name="author" content="Deep.Net developers"/>
<script src="https://code.jquery.com/jquery-1.8.0.js"></script>
<script src="https://code.jquery.com/ui/1.8.23/jquery-ui.js"></script>
<script src="https://netdna.bootstrapcdn.com/twitter-bootstrap/2.2.1/js/bootstrap.min.js"></script>
<link href="https://netdna.bootstrapcdn.com/twitter-bootstrap/2.2.1/css/bootstrap-combined.min.css" rel="stylesheet"/>
<link type="text/css" rel="stylesheet" href="http://www.deepml.net/content/style.css" />
<script type="text/javascript" src="http://www.deepml.net/content/tips.js"></script>
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" async
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
</head>
<body>
<div class="container">
<div class="masthead">
<ul class="nav nav-pills pull-right">
<li><a href="http://github.com/DeepMLNet/DeepNet">github page</a></li>
</ul>
<h3 class="muted"><a href="http://www.deepml.net/index.html">Deep.Net</a></h3>
</div>
<hr />
<div class="row">
<div class="span9" id="main">
<h1><a name="Defining-a-Simple-Model-using-Expressions" class="anchor" href="#Defining-a-Simple-Model-using-Expressions">Defining a Simple Model using Expressions</a></h1>
<p>In this example we will show how to learn <a href="http://yann.lecun.com/exdb/mnist/">MNIST handwritten digit classification</a> using a two-layer feed-forward network.
As an introductory example, the model is defined using only basic mathematical expressions, in order to explain their use.
Deep.Net also allows you to build models from components (for example, a multi-layer perceptron), which is usually more readable and requires less code.
This technique is explained in the <a href="components.html">Model Components</a> chapter.</p>
<p>You can run this example by executing <code>FsiAnyCPU.exe docs\content\model.fsx</code> after cloning the Deep.Net repository.
You can move your mouse over any symbol in the code samples to see the full signature.
A <a href="https://fsharpforfunandprofit.com/posts/function-signatures/">quick introduction to F# signatures</a> might be helpful.</p>
<h3><a name="Namespaces" class="anchor" href="#Namespaces">Namespaces</a></h3>
<p>The <code>ArrayNDNS</code> namespace houses the <a href="tensor.html">numeric tensor functionality</a>.
<code>SymTensor</code> houses the symbolic expression library.
<code>SymTensor.Compiler.Cuda</code> provides compilation of symbolic expressions to functions that are executed on a CUDA GPU.
The <code>Datasets</code> namespace provides dataset loading and handling functions.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">open</span> <span class="i">ArrayNDNS</span>
<span class="k">open</span> <span class="i">SymTensor</span>
<span class="k">open</span> <span class="i">SymTensor</span><span class="o">.</span><span class="i">Compiler</span><span class="o">.</span><span class="i">Cuda</span>
<span class="k">open</span> <span class="i">Datasets</span>
</code></pre></td>
</tr>
</table>
<h2><a name="Loading-MNIST" class="anchor" href="#Loading-MNIST">Loading MNIST</a></h2>
<p>Deep.Net provides the <code>Dataset</code> type for simple and efficient handling of datasets.
In this chapter we skip over the usage details and refer to the <a href="dataset.html">Dataset Handling</a> chapter for further information.</p>
<p>We use the <code>Mnist</code> module from the <code>Datasets</code> library to load the <a href="http://yann.lecun.com/exdb/mnist/">MNIST handwritten digits dataset</a>, which consists of a training set of 60 000 samples and a test set of 10 000 samples.
Each sample consists of <span class="math">\(28 \times 28\)</span> pixels and an associated integer class label between 0 and 9.</p>
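<p>Flattening a <span class="math">\(28 \times 28\)</span> image row by row yields a vector of <span class="math">\(28 \cdot 28 = 784\)</span> values, which is why the image arrays in this chapter have 784 columns. The following plain F# sketch does not use Deep.Net; it assumes row-major order, which this chapter does not state, and <code>flatIndex</code> is a hypothetical helper that only illustrates the index arithmetic.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Hypothetical helper (not part of Deep.Net): position of pixel (row, col)
// in a flattened 28 x 28 image, assuming the rows are stored one after another.
let imgHeight, imgWidth = 28, 28

let flatIndex (row: int) (col: int) = row * imgWidth + col

let nPixels = imgHeight * imgWidth   // number of values per flattened image

printfn "pixels per image: %d" nPixels                   // pixels per image: 784
printfn "pixel (1, 0) -> index %d" (flatIndex 1 0)       // pixel (1, 0) -> index 28
printfn "pixel (27, 27) -> index %d" (flatIndex 27 27)   // pixel (27, 27) -> index 783
</code></pre>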
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
<span class="l">5: </span>
<span class="l">6: </span>
<span class="l">7: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs1', 1)" onmouseover="showTip(event, 'fs1', 1)" class="i">mnist</span> <span class="o">=</span> <span class="i">Mnist</span><span class="o">.</span><span class="i">load</span> (<span class="k">__SOURCE_DIRECTORY__</span> <span class="o">+</span> <span class="s">"../../../Data/MNIST"</span>) <span class="n">0.0</span>
    <span class="o">|></span> <span class="i">TrnValTst</span><span class="o">.</span><span class="i">ToCuda</span>
<span onmouseout="hideTip(event, 'fs2', 2)" onmouseover="showTip(event, 'fs2', 2)" class="f">printfn</span> <span class="s">"MNIST training set: images have shape </span><span class="pf">%A</span><span class="s"> and labels have shape </span><span class="pf">%A</span><span class="s">"</span>
    <span onmouseout="hideTip(event, 'fs1', 3)" onmouseover="showTip(event, 'fs1', 3)" class="i">mnist</span><span class="o">.</span><span class="i">Trn</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Img</span><span class="o">.</span><span class="i">Shape</span> <span onmouseout="hideTip(event, 'fs1', 4)" onmouseover="showTip(event, 'fs1', 4)" class="i">mnist</span><span class="o">.</span><span class="i">Trn</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Lbl</span><span class="o">.</span><span class="i">Shape</span>
<span onmouseout="hideTip(event, 'fs2', 5)" onmouseover="showTip(event, 'fs2', 5)" class="f">printfn</span> <span class="s">"MNIST test set: images have shape </span><span class="pf">%A</span><span class="s"> and labels have shape </span><span class="pf">%A</span><span class="s">"</span>
    <span onmouseout="hideTip(event, 'fs1', 6)" onmouseover="showTip(event, 'fs1', 6)" class="i">mnist</span><span class="o">.</span><span class="i">Tst</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Img</span><span class="o">.</span><span class="i">Shape</span> <span onmouseout="hideTip(event, 'fs1', 7)" onmouseover="showTip(event, 'fs1', 7)" class="i">mnist</span><span class="o">.</span><span class="i">Tst</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Lbl</span><span class="o">.</span><span class="i">Shape</span>
</code></pre></td>
</tr>
</table>
<p>This prints</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">MNIST</span> <span class="i">training</span> <span onmouseout="hideTip(event, 'fs3', 8)" onmouseover="showTip(event, 'fs3', 8)" class="i">set</span><span class="o">:</span> <span class="i">images</span> <span class="i">have</span> <span class="i">shape</span> [<span class="n">60000</span>; <span class="n">784</span>] <span class="k">and</span> <span class="i">labels</span> <span class="i">have</span> <span class="i">shape</span> [<span class="n">60000</span>; <span class="n">10</span>]
<span class="i">MNIST</span> <span class="i">test</span> <span onmouseout="hideTip(event, 'fs3', 9)" onmouseover="showTip(event, 'fs3', 9)" class="i">set</span><span class="o">:</span> <span class="i">images</span> <span class="i">have</span> <span class="i">shape</span> [<span class="n">10000</span>; <span class="n">784</span>] <span class="k">and</span> <span class="i">labels</span> <span class="i">have</span> <span class="i">shape</span> [<span class="n">10000</span>; <span class="n">10</span>]
</code></pre></td>
</tr>
</table>
<p><code>Mnist.load</code> loads the MNIST dataset from the specified directory.
The second parameter specifies the fraction of training samples to move into the validation set.
Since we do not use a validation set in this example, we set it to zero.
<code>TrnValTst.ToCuda</code> transfers the whole dataset to the GPU.</p>
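<p>To make the meaning of the second parameter concrete, the following sketch computes how the 60 000 training samples would be divided for a given ratio. <code>splitCounts</code> is a hypothetical helper, and truncating the validation count to an integer is an assumption; <code>Mnist.load</code> may round differently.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Hypothetical helper (not Deep.Net code): split of the MNIST training samples
// for a given validation ratio, assuming the validation count is truncated.
let nTrnTotal = 60000

let splitCounts (valRatio: float) =
    let nVal = int (float nTrnTotal * valRatio)
    nTrnTotal - nVal, nVal   // (remaining training samples, validation samples)

printfn "%A" (splitCounts 0.0)    // (60000, 0), the setting used in this example
printfn "%A" (splitCounts 0.25)   // (45000, 15000)
</code></pre>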
<p><code>mnist.Trn</code> contains the training set and <code>mnist.Tst</code> contains the test set.
All samples of a partition can be accessed using the <code>.All</code> property; e.g. <code>mnist.Trn.All</code> contains all training samples.</p>
<p>Hence <code>mnist.Trn.All.Img</code> is an array of shape <span class="math">\(60000 \times 784\)</span> that contains the flattened training images, and <code>mnist.Trn.All.Lbl</code> is an array of shape <span class="math">\(60000 \times 10\)</span> containing the corresponding labels in one-hot encoding, i.e. <code>mnist.Trn.All.Lbl.[[n; c]] = 1</code> when the n-th training sample is of class (digit) c, and 0 otherwise.</p>
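<p>The one-hot layout of a single label row can be illustrated with plain F# arrays. This sketch is independent of Deep.Net; <code>oneHot</code> and <code>decode</code> are hypothetical helpers, and <code>float32</code> is the F# type that <code>single</code> abbreviates.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Sketch: one-hot encoding of a digit label into a row of length nClass = 10.
let nClass = 10

let oneHot (label: int) : float32[] =
    Array.init nClass (fun c -> if c = label then 1.0f else 0.0f)

// Recovers the label as the position of the single 1 in the row.
let decode (row: float32[]) : int = Array.findIndex (fun v -> v = 1.0f) row

let row = oneHot 3
printfn "%A" row
printfn "decoded label: %d" (decode row)   // decoded label: 3
</code></pre>
<p>Each row of <code>mnist.Trn.All.Lbl</code> corresponds to one such array, so the row sums are all one.</p>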
<h2><a name="Defining-the-model" class="anchor" href="#Defining-the-model">Defining the model</a></h2>
<p>In Deep.Net a model is defined as a symbolic function of variables (input, target) and parameters (weights, biases, etc.).
In this example we construct our model using primitive operations; in a later chapter we will show how to combine predefined building blocks into a model.
To ensure that the model is valid, Deep.Net requires the specification of the shapes of all variables and parameters during the definition of the model.
For example, <span class="math">\(a+b\)</span> is only valid if <span class="math">\(a\)</span> and <span class="math">\(b\)</span> are tensors of the same shape (or can be broadcast to the same shape).
Similarly, the matrix product <span class="math">\(a \cdot b\)</span> requires <span class="math">\(b\)</span> to have as many rows as <span class="math">\(a\)</span> has columns.</p>
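<p>The two shape rules can be sketched with shapes as plain lists of integers. This is an illustration, not Deep.Net's checker: <code>addShape</code> and <code>dotShape</code> are hypothetical, <code>addShape</code> requires equal ranks, and treating an axis of length 1 as broadcastable follows the NumPy convention, which is an assumption about Deep.Net's exact broadcasting rules.</p>
<pre class="fssnip highlighted"><code lang="fsharp">/// Shape of a + b, or None if incompatible. Ranks must be equal; an axis of
/// length 1 is stretched to the other operand's length (assumed rule).
let addShape (a: int list) (b: int list) : int list option =
    if List.length a = List.length b then
        let pairs = List.zip a b
        if pairs |> List.forall (fun (x, y) -> x = y || x = 1 || y = 1) then
            Some (pairs |> List.map (fun (x, y) -> if x = 1 then y else x))
        else
            None
    else
        None

/// Shape of the matrix product a . b, or None if a's columns and b's rows differ.
let dotShape (a: int list) (b: int list) : int list option =
    match a, b with
    | [ rowsA; colsA ], [ rowsB; colsB ] when colsA = rowsB -> Some [ rowsA; colsB ]
    | _ -> None

printfn "%A" (dotShape [ 100; 784 ] [ 784; 32 ])                   // Some [100; 32]
printfn "%A" (addShape [ 100; 32 ] [ 100; 1 ])                     // Some [100; 32]
printfn "%b" (Option.isNone (dotShape [ 100; 784 ] [ 32; 784 ]))   // true
</code></pre>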
<p>The shape of a tensor can be specified either numerically or, as is usually done, symbolically; e.g. the tensor <code>input</code> representing an input batch can be declared to be of shape <span class="math">\(\mathrm{nBatch} \times \mathrm{nInput}\)</span> using the size symbols <code>nBatch</code> and <code>nInput</code>.
The model is checked as it is defined, and errors are reported immediately at the offending source code line.
This avoids the long and painstaking process of tracking down shape mismatches later.</p>
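<p>The following hypothetical sketch shows why symbolic sizes allow shapes to be checked before any numbers are known. The <code>Size</code> type, <code>symDotShape</code> and <code>resolve</code> are invented for illustration and are not Deep.Net's API; in particular, the sketch simply treats two different symbols as different sizes.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Hypothetical sketch, not Deep.Net's implementation: a size is either a named
// symbol or a known number.
type Size =
    | Sym of string
    | Num of int

/// Shape of the matrix product a . b; an error if the inner sizes differ.
let symDotShape (a: Size list) (b: Size list) =
    match a, b with
    | [ rows; inner1 ], [ inner2; cols ] when inner1 = inner2 -> Ok [ rows; cols ]
    | _ -> Error (sprintf "shape mismatch: %A and %A" a b)

/// Replaces a symbol by its numeric value once that value is known.
let resolve values size =
    match size with
    | Sym name -> Map.tryFind name values |> Option.map Num |> Option.defaultValue size
    | Num _ -> size

let nBatch, nInput, nHidden = Sym "nBatch", Sym "nInput", Sym "nHidden"

// [nHidden; nInput] . [nInput; nBatch] is valid whatever values the sizes take
let hiddenShape = symDotShape [ nHidden; nInput ] [ nInput; nBatch ]

// swapping the operands is rejected immediately, without any numbers
let swapped = symDotShape [ nInput; nBatch ] [ nHidden; nInput ]

// numeric values are substituted only later
let sizes = Map.ofList [ "nInput", 784; "nHidden", 100 ]
let weightShape = [ nHidden; nInput ] |> List.map (resolve sizes)

printfn "%A" hiddenShape
printfn "%A" swapped
printfn "%A" weightShape
</code></pre>
<p>The late substitution step corresponds loosely to the <code>mb.SetSize</code> calls later in this chapter.</p>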
<p>Each model definition in Deep.Net starts by instantiating <code>ModelBuilder</code>.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs4', 10)" onmouseover="showTip(event, 'fs4', 10)" class="i">mb</span> <span class="o">=</span> <span class="i">ModelBuilder</span><span class="o">&lt;</span><span onmouseout="hideTip(event, 'fs5', 11)" onmouseover="showTip(event, 'fs5', 11)" class="i">single</span><span class="o">&gt;</span> <span class="s">"NeuralNetModel"</span>
</code></pre></td>
</tr>
</table>
<p>The model builder keeps track of symbolic sizes and parameters used in the model.
It takes a generic parameter that specifies the data type of the parameters (weights, etc.) used in the model.
Since CUDA GPUs provide the best performance with 32-bit floating-point arithmetic, the data type <code>single</code> should almost always be used.
The non-generic parameter specifies a human-readable name for the model and will become more important when defining sub-models (models that can be composed into a larger model).</p>
<h3><a name="Defining-size-symbols" class="anchor" href="#Defining-size-symbols">Defining size symbols</a></h3>
<p>Next, we define the size symbols for batch size, input vector size, target size and hidden layer size.
Although the input vector size and target size are known in advance, we opt to define them as size symbols anyway to keep the model more general and, if we make a mistake, receive more readable error messages.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs6', 12)" onmouseover="showTip(event, 'fs6', 12)" class="i">nBatch</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 13)" onmouseover="showTip(event, 'fs4', 13)" class="i">mb</span><span class="o">.</span><span class="i">Size</span> <span class="s">"nBatch"</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs7', 14)" onmouseover="showTip(event, 'fs7', 14)" class="i">nInput</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 15)" onmouseover="showTip(event, 'fs4', 15)" class="i">mb</span><span class="o">.</span><span class="i">Size</span> <span class="s">"nInput"</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs8', 16)" onmouseover="showTip(event, 'fs8', 16)" class="i">nClass</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 17)" onmouseover="showTip(event, 'fs4', 17)" class="i">mb</span><span class="o">.</span><span class="i">Size</span> <span class="s">"nClass"</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs9', 18)" onmouseover="showTip(event, 'fs9', 18)" class="i">nHidden</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 19)" onmouseover="showTip(event, 'fs4', 19)" class="i">mb</span><span class="o">.</span><span class="i">Size</span> <span class="s">"nHidden"</span>
</code></pre></td>
</tr>
</table>
<h3><a name="Defining-model-parameters" class="anchor" href="#Defining-model-parameters">Defining model parameters</a></h3>
<p>Our model consists of one hidden layer with a tanh activation function and an output layer with a softmax activation.
Thus our parameters consist of the weights and biases of the hidden layer and the weights of the output layer.
The <code>mb.Param</code> method of the model builder is used to define the parameters.
It takes a human-readable name for the parameter, its shape and, optionally, an initialization function as arguments.
To keep things simple, we do not specify an initialization function at this point.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs10', 20)" onmouseover="showTip(event, 'fs10', 20)" class="v">hiddenWeights</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 21)" onmouseover="showTip(event, 'fs4', 21)" class="i">mb</span><span class="o">.</span><span class="i">Param</span> (<span class="s">"hiddenWeights"</span>, [<span onmouseout="hideTip(event, 'fs9', 22)" onmouseover="showTip(event, 'fs9', 22)" class="i">nHidden</span>; <span onmouseout="hideTip(event, 'fs7', 23)" onmouseover="showTip(event, 'fs7', 23)" class="i">nInput</span>])
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs11', 24)" onmouseover="showTip(event, 'fs11', 24)" class="v">hiddenBias</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 25)" onmouseover="showTip(event, 'fs4', 25)" class="i">mb</span><span class="o">.</span><span class="i">Param</span> (<span class="s">"hiddenBias"</span>, [<span onmouseout="hideTip(event, 'fs9', 26)" onmouseover="showTip(event, 'fs9', 26)" class="i">nHidden</span>])
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs12', 27)" onmouseover="showTip(event, 'fs12', 27)" class="v">outputWeights</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 28)" onmouseover="showTip(event, 'fs4', 28)" class="i">mb</span><span class="o">.</span><span class="i">Param</span> (<span class="s">"outputWeights"</span>, [<span onmouseout="hideTip(event, 'fs8', 29)" onmouseover="showTip(event, 'fs8', 29)" class="i">nClass</span>; <span onmouseout="hideTip(event, 'fs9', 30)" onmouseover="showTip(event, 'fs9', 30)" class="i">nHidden</span>])
</code></pre></td>
</tr>
</table>
<p>As the curious reader may have noted, <code>mb.Param</code> returns a <a href="https://msdn.microsoft.com/visualfsharpdocs/conceptual/reference-cells-%5bfsharp%5d">reference cell</a> to an expression.
The reason behind this is that, for training (optimizing) the model, it is beneficial to have all model parameters concatenated into a single contiguous vector.
However, such a vector can only be constructed once all parameters and their shapes are known.
Hence the model builder must delay its construction and instead returns a reference cell that is filled when the <code>mb.Instantiate</code> method is called.</p>
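<p>The delayed construction can be sketched with plain F# reference cells. In this hypothetical sketch (not Deep.Net's <code>ModelBuilder</code>), each parameter is first handed an empty cell; the invented function <code>instantiate</code> later lays all parameters out one after another in a single flat vector and fills every cell with its offset and length. Toy sizes are used instead of the model's.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Hypothetical sketch: where a parameter lives inside the flat parameter vector.
type ParamSlice = { Offset: int; Length: int }

/// Lays out the declared parameters consecutively, fills each reference cell
/// with its slice and returns the total length of the parameter vector.
let instantiate (decls: (int * ParamSlice option ref) list) : int =
    let place offset (length, cell) =
        cell := Some { Offset = offset; Length = length }
        offset + length
    List.fold place 0 decls

// Declaration phase: the cells exist, but no layout has been computed yet.
let weights : ParamSlice option ref = ref None   // e.g. a 3 x 2 weight matrix
let bias : ParamSlice option ref = ref None      // e.g. a bias vector of length 3

// Instantiation phase: all lengths are known, so the layout can be computed.
let total = instantiate [ 3 * 2, weights; 3, bias ]

printfn "parameter vector length: %d" total   // parameter vector length: 9
printfn "%A" !bias
</code></pre>
<p>The sketch uses the <code>!</code> and <code>:=</code> operators to match this chapter; newer F# versions prefer the equivalent <code>.Value</code> property of a reference cell.</p>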
<h3><a name="Defining-model-variables" class="anchor" href="#Defining-model-variables">Defining model variables</a></h3>
<p>We also need to define the input and (desired) target variables of the model.
Parameters and variables are both tensors; the difference is that parameters are stored with the model and do not depend on a particular data sample, whereas variables represent data samples and are passed into the model.
The <code>mb.Var</code> method of the model builder takes a human-readable name for the variable and its shape as arguments.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs13', 31)" onmouseover="showTip(event, 'fs13', 31)" class="i">input</span> <span class="o">:</span> <span class="i">ExprT</span><span class="o">&lt;</span><span onmouseout="hideTip(event, 'fs5', 32)" onmouseover="showTip(event, 'fs5', 32)" class="i">single</span><span class="o">&gt;</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 33)" onmouseover="showTip(event, 'fs4', 33)" class="i">mb</span><span class="o">.</span><span class="i">Var</span> <span class="s">"Input"</span> [<span onmouseout="hideTip(event, 'fs6', 34)" onmouseover="showTip(event, 'fs6', 34)" class="i">nBatch</span>; <span onmouseout="hideTip(event, 'fs7', 35)" onmouseover="showTip(event, 'fs7', 35)" class="i">nInput</span>]
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs14', 36)" onmouseover="showTip(event, 'fs14', 36)" class="i">target</span> <span class="o">:</span> <span class="i">ExprT</span><span class="o">&lt;</span><span onmouseout="hideTip(event, 'fs5', 37)" onmouseover="showTip(event, 'fs5', 37)" class="i">single</span><span class="o">&gt;</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 38)" onmouseover="showTip(event, 'fs4', 38)" class="i">mb</span><span class="o">.</span><span class="i">Var</span> <span class="s">"Target"</span> [<span onmouseout="hideTip(event, 'fs6', 39)" onmouseover="showTip(event, 'fs6', 39)" class="i">nBatch</span>; <span onmouseout="hideTip(event, 'fs8', 40)" onmouseover="showTip(event, 'fs8', 40)" class="i">nClass</span>]
</code></pre></td>
</tr>
</table>
<h3><a name="Instantiating-the-model" class="anchor" href="#Instantiating-the-model">Instantiating the model</a></h3>
<p>Instantiating a model constructs a parameter vector that contains all parameters and allocates the corresponding storage space on the host or GPU.
Since storage space allocation requires numeric values for the shape of the parameters (<code>hiddenWeights</code>, <code>hiddenBias</code>, <code>outputWeights</code>) we need to provide values for the corresponding size symbols <code>nInput</code>, <code>nHidden</code> and <code>nClass</code>.</p>
<p>We use the <code>mb.SetSize</code> method of the model builder that takes a symbolic size and the corresponding numeric value.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span onmouseout="hideTip(event, 'fs4', 41)" onmouseover="showTip(event, 'fs4', 41)" class="i">mb</span><span class="o">.</span><span class="i">SetSize</span> <span onmouseout="hideTip(event, 'fs7', 42)" onmouseover="showTip(event, 'fs7', 42)" class="i">nInput</span> <span onmouseout="hideTip(event, 'fs1', 43)" onmouseover="showTip(event, 'fs1', 43)" class="i">mnist</span><span class="o">.</span><span class="i">Trn</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Img</span><span class="o">.</span><span class="i">Shape</span><span class="o">.</span>[<span class="n">1</span>]
<span onmouseout="hideTip(event, 'fs4', 44)" onmouseover="showTip(event, 'fs4', 44)" class="i">mb</span><span class="o">.</span><span class="i">SetSize</span> <span onmouseout="hideTip(event, 'fs8', 45)" onmouseover="showTip(event, 'fs8', 45)" class="i">nClass</span> <span onmouseout="hideTip(event, 'fs1', 46)" onmouseover="showTip(event, 'fs1', 46)" class="i">mnist</span><span class="o">.</span><span class="i">Trn</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Lbl</span><span class="o">.</span><span class="i">Shape</span><span class="o">.</span>[<span class="n">1</span>]
<span onmouseout="hideTip(event, 'fs4', 47)" onmouseover="showTip(event, 'fs4', 47)" class="i">mb</span><span class="o">.</span><span class="i">SetSize</span> <span onmouseout="hideTip(event, 'fs9', 48)" onmouseover="showTip(event, 'fs9', 48)" class="i">nHidden</span> <span class="n">100</span>
</code></pre></td>
</tr>
</table>
<p>The numbers of inputs and classes are taken from the corresponding shapes of the MNIST dataset, and <code>nHidden</code> is set to 100.
Consequently, we have a model with 784 input neurons, 100 hidden neurons and 10 output neurons.</p>
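<p>With these sizes, the number of values the parameter vector must hold follows directly from the parameter shapes declared above. The worked count below is plain arithmetic rather than Deep.Net code, and it assumes that the vector contains exactly these three parameters and nothing else.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Parameter count for nInput = 784, nHidden = 100, nClass = 10.
let nInput, nHidden, nClass = 784, 100, 10

let nHiddenWeights = nHidden * nInput   // hiddenWeights, shape [nHidden; nInput]
let nHiddenBias = nHidden               // hiddenBias,    shape [nHidden]
let nOutputWeights = nClass * nHidden   // outputWeights, shape [nClass; nHidden]

let nParams = nHiddenWeights + nHiddenBias + nOutputWeights
let nBytes = nParams * 4                // 4 bytes per 32-bit single value

printfn "%d + %d + %d = %d parameters" nHiddenWeights nHiddenBias nOutputWeights nParams
// 78400 + 100 + 1000 = 79500 parameters
printfn "%d bytes as single precision" nBytes
// 318000 bytes as single precision
</code></pre>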
<p>Since all model sizes are defined and all parameters have been declared, we can now instantiate the model using the <code>mb.Instantiate</code> method of the model builder.
It takes a single argument specifying the location of the parameter vector.
This argument can be <code>DevHost</code> for host (CPU) storage or <code>DevCuda</code> for GPU storage.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs15', 49)" onmouseover="showTip(event, 'fs15', 49)" class="i">mi</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs4', 50)" onmouseover="showTip(event, 'fs4', 50)" class="i">mb</span><span class="o">.</span><span class="i">Instantiate</span> <span class="i">DevCuda</span>
</code></pre></td>
</tr>
</table>
<p>This allocates a parameter vector of the appropriate size on the CUDA GPU and fills the reference cells for the model parameters <code>hiddenWeights</code>, <code>hiddenBias</code> and <code>outputWeights</code> with symbolic tensor slices of that vector.</p>
<h3><a name="Defining-model-expressions" class="anchor" href="#Defining-model-expressions">Defining model expressions</a></h3>
<p>We can now define the expressions for the model.
We start with the hidden layer.
Its activation is given by <span class="math">\(\mathbf{a_h} = W_h \mathbf{x} + \mathbf{b_h}\)</span>, where <span class="math">\(W_h\)</span> are the hidden weights and <span class="math">\(\mathbf{b_h}\)</span> is the hidden bias, and its value is <span class="math">\(\mathbf{h} = \tanh(\mathbf{a_h})\)</span>.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs16', 51)" onmouseover="showTip(event, 'fs16', 51)" class="i">hiddenAct</span> <span class="o">=</span> <span class="o">!</span><span onmouseout="hideTip(event, 'fs10', 52)" onmouseover="showTip(event, 'fs10', 52)" class="v">hiddenWeights</span> <span class="o">.</span><span class="o">*</span> <span onmouseout="hideTip(event, 'fs13', 53)" onmouseover="showTip(event, 'fs13', 53)" class="i">input</span><span class="o">.</span><span class="i">T</span> <span class="o">+</span> <span class="o">!</span><span onmouseout="hideTip(event, 'fs11', 54)" onmouseover="showTip(event, 'fs11', 54)" class="v">hiddenBias</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs17', 55)" onmouseover="showTip(event, 'fs17', 55)" class="i">hiddenVal</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs18', 56)" onmouseover="showTip(event, 'fs18', 56)" class="f">tanh</span> <span onmouseout="hideTip(event, 'fs16', 57)" onmouseover="showTip(event, 'fs16', 57)" class="i">hiddenAct</span>
</code></pre></td>
</tr>
</table>
<p>The <code>!</code> operator is used to dereference the reference cells for the parameters (see above).
The <code>.*</code> operator denotes the dot product (matrix multiplication) of two matrices or of a matrix and a vector.
Since the <code>tanh</code> function is overloaded in Deep.Net, it can be applied directly to symbolic expressions.
This is also true for all standard mathematical functions defined in F#, such as <code>sin</code>, <code>cos</code>, <code>log</code> and <code>ceil</code>.</p>
<p>Next, we define the expressions for the predictions of the model.
The output activations are given by <span class="math">\(\mathbf{g} = W_g \mathbf{h}\)</span> where <span class="math">\(W_g\)</span> are the output weights.
The class probabilities, i.e. the probabilities <span class="math">\(p(c)\)</span> that the sample is digit <span class="math">\(c\)</span>, are given by the <a href="https://en.wikipedia.org/wiki/Softmax_function">softmax function</a> <span class="math">\(p(c) = \exp(g_c) / \sum_{c'=0}^9 \exp(g_{c'})\)</span>.</p>
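<p>As a numeric check of the softmax formula, the following plain F# function computes the class probabilities for a single activation vector. It does not use Deep.Net; <code>softmax</code> here is a hypothetical helper operating on <code>float</code> arrays rather than on symbolic expressions.</p>
<pre class="fssnip highlighted"><code lang="fsharp">/// Softmax of one activation vector g: p(c) = exp(g_c) / sum over c' of exp(g_c').
let softmax (g: float[]) : float[] =
    let e = Array.map exp g
    let total = Array.sum e
    Array.map (fun x -> x / total) e

let p = softmax [| 1.0; 2.0; 3.0 |]

printfn "%A" p
printfn "sum = %.6f" (Array.sum p)   // sum = 1.000000
</code></pre>
<p>The largest activation receives the largest probability, and the probabilities sum to one. Implementations often subtract the largest activation from every <span class="math">\(g_c\)</span> before exponentiating to avoid overflow; this leaves the result unchanged.</p>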
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs19', 58)" onmouseover="showTip(event, 'fs19', 58)" class="i">outputAct</span> <span class="o">=</span> <span class="o">!</span><span onmouseout="hideTip(event, 'fs12', 59)" onmouseover="showTip(event, 'fs12', 59)" class="v">outputWeights</span> <span class="o">.</span><span class="o">*</span> <span onmouseout="hideTip(event, 'fs17', 60)" onmouseover="showTip(event, 'fs17', 60)" class="i">hiddenVal</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs20', 61)" onmouseover="showTip(event, 'fs20', 61)" class="i">classProb</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs21', 62)" onmouseover="showTip(event, 'fs21', 62)" class="f">exp</span> <span onmouseout="hideTip(event, 'fs19', 63)" onmouseover="showTip(event, 'fs19', 63)" class="i">outputAct</span> <span class="o">/</span> <span class="i">Expr</span><span class="o">.</span><span class="i">sumKeepingAxis</span> <span class="n">0</span> (<span onmouseout="hideTip(event, 'fs21', 64)" onmouseover="showTip(event, 'fs21', 64)" class="i">exp</span> <span onmouseout="hideTip(event, 'fs19', 65)" onmouseover="showTip(event, 'fs19', 65)" class="i">outputAct</span>)
</code></pre></td>
</tr>
</table>
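<p>Here we see the use of the <code>Expr.sumKeepingAxis</code> function.
It takes two arguments: the axis to sum over and the expression to be summed.
The result has the same shape as the input, except that the length of the summation axis is one.
Since <code>outputAct</code> has shape <span class="math">\(\mathrm{nClass} \times \mathrm{nBatch}\)</span>, summing over axis 0 normalizes the activations of each sample separately, and the kept axis of length one lets the sum be broadcast against <code>exp outputAct</code>.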
The <code>Expr.sumAxis</code> function sums over the specified axis returning a tensor with the summation axis removed.
The <code>Expr.sum</code> function sums over all axes returning a scalar tensor.</p>
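<p>The shape behaviour of the three summation functions can be mimicked on an ordinary 2D array. The sketch below is plain F#, not Deep.Net code; it sums over axis 0 of a <span class="math">\(2 \times 3\)</span> matrix and prints the resulting shapes and the total.</p>
<pre class="fssnip highlighted"><code lang="fsharp">let m = array2D [ [ 1.0; 2.0; 3.0 ]; [ 4.0; 5.0; 6.0 ] ]
let rows, cols = Array2D.length1 m, Array2D.length2 m

/// Sum of column j, i.e. the sum over axis 0 for that column.
let colSum j = List.sum [ for i in 0 .. rows - 1 -> m.[i, j] ]

// like Expr.sumKeepingAxis 0: shape [1; 3], the summed axis is kept with length 1
let kept = Array2D.init 1 cols (fun _ j -> colSum j)

// like Expr.sumAxis 0: shape [3], the summed axis is removed
let removed = Array.init cols colSum

// like Expr.sum: a single scalar
let total = Array.sum removed

printfn "kept: %d x %d" (Array2D.length1 kept) (Array2D.length2 kept)   // kept: 1 x 3
printfn "removed: %d" removed.Length                                     // removed: 3
printfn "total: %g" total                                                // total: 21
</code></pre>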
<p>With the prediction expression fully defined, we still need to define a loss expression to train our model.
We use the standard <a href="https://en.wikipedia.org/wiki/Cross_entropy">cross entropy</a> loss in this example, <span class="math">\(L = - \sum_{c=0}^9 t_c \log p(c)\)</span> where <span class="math">\(t_c=1\)</span> if the sample is digit <span class="math">\(c\)</span> and <span class="math">\(0\)</span> otherwise (one-hot encoding).</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs22', 66)" onmouseover="showTip(event, 'fs22', 66)" class="i">smplLoss</span> <span class="o">=</span> <span class="o">-</span> <span class="i">Expr</span><span class="o">.</span><span class="i">sumAxis</span> <span class="n">0</span> (<span onmouseout="hideTip(event, 'fs14', 67)" onmouseover="showTip(event, 'fs14', 67)" class="i">target</span><span class="o">.</span><span class="i">T</span> <span class="o">*</span> <span onmouseout="hideTip(event, 'fs23', 68)" onmouseover="showTip(event, 'fs23', 68)" class="i">log</span> <span onmouseout="hideTip(event, 'fs20', 69)" onmouseover="showTip(event, 'fs20', 69)" class="i">classProb</span>)
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs24', 70)" onmouseover="showTip(event, 'fs24', 70)" class="i">loss</span> <span class="o">=</span> <span class="i">Expr</span><span class="o">.</span><span class="i">mean</span> <span onmouseout="hideTip(event, 'fs22', 71)" onmouseover="showTip(event, 'fs22', 71)" class="i">smplLoss</span>
</code></pre></td>
</tr>
</table>
<p>To have a loss expression that is independent of the batch size, we take the mean of the loss over each batch.</p>
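<p>The per-sample loss and the batch mean can be checked numerically with plain F# arrays. In this sketch (not Deep.Net code), each inner array holds the values of one sample, and the names <code>smplLoss</code> and <code>loss</code> merely mirror the expressions above; <code>crossEntropy</code> is a hypothetical helper.</p>
<pre class="fssnip highlighted"><code lang="fsharp">/// Cross entropy of one sample: -sum over c of t_c * log p(c).
let crossEntropy (target: float[]) (prob: float[]) =
    -(Array.map2 (fun t p -> t * log p) target prob |> Array.sum)

// a batch of two samples with three classes; the targets are one-hot
let targets = [| [| 0.0; 0.0; 1.0 |]; [| 1.0; 0.0; 0.0 |] |]
let probs = [| [| 0.2; 0.3; 0.5 |]; [| 0.25; 0.25; 0.5 |] |]

let smplLoss = Array.map2 crossEntropy targets probs   // -log 0.5 and -log 0.25
let loss = Array.average smplLoss                      // mean over the batch

printfn "loss = %.4f" loss   // loss = 1.0397
</code></pre>
<p>Because the targets are one-hot, only the log-probability of the correct class contributes to each sample's loss, and averaging makes the result independent of how many samples the batch contains.</p>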
<h3><a name="Compiling-functions" class="anchor" href="#Compiling-functions">Compiling functions</a></h3>
<p>With the expressions fully defined, we can compile the loss expression into a function.
This is done using the <code>mi.Func</code> method of the instantiated model.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs25', 72)" onmouseover="showTip(event, 'fs25', 72)" class="f">lossFn</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs15', 73)" onmouseover="showTip(event, 'fs15', 73)" class="i">mi</span><span class="o">.</span><span class="i">Func</span> <span onmouseout="hideTip(event, 'fs24', 74)" onmouseover="showTip(event, 'fs24', 74)" class="i">loss</span> <span class="o">|></span> <span class="i">arg2</span> <span onmouseout="hideTip(event, 'fs13', 75)" onmouseover="showTip(event, 'fs13', 75)" class="i">input</span> <span onmouseout="hideTip(event, 'fs14', 76)" onmouseover="showTip(event, 'fs14', 76)" class="i">target</span>
</code></pre></td>
</tr>
</table>
<p><code>mi.Func</code> produces a function that expects a variable environment as its sole argument.
A variable environment (VarEnv) is an <a href="https://msdn.microsoft.com/en-us/visualfsharpdocs/conceptual/collections.map['key,'value]-class-[fsharp]">F# map</a> where keys are the symbolic variables (such as <code>input</code> or <code>target</code>) and values are the corresponding numeric tensors.
Since this makes calling the function a little awkward, we pipe the resulting function into the wrapper <code>arg2</code>.
<code>arg2</code> translates a function taking a VarEnv into a function taking two arguments, which become the values of the two symbolic variables passed to <code>arg2</code>.
Hence, in this example the resulting <code>lossFn</code> function takes two tensors as arguments: the first becomes the value of the symbolic variable <code>input</code> and the second the value of <code>target</code>.
There are corresponding wrappers <code>arg1</code>, <code>arg3</code>, <code>arg4</code>, ... for other numbers of arguments.
The return type of a compiled function is always a tensor.
If the result of the expression is a scalar (as in our case), the tensor will have rank zero.</p>
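<p>The effect of <code>arg2</code> can be mimicked with plain F# functions. In this hypothetical sketch, which is not Deep.Net's implementation, an F# <code>Map</code> keyed by strings stands in for the VarEnv, floats stand in for tensors, <code>compiledLoss</code> plays the role of the function returned by <code>mi.Func</code>, and <code>arg2Sketch</code> is a made-up name for the wrapper.</p>
<pre class="fssnip"><code lang="fsharp">// Stand-in for a compiled function: its only argument is a variable
// environment (a Map from variable name to value).
let compiledLoss env =
    let input : float = Map.find "input" env
    let target : float = Map.find "target" env
    (input - target) ** 2.0

// Hypothetical arg2-style wrapper: turns a function of a variable environment
// into a function of two positional arguments, binding the first argument
// to var1 and the second to var2.
let arg2Sketch var1 var2 f =
    fun v1 v2 -> f (Map.ofList [ (var1, v1); (var2, v2) ])

// Same piping style as `mi.Func loss |> arg2 input target`.
let lossFnSketch = compiledLoss |> arg2Sketch "input" "target"
printfn "%.1f" (lossFnSketch 5.0 3.0)   // (5 - 3)^2 = 4.0
</code></pre>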
<h2><a name="Initializing-the-model-parameters" class="anchor" href="#Initializing-the-model-parameters">Initializing the model parameters</a></h2>
<p>We initialize the model parameters by calling <code>mi.InitPars</code>.
Its only argument is the seed for the random initialization of the model's parameters.
<code>mi.InitPars</code> samples from a uniform distribution with support <span class="math">\([-0.01, 0.01]\)</span> to initialize all model parameters for which no initializer was specified when calling <code>mb.Param</code>.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span onmouseout="hideTip(event, 'fs15', 77)" onmouseover="showTip(event, 'fs15', 77)" class="i">mi</span><span class="o">.</span><span class="i">InitPars</span> <span class="n">123</span>
</code></pre></td>
</tr>
</table>
<p>We use a fixed seed of 123 to get reproducible results, but you can change it to a time-dependent value to get varying starting points.</p>
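<p>Seeded uniform initialization can be sketched in plain F# with <code>System.Random</code>. This is only an illustration of the idea: <code>initUniform</code> is a hypothetical helper, and Deep.Net's own random number generator (and hence the actual values produced by <code>mi.InitPars 123</code>) may differ.</p>
<pre class="fssnip"><code lang="fsharp">// Draw n values uniformly from [-0.01, 0.01) using a seeded generator.
// The same seed always yields the same sequence, which makes runs reproducible.
let initUniform (seed: int) (n: int) =
    let rng = System.Random(seed)
    Array.init n (fun _ -> -0.01 + 0.02 * rng.NextDouble())

let a = initUniform 123 5
let b = initUniform 123 5
printfn "%A" a
printfn "Same seed, same values: %b" (a = b)
</code></pre>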
<h2><a name="Testing-the-model" class="anchor" href="#Testing-the-model">Testing the model</a></h2>
<p>We can now test our work so far by calculating the loss of the <em>untrained</em> model on the MNIST test set.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs26', 78)" onmouseover="showTip(event, 'fs26', 78)" class="i">tstLossUntrained</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs25', 79)" onmouseover="showTip(event, 'fs25', 79)" class="f">lossFn</span> <span onmouseout="hideTip(event, 'fs1', 80)" onmouseover="showTip(event, 'fs1', 80)" class="i">mnist</span><span class="o">.</span><span class="i">Tst</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Img</span> <span onmouseout="hideTip(event, 'fs1', 81)" onmouseover="showTip(event, 'fs1', 81)" class="i">mnist</span><span class="o">.</span><span class="i">Tst</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Lbl</span>
<span class="o">|></span> <span class="i">ArrayND</span><span class="o">.</span><span class="i">value</span>
<span onmouseout="hideTip(event, 'fs2', 82)" onmouseover="showTip(event, 'fs2', 82)" class="f">printfn</span> <span class="s">"Test loss (untrained): </span><span class="pf">%.4f</span><span class="s">"</span> <span onmouseout="hideTip(event, 'fs26', 83)" onmouseover="showTip(event, 'fs26', 83)" class="i">tstLossUntrained</span>
</code></pre></td>
</tr>
</table>
<p>We call the compiled loss function with the test images and labels and pipe the result into the <code>ArrayND.value</code> function to extract the float value from the zero-rank tensor.
This should print something similar to</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Test</span> <span class="i">loss</span> (<span class="i">untrained</span>)<span class="o">:</span> <span class="n">2.3019</span>
</code></pre></td>
</tr>
</table>
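<p>The untrained loss of about 2.30 can be checked by hand. Because all parameters start close to zero, the output activations of the untrained model are close to zero as well, so the softmax assigns nearly equal probability <span class="math">\(p(c) \approx 1/10\)</span> to each of the ten classes. For any one-hot target, the cross-entropy is then approximately</p>
<p><span class="math">\[L = - \sum_{c=0}^{9} t_c \log p(c) \approx - \log \frac{1}{10} = \log 10 \approx 2.3026,\]</span></p>
<p>which is close to the printed value.</p>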
<h2><a name="Training" class="anchor" href="#Training">Training</a></h2>
<p>An untrained model is a useless model.
To fix this, we use gradient descent to find parameter values at which the loss is (locally) minimal.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs27', 84)" onmouseover="showTip(event, 'fs27', 84)" class="i">opt</span> <span class="o">=</span> <span class="i">Optimizers</span><span class="o">.</span><span class="i">GradientDescent</span> (<span onmouseout="hideTip(event, 'fs24', 85)" onmouseover="showTip(event, 'fs24', 85)" class="i">loss</span>, <span onmouseout="hideTip(event, 'fs15', 86)" onmouseover="showTip(event, 'fs15', 86)" class="i">mi</span><span class="o">.</span><span class="i">ParameterVector</span>, <span class="i">DevCuda</span>)
</code></pre></td>
</tr>
</table>
<p>We use the <code>GradientDescent</code> optimizer from the <code>Optimizers</code> library.
Each optimizer in Deep.Net takes three arguments: the expression to minimize, the variable with respect to which the minimization should be performed, and the device (DevHost or DevCuda).
The <code>mi.ParameterVector</code> property of the model instance provides a vector that is a concatenation of all model parameters defined via calls to <code>mb.Param</code>.
In our example <code>mi.ParameterVector</code> is a vector of length <span class="math">\(\mathrm{nHidden} \cdot \mathrm{nInput} + \mathrm{nHidden} + \mathrm{nClass} \cdot \mathrm{nHidden}\)</span> containing flattened views of the parameters <code>hiddenWeights</code>, <code>hiddenBias</code> and <code>outputWeights</code>.
Thus we have constructed a gradient descent optimizer that minimizes the loss with respect to our model's parameters.</p>
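<p>To make the formula for the length of <code>mi.ParameterVector</code> concrete, the following F# snippet evaluates it for example sizes. The number of hidden units is not fixed in this part of the tutorial, so <span class="math">\(\mathrm{nHidden} = 100\)</span> is an assumed value; 784 inputs (28 × 28 pixels per image) and 10 classes correspond to MNIST digits.</p>
<pre class="fssnip"><code lang="fsharp">// Length of the concatenated parameter vector of the two-layer model:
// hiddenWeights (nHidden x nInput) + hiddenBias (nHidden) + outputWeights (nClass x nHidden).
let parameterVectorLength nInput nHidden nClass =
    nHidden * nInput + nHidden + nClass * nHidden

// 784 = 28 x 28 pixels and 10 digit classes; nHidden = 100 is an assumed example value.
let nPars = parameterVectorLength 784 100 10
printfn "Parameter vector length: %d" nPars   // 78400 + 100 + 1000 = 79500
</code></pre>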
<p>What remains is to compile a function that performs an optimization step when called.
The expression that performs an optimization step when evaluated is provided in the <code>opt.Minimize</code> property of the optimizer instance.
We thus define the optimization function as follows.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs28', 87)" onmouseover="showTip(event, 'fs28', 87)" class="f">optFn</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs15', 88)" onmouseover="showTip(event, 'fs15', 88)" class="i">mi</span><span class="o">.</span><span class="i">Func</span> <span onmouseout="hideTip(event, 'fs27', 89)" onmouseover="showTip(event, 'fs27', 89)" class="i">opt</span><span class="o">.</span><span class="i">Minimize</span> <span class="o">|></span> <span onmouseout="hideTip(event, 'fs27', 90)" onmouseover="showTip(event, 'fs27', 90)" class="i">opt</span><span class="o">.</span><span class="i">Use</span> <span class="o">|></span> <span class="i">arg2</span> <span onmouseout="hideTip(event, 'fs13', 91)" onmouseover="showTip(event, 'fs13', 91)" class="i">input</span> <span onmouseout="hideTip(event, 'fs14', 92)" onmouseover="showTip(event, 'fs14', 92)" class="i">target</span>
</code></pre></td>
</tr>
</table>
<p>This definition is similar to <code>lossFn</code>, except that the compiled function is additionally piped through <code>opt.Use</code>.
<code>opt.Use</code> makes the resulting function accept an additional parameter, the optimizer configuration, and injects the values it contains into the VarEnv.
<code>opt.Use</code> must be applied whenever an optimization expression is compiled.
An optimization function returns an empty (zero-length) tensor, which is why the training loop discards its result with <code>ignore</code>.</p>
<p>Thus <code>optFn</code> is a function taking three parameters: the input images, the target labels and a record of type <code>GradientDescent.Cfg</code> that contains the optimizer configuration.</p>
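<p>How <code>opt.Use</code> changes the signature of the compiled function can be illustrated with ordinary F# functions. The following is a hypothetical sketch, not Deep.Net's implementation: a string-keyed <code>Map</code> stands in for the VarEnv, floats stand in for tensors, and <code>StepCfg</code>, <code>compiledStep</code> and <code>useSketch</code> are made-up names.</p>
<pre class="fssnip"><code lang="fsharp">// Hypothetical optimizer configuration with a single field, similar in spirit to GradientDescent.Cfg.
type StepCfg = { Step: float }

// Stand-in for a compiled optimization expression: it reads every value it
// needs, including the learning rate, from the variable environment.
let compiledStep env =
    let x : float = Map.find "x" env
    let grad : float = Map.find "grad" env
    let step : float = Map.find "step" env
    x - step * grad

// Hypothetical "Use"-style wrapper: a function VarEnv -> 'R becomes
// VarEnv -> StepCfg -> 'R, with the configuration injected into the environment.
let useSketch f =
    fun env (cfg: StepCfg) -> f (Map.add "step" cfg.Step env)

let stepFn = compiledStep |> useSketch
let env = Map.ofList [ ("x", 1.0); ("grad", 2.0) ]
printfn "%.2f" (stepFn env { Step = 0.1 })   // 1.0 - 0.1 * 2.0 = 0.80
</code></pre>
<p>Wrapping <code>stepFn</code> in an <code>arg2</code>-style wrapper would then build the environment from two positional arguments while leaving the configuration as the last parameter, which is why <code>optFn</code> takes the images and labels first and the configuration third.</p>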
<p>We still need to define the optimizer configuration.
The gradient descent optimizer has a single configurable parameter, the learning rate, which is set through the <code>Step</code> field.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs29', 93)" onmouseover="showTip(event, 'fs29', 93)" class="i">optCfg</span> <span class="o">=</span> { <span class="i">Optimizers</span><span class="o">.</span><span class="i">GradientDescent</span><span class="o">.</span><span class="i">Step</span><span class="o">=</span><span class="n">1e-1f</span> }
</code></pre></td>
</tr>
</table>
<p>We use a learning rate of <span class="math">\(0.1\)</span>.
This comparatively high learning rate is feasible because we compute the gradient on the whole training set (50 000 images), so it is free of the sampling noise of a mini-batch estimate.
If we did <a href="http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf">mini-batch training</a> instead, i.e. split the training set into small mini-batches and updated the parameters after estimating the gradient on each mini-batch, we would typically have to use a smaller learning rate.</p>
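<p>Written out, a standard gradient descent step with learning rate <span class="math">\(\eta\)</span> (the <code>Step</code> field of the configuration) updates the parameter vector <span class="math">\(\theta\)</span> (<code>mi.ParameterVector</code>) as follows, where <span class="math">\(L_n\)</span> denotes the cross-entropy of training sample <span class="math">\(n\)</span> and <span class="math">\(\bar{L}\)</span> its mean over all <span class="math">\(N\)</span> training samples:</p>
<p><span class="math">\[\theta \leftarrow \theta - \eta \, \nabla_\theta \bar{L}(\theta), \qquad \bar{L}(\theta) = \frac{1}{N} \sum_{n=1}^{N} L_n(\theta).\]</span></p>
<p>Mini-batch training replaces the full average by an average over a randomly chosen subset <span class="math">\(B\)</span> of the samples, <span class="math">\(\frac{1}{|B|} \sum_{n \in B} \nabla_\theta L_n(\theta)\)</span>, which is only a noisy estimate of <span class="math">\(\nabla_\theta \bar{L}(\theta)\)</span>; this noise is the reason a smaller learning rate is usually needed.</p>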
<h3><a name="Training-loop" class="anchor" href="#Training-loop">Training loop</a></h3>
<p>We are now ready to train and evaluate our model using a simple training loop.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
<span class="l">5: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">for</span> <span onmouseout="hideTip(event, 'fs30', 94)" onmouseover="showTip(event, 'fs30', 94)" class="i">itr</span> <span class="o">=</span> <span class="n">0</span> <span class="k">to</span> <span class="n">1000</span> <span class="k">do</span>
<span onmouseout="hideTip(event, 'fs28', 95)" onmouseover="showTip(event, 'fs28', 95)" class="f">optFn</span> <span onmouseout="hideTip(event, 'fs1', 96)" onmouseover="showTip(event, 'fs1', 96)" class="i">mnist</span><span class="o">.</span><span class="i">Trn</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Img</span> <span onmouseout="hideTip(event, 'fs1', 97)" onmouseover="showTip(event, 'fs1', 97)" class="i">mnist</span><span class="o">.</span><span class="i">Trn</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Lbl</span> <span onmouseout="hideTip(event, 'fs29', 98)" onmouseover="showTip(event, 'fs29', 98)" class="i">optCfg</span> <span class="o">|></span> <span onmouseout="hideTip(event, 'fs31', 99)" onmouseover="showTip(event, 'fs31', 99)" class="f">ignore</span>
<span class="k">if</span> <span onmouseout="hideTip(event, 'fs30', 100)" onmouseover="showTip(event, 'fs30', 100)" class="i">itr</span> <span class="o">%</span> <span class="n">50</span> <span class="o">=</span> <span class="n">0</span> <span class="k">then</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs32', 101)" onmouseover="showTip(event, 'fs32', 101)" class="i">l</span> <span class="o">=</span> <span onmouseout="hideTip(event, 'fs25', 102)" onmouseover="showTip(event, 'fs25', 102)" class="f">lossFn</span> <span onmouseout="hideTip(event, 'fs1', 103)" onmouseover="showTip(event, 'fs1', 103)" class="i">mnist</span><span class="o">.</span><span class="i">Tst</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Img</span> <span onmouseout="hideTip(event, 'fs1', 104)" onmouseover="showTip(event, 'fs1', 104)" class="i">mnist</span><span class="o">.</span><span class="i">Tst</span><span class="o">.</span><span class="i">All</span><span class="o">.</span><span class="i">Lbl</span> <span class="o">|></span> <span class="i">ArrayND</span><span class="o">.</span><span class="i">value</span>
<span onmouseout="hideTip(event, 'fs2', 105)" onmouseover="showTip(event, 'fs2', 105)" class="f">printfn</span> <span class="s">"Test loss after </span><span class="pf">%5d</span><span class="s"> iterations: </span><span class="pf">%.4f</span><span class="s">"</span> <span onmouseout="hideTip(event, 'fs30', 106)" onmouseover="showTip(event, 'fs30', 106)" class="i">itr</span> <span onmouseout="hideTip(event, 'fs32', 107)" onmouseover="showTip(event, 'fs32', 107)" class="i">l</span>
</code></pre></td>
</tr>
</table>
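<p>The loop performs one gradient descent step per iteration, each using the whole training set (50 000 images); because F# <code>for ... to</code> loops include their upper bound, this amounts to 1001 steps.
Every 50 iterations, the loss on the test set (10 000 images) is evaluated and printed.</p>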
<p>This should produce output similar to</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
<span class="l">4: </span>
<span class="l">5: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Test</span> <span class="i">loss</span> <span class="i">after</span> <span class="n">0</span> <span class="i">iterations</span><span class="o">:</span> <span class="n">2.3019</span>
<span class="i">Test</span> <span class="i">loss</span> <span class="i">after</span> <span class="n">50</span> <span class="i">iterations</span><span class="o">:</span> <span class="n">2.0094</span>
<span class="i">Test</span> <span class="i">loss</span> <span class="i">after</span> <span class="n">100</span> <span class="i">iterations</span><span class="o">:</span> <span class="n">1.0628</span>
<span class="o">..</span><span class="o">..</span>
<span class="i">Test</span> <span class="i">loss</span> <span class="i">after</span> <span class="n">1000</span> <span class="i">iterations</span><span class="o">:</span> <span class="n">0.2713</span>
</code></pre></td>
</tr>
</table>
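<p>The structure of the training loop above (one full-batch gradient step per iteration, followed by an evaluation every 50 iterations) can be tried out without Deep.Net on a toy problem. In this hypothetical F# sketch, a one-parameter least-squares fit takes the place of the neural network, and <code>trainLoss</code> and <code>gradient</code> stand in for <code>lossFn</code> and for the gradient that <code>optFn</code> computes internally. To avoid mutable state, the loop is written as a recursive function.</p>
<pre class="fssnip"><code lang="fsharp">// Toy data generated by y = 2 x; the single "parameter" w should approach 2.
let xs = [| 1.0; 2.0; 3.0; 4.0 |]
let ys = xs |> Array.map (fun x -> 2.0 * x)

// Mean squared error over the whole dataset (full batch) and its derivative in w.
let trainLoss w = Array.map2 (fun x y -> (w * x - y) ** 2.0) xs ys |> Array.average
let gradient w = Array.map2 (fun x y -> 2.0 * (w * x - y) * x) xs ys |> Array.average

let step = 0.01   // learning rate

// Mirrors `for itr = 0 to nIters do ...`: one gradient step per iteration,
// then an evaluation every 50 iterations.
let train nIters =
    let rec loop itr w =
        if itr > nIters then w
        else
            let w' = w - step * gradient w
            if itr % 50 = 0 then
                printfn "Loss after %3d iterations: %.6f" itr (trainLoss w')
            loop (itr + 1) w'
    loop 0 0.0

let wFinal = train 200
printfn "Learned w = %.4f" wFinal   // 2.0000
</code></pre>
<p>For this data the gradient is <span class="math">\(15\,(w - 2)\)</span>, so each step with learning rate 0.01 shrinks the distance to the optimum by a factor of 0.85; a learning rate above <span class="math">\(2/15\)</span> would make the iteration diverge, which illustrates why the learning rate has to be chosen with care even for full-batch training.</p>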
<p>Deep.Net also provides a generic training function with parameter and loss logging, automatic adjustment of the learning rate and automatic termination.
We will show its use in a later chapter.</p>
<h2><a name="Summary" class="anchor" href="#Summary">Summary</a></h2>
<p>In this introductory example we showed how to define symbolic sizes and build a two-layer neural network with a softmax output layer and cross-entropy loss using elementary mathematical operators.
Training was performed on the MNIST dataset using a simple training loop.</p>
<p>In the following sections, we will show how to assemble models from <a href="components.html">components</a> (such as neural layers and loss layers) and how to use the configurable <a href="training.html">training function</a> provided by Deep.Net.</p>
<div class="tip" id="fs1">val mnist : obj<br /><br />Full name: Model.mnist</div>
<div class="tip" id="fs2">val printfn : format:Printf.TextWriterFormat<'T> -> 'T<br /><br />Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn</div>
<div class="tip" id="fs3">val set : elements:seq<'T> -> Set<'T> (requires comparison)<br /><br />Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.set</div>
<div class="tip" id="fs4">val mb : obj<br /><br />Full name: Model.mb</div>
<div class="tip" id="fs5">Multiple items<br />val single : value:'T -> single (requires member op_Explicit)<br /><br />Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.single<br /><br />--------------------<br />type single = System.Single<br /><br />Full name: Microsoft.FSharp.Core.single</div>
<div class="tip" id="fs6">val nBatch : obj<br /><br />Full name: Model.nBatch</div>
<div class="tip" id="fs7">val nInput : obj<br /><br />Full name: Model.nInput</div>
<div class="tip" id="fs8">val nClass : obj<br /><br />Full name: Model.nClass</div>
<div class="tip" id="fs9">val nHidden : obj<br /><br />Full name: Model.nHidden</div>
<div class="tip" id="fs10">val hiddenWeights : obj ref<br /><br />Full name: Model.hiddenWeights</div>
<div class="tip" id="fs11">val hiddenBias : float ref<br /><br />Full name: Model.hiddenBias</div>
<div class="tip" id="fs12">val outputWeights : obj ref<br /><br />Full name: Model.outputWeights</div>
<div class="tip" id="fs13">val input : obj<br /><br />Full name: Model.input</div>
<div class="tip" id="fs14">val target : obj<br /><br />Full name: Model.target</div>
<div class="tip" id="fs15">val mi : obj<br /><br />Full name: Model.mi</div>
<div class="tip" id="fs16">val hiddenAct : float<br /><br />Full name: Model.hiddenAct</div>
<div class="tip" id="fs17">val hiddenVal : float<br /><br />Full name: Model.hiddenVal</div>
<div class="tip" id="fs18">val tanh : value:'T -> 'T (requires member Tanh)<br /><br />Full name: Microsoft.FSharp.Core.Operators.tanh</div>
<div class="tip" id="fs19">val outputAct : float<br /><br />Full name: Model.outputAct</div>
<div class="tip" id="fs20">val classProb : float<br /><br />Full name: Model.classProb</div>
<div class="tip" id="fs21">val exp : value:'T -> 'T (requires member Exp)<br /><br />Full name: Microsoft.FSharp.Core.Operators.exp</div>
<div class="tip" id="fs22">val smplLoss : int<br /><br />Full name: Model.smplLoss</div>
<div class="tip" id="fs23">val log : value:'T -> 'T (requires member Log)<br /><br />Full name: Microsoft.FSharp.Core.Operators.log</div>
<div class="tip" id="fs24">val loss : obj<br /><br />Full name: Model.loss</div>
<div class="tip" id="fs25">val lossFn : (obj -> obj -> obj)<br /><br />Full name: Model.lossFn</div>
<div class="tip" id="fs26">val tstLossUntrained : float<br /><br />Full name: Model.tstLossUntrained</div>
<div class="tip" id="fs27">val opt : obj<br /><br />Full name: Model.opt</div>
<div class="tip" id="fs28">val optFn : (obj -> obj -> obj -> obj)<br /><br />Full name: Model.optFn</div>
<div class="tip" id="fs29">val optCfg : obj<br /><br />Full name: Model.optCfg</div>
<div class="tip" id="fs30">val itr : int</div>
<div class="tip" id="fs31">val ignore : value:'T -> unit<br /><br />Full name: Microsoft.FSharp.Core.Operators.ignore</div>
<div class="tip" id="fs32">val l : float</div>
</div>
<div class="span3">
<!-- <img src="http://www.deepml.net/img/logo.png" alt="Deep.Net logo" style="width:150px;margin:10px" /> -->
<ul class="nav nav-list" id="menu" style="margin-top: 20px;">
<li class="nav-header">Deep.Net</li>
<li><a href="http://www.deepml.net/index.html">Home page</a></li>
<li class="divider"></li>
<li><a href="http://nuget.org/packages/DeepNet">Get Library via NuGet</a></li>
<li><a href="http://github.com/DeepMLNet/DeepNet">Source Code on GitHub</a></li>
<li><a href="http://www.deepml.net/release-notes.html">Release Notes</a></li>
<li class="nav-header">Basics</li>
<li><a href="http://www.deepml.net/tensor.html">Working with Tensors</a></li>
<li><a href="http://www.deepml.net/model.html">Model Definition</a></li>
<li><a href="http://www.deepml.net/components.html">Model Components</a></li>
<li><a href="http://www.deepml.net/dataset.html">Dataset Handling</a></li>
<li><a href="http://www.deepml.net/training.html">Training</a></li>
<li class="nav-header">Advanced</li>
<li><a href="http://www.deepml.net/diff.html">Automatic Differentiation</a></li>
<li class="nav-header">Documentation</li>
<li><a href="http://www.deepml.net/reference/index.html">API Reference</a></li>
</ul>
</div>
</div>
</div>
<a href="http://github.com/DeepMLNet/DeepNet"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://s3.amazonaws.com/github/ribbons/forkme_right_gray_6d6d6d.png" alt="Fork me on GitHub"/></a>
</body>
</html>