@Yano
2017-07-15T19:32:46.000000Z
字数 5985
阅读 1777
deeplearning
title: LeNet MNIST Tutorial
description: Train and test "LeNet" on the MNIST handwritten digit data.
category: example
include_in_docs: true
priority: 1
You will first need to download and convert the data format from the MNIST website. To do this, simply run the following commands:
cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
Specifically, we will write a caffe::NetParameter
(or in python, caffe.proto.caffe_pb2.NetParameter
) protobuf. We will start by giving the network a name:
name: "LeNet"
Currently, we will read the MNIST data from the lmdb we created earlier in the demo. This is defined by a data layer:
type: "Data"
}
backend: LMDB
batch_size: 64
}
top: "data"
top: "label"
}
Let's define the first convolution layer:
layer {
name: "conv1"
type: "Convolution"
param { lr_mult: 1 }
param { lr_mult: 2 }
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
bottom: "data"
top: "conv1"
}
Phew. Pooling layers are actually much easier to define:
layer {
name: "pool1"
type: "Pooling"
pooling_param {
kernel_size: 2
stride: 2
pool: MAX
}
bottom: "conv1"
This says we will perform max pooling with a pool kernel size 2 and a stride of 2 (so no overlapping between neighboring pooling regions).
Similarly, you can write up the second convolution and pooling layers. Check $CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt
for details.
Writing a fully connected layer is also simple:
layer {
name: "ip1"
type: "InnerProduct"
param { lr_mult: 1 }
param { lr_mult: 2 }
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
bottom: "pool2"
top: "ip1"
}
This defines a fully connected layer (known in Caffe as an InnerProduct
layer) with 500 outputs. All other lines look familiar, right?
A ReLU Layer is also simple:
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
After the ReLU layer, we will write another innerproduct layer:
layer {
name: "ip2"
type: "InnerProduct"
param { lr_mult: 1 }
param { lr_mult: 2 }
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
bottom: "ip1"
top: "ip2"
}
Finally, we will write the loss!
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
}
Layer definitions can include rules for whether and when they are included in the network definition, like the one below:
layer {
// ...layer definition...
include: { phase: TRAIN }
}
This is a rule, which controls layer inclusion in the network, based on current network's state.
You can refer to $CAFFE_ROOT/src/caffe/proto/caffe.proto
for more information about layer rules and model schema.
In the above example, this layer will be included only in TRAIN
phase.
If we change TRAIN
with TEST
, then this layer will be used only in test phase.
By default, that is without layer rules, a layer is always included in the network.
Thus, lenet_train_test.prototxt
has two DATA
layers defined (with different batch_size
), one for the training phase and one for the testing phase.
Also, there is an Accuracy
layer which is included only in TEST
phase for reporting the model accuracy every 100 iteration, as defined in lenet_solver.prototxt
.
Check out the comments explaining each line in the prototxt $CAFFE_ROOT/examples/mnist/lenet_solver.prototxt
:
# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
Training the model is simple after you have written the network definition protobuf and solver protobuf files. Simply run train_lenet.sh
, or the following command directly:
cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh
train_lenet.sh
is a simple script, but here is a quick explanation: the main tool for training is caffe
with action train
and the solver protobuf text file as its argument.
When you run the code, you will see a lot of messages flying by like this:
I1203 net.cpp:66] Creating Layer conv1
I1203 net.cpp:76] conv1 <- data
I1203 net.cpp:101] conv1 -> conv1
I1203 net.cpp:116] Top shape: 20 24 24
I1203 net.cpp:127] conv1 needs backward computation.
I1203 net.cpp:142] Network initialization done.
I1203 solver.cpp:36] Solver scaffolding done.
I1203 solver.cpp:44] Solving LeNet
Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 500 iterations. You will see messages like this:
I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
I1203 solver.cpp:66] Iteration 100, loss = 0.26044
...
I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9785
I1203 solver.cpp:111] Test score #1: 0.0606671
For each training iteration, lr
is the learning rate of that iteration, and loss
is the training function. For the output of the testing phase, score 0 is the accuracy, and score 1 is the testing loss function.
And after a few minutes, you are done!
I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9897
I1203 solver.cpp:111] Test score #1: 0.0324599
I1203 solver.cpp:126] Snapshotting to lenet_iter_10000
I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate
I1203 solver.cpp:78] Optimization Done.
The final model, stored as a binary protobuf file, is stored at
lenet_iter_10000
which you can deploy as a trained model in your application, if you are training on a real-world application dataset.
You just did! All the training was carried out on the GPU. In fact, if you would like to do training on CPU, you can simply change one line in lenet_solver.prototxt
:
# solver mode: CPU or GPU
solver_mode: CPU
and you will be using CPU for training. Isn't that easy?
MNIST is a small dataset, so training with GPU does not really introduce too much benefit due to communication overheads. On larger datasets with more complex models, such as ImageNet, the computation speed difference will be more significant.
Look at lenet_multistep_solver.prototxt