The examples in this directory have been adapted from the TensorFlow
tutorials.
To execute these examples, you will have to unzip the MNIST data files in `data/`.
The code can be found in `mnist_linear.rs`.
We first load the MNIST data using the MNIST helper module; labels are returned as a vector of integers. The train images and labels are used to train the model, while the test images and labels are used to estimate the validation error.
```rust
let m = vision::mnist::load_dir("data").unwrap();
```
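As a quick sanity check (not part of the original example), one can print the shapes of the loaded tensors; the field names below are the ones used in the snippets throughout this README:

```rust
// Hypothetical sanity check: print the shapes of the loaded dataset.
// For MNIST these should be [60000, 784] / [60000] for the training split
// and [10000, 784] / [10000] for the test split.
println!("train images: {:?}", m.train_images.size());
println!("train labels: {:?}", m.train_labels.size());
println!("test images:  {:?}", m.test_images.size());
println!("test labels:  {:?}", m.test_labels.size());
```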
After that, two tensors are initialized to hold the weights and biases for the linear classifier. The `requires_grad` flag is set when creating these tensors to inform torch that we will compute gradients with respect to them.
```rust
let mut ws = Tensor::zeros(&[IMAGE_DIM, LABELS], kind::FLOAT_CPU).set_requires_grad(true);
let mut bs = Tensor::zeros(&[LABELS], kind::FLOAT_CPU).set_requires_grad(true);
```
Using these, the model is defined as multiplying an input by the weight matrix and adding the bias. A softmax function is then used to transform the output into a probability distribution.
```rust
let logits = m.train_images.mm(&ws) + &bs;
```
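The snippet above only computes the logits. As a hedged aside, the probabilities themselves could be recovered by exponentiating the log-softmax; the training loop below works with the log-probabilities directly since `nll_loss` expects them:

```rust
// Hypothetical aside: explicit class probabilities for each training image.
// log_softmax followed by exp is exactly the softmax.
let probs = logits.log_softmax(-1).exp();
```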
We use gradient descent to minimize cross-entropy with respect to the variables `ws` and `bs`, iterating this a couple hundred times. Rather than using an optimizer, we perform the gradient descent updates manually; this is only to illustrate how gradients can be computed and used. Other examples such as `mnist_nn.rs` or `mnist_conv.rs` use an Adam optimizer.
```rust
for epoch in 1..200 {
    let logits = m.train_images.mm(&ws) + &bs;
    let loss = logits.log_softmax(-1).nll_loss(&m.train_labels);
    ws.zero_grad();
    bs.zero_grad();
    loss.backward();
    // Apply the gradient descent update without tracking gradients.
    no_grad(|| {
        ws += ws.grad() * (-1);
        bs += bs.grad() * (-1);
    });
}
```
Running this code should build a model that has ~92% accuracy.
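Measuring that accuracy is not shown in the snippet above; a minimal sketch, assuming the same `m`, `ws` and `bs`, and using the `accuracy_for_logits` helper (a sibling of the `batch_accuracy_for_logits` call used further below), could look like this:

```rust
// Hypothetical evaluation step: accuracy of the linear model on the test set.
// accuracy_for_logits compares the arg-max of the logits with the labels.
let test_logits = m.test_images.mm(&ws) + &bs;
let test_accuracy = f64::from(&test_logits.accuracy_for_logits(&m.test_labels));
println!("test accuracy: {:5.2}%", 100. * test_accuracy);
```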
A simple neural network version can be found in `mnist_nn.rs`; its accuracy should reach ~96%. A convolutional version can be found in `mnist_conv.rs`; its accuracy should reach ~99%.
When building models with multiple weight and bias parameters, we use a variable store to keep track of these variables and to let the optimizer know about them. The variable store is created in the first line of the following snippet; the model is then built using this variable store, and finally we create an optimizer that will perform gradient descent over the parameters that were added to the variable store when creating the network.
```rust
let vs = nn::VarStore::new(Device::cuda_if_available());
let net = Net::new(&vs.root());
let opt = nn::Optimizer::adam(&vs, 1e-4, Default::default());
```
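The `Net` type itself is defined in `mnist_conv.rs` and is not reproduced here. Purely as an illustration of how parameters end up in the variable store, a minimal fully connected sketch (hypothetical names and layer sizes, not the actual convolutional model) might look like this:

```rust
// Hypothetical sketch only: a small fully connected network. The real Net in
// mnist_conv.rs is convolutional; the point here is that every nn::linear call
// registers its weight and bias under the nn::Path derived from the variable
// store, so the Adam optimizer created above knows to update them.
use tch::{nn, nn::ModuleT, Tensor};

const IMAGE_DIM: i64 = 784;
const HIDDEN_NODES: i64 = 128;
const LABELS: i64 = 10;

#[derive(Debug)]
struct Net {
    fc1: nn::Linear,
    fc2: nn::Linear,
}

impl Net {
    fn new(vs: &nn::Path) -> Net {
        let fc1 = nn::linear(vs / "fc1", IMAGE_DIM, HIDDEN_NODES, Default::default());
        let fc2 = nn::linear(vs / "fc2", HIDDEN_NODES, LABELS, Default::default());
        Net { fc1, fc2 }
    }
}

impl ModuleT for Net {
    fn forward_t(&self, xs: &Tensor, _train: bool) -> Tensor {
        xs.apply(&self.fc1).relu().apply(&self.fc2)
    }
}
```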
Note that this will automatically run on a GPU when one is available. For a convolutional model we cannot run a training step on all the training images at once, as this would require too much memory; instead we run on mini-batches. An iterator makes it easy to loop over all the training images, shuffled and grouped into mini-batches. This is done in the main training loop:
```rust
for epoch in 1..100 {
    for (bimages, blabels) in m.train_iter(256).shuffle().to_device(vs.device()) {
        let loss = net
            .forward_t(&bimages, true)
            .cross_entropy_for_logits(&blabels);
        opt.backward_step(&loss);
    }
    let test_accuracy =
        net.batch_accuracy_for_logits(&m.test_images, &m.test_labels, vs.device(), 1024);
    println!("epoch: {:4} test acc: {:5.2}%", epoch, 100. * test_accuracy,);
}
```