Kyle Banks

Transfer Learning and Retraining Inception/MobileNet with TensorFlow and Docker

TensorFlow is a modern machine learning framework that provides tremendous power and opportunity to developers and data scientists. One of those opportunities is to use the concept of Transfer Learning to reduce training time and complexity by repurposing a pre-trained model.

From the official docs:

Modern object recognition models have millions of parameters and can take weeks to fully train. Transfer learning is a technique that shortcuts a lot of this work by taking a fully-trained model for a set of categories like ImageNet, and retrains from the existing weights for new classes. In this example we’ll be retraining the final layer from scratch, while leaving all the others untouched. For more information on the approach you can see this paper on Decaf.

Though it’s not as good as a full training run, this is surprisingly effective for many applications, and can be run in as little as thirty minutes on a laptop, without requiring a GPU.

In this tutorial we’ll learn how to utilize Transfer Learning to repurpose a pre-trained Inception or MobileNet model provided by TensorFlow to serve a new purpose.

What’s unique about this tutorial however, is that we’ll do it all without installing TensorFlow, instead performing training and predictions entirely through Docker. Why? Because it simplifies the process of getting this working on multiple machines, whether that is multiple developers and data scientists, your continuous integration process, or your production pipelines. Performing training in Docker ensures that no matter what machine is used to train your models, it will work without additional setup.

If you want to skip ahead and just get to work with the pre-built Docker images, head over to KyleBanks/tensorflow-docker-retrain on GitHub and follow the quick instructions there.

Gathering Images

The first (and honestly, most challenging) step of any image classification problem is to gather the image dataset. For this tutorial we’ll reuse the MNIST database of handwritten digits (0-9), which contains 60,000 training examples and 10,000 testing examples. This saves considerable time in gathering and preparing a custom dataset, and lets us get right into the real work of training and predicting.

In order to further reduce the amount of work we need to do to get started, I’ve converted the dataset into JPG format, which you can download and unzip from here. After downloading and unzipping, you should have a directory structure like so:

train-images/
    0/
    1/
    ...
    9/
test-images/
    0/
    1/
    ...
    9/
Within these directories you’ll find thousands of 28x28 images that look like so:

[Two sample MNIST images: a handwritten 0 and a handwritten 6]

The way TensorFlow expects images to be laid out is exactly as seen in the directory structure above: one folder per label, each containing the appropriate images. In this case our labels are the digits 0 through 9, so we have a folder for each. The test-images and train-images folders are simply our own convention for keeping the training images separate from the images we’ll test on; TensorFlow itself doesn’t care about them.
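To make the convention concrete, here is a minimal Python sketch of deriving class labels from folder names, the way the retraining script does. Note that labels_from_image_dir is a hypothetical helper for illustration only, not part of retrain.py:

```python
import os
import tempfile

def labels_from_image_dir(image_dir):
    """Derive class labels from folder names (one folder per label),
    mirroring the layout convention the retraining script expects."""
    return sorted(
        name for name in os.listdir(image_dir)
        if os.path.isdir(os.path.join(image_dir, name))
    )

# Build a toy train-images/ layout like the MNIST one above.
root = tempfile.mkdtemp()
for digit in range(10):
    os.makedirs(os.path.join(root, str(digit)))

print(labels_from_image_dir(root))  # ['0', '1', ..., '9']
```

Whatever folder names you use become the labels your classifier predicts, which is why the comic-book example below works just as well as digits.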

If you have your own image dataset that you want to work with, please feel free to do so! You’ll simply need to structure it like you see above, one folder per label where the folder name is the label name. For instance, if we were classifying images of comic book characters we might have a directory structure of:

train-images/
    batman/
    superman/
    wonder-woman/
Now that we have our dataset, it’s time to get to work. We’ll start by creating a Docker image to handle training: train-classifier/Dockerfile. It’s important to put this in the train-classifier directory as we’ll be creating multiple Docker images for different purposes by the end of the tutorial.


FROM tensorflow/tensorflow:1.4.0

ENV IMAGE_SIZE 128
ENV ARCHITECTURE mobilenet_0.50_${IMAGE_SIZE}
ENV TRAINING_STEPS 1000
ENV OUTPUT_GRAPH tf_files/retrained_graph.pb
ENV OUTPUT_LABELS tf_files/retrained_labels.txt

VOLUME /output
VOLUME /input

# Download the official retraining script.
RUN curl -O https://raw.githubusercontent.com/tensorflow/tensorflow/r1.4/tensorflow/examples/image_retraining/retrain.py

ENTRYPOINT python -m retrain \
  --how_many_training_steps="${TRAINING_STEPS}" \
  --model_dir=/output/tf_files/models/ \
  --output_graph=/output/"${OUTPUT_GRAPH}" \
  --output_labels=/output/"${OUTPUT_LABELS}" \
  --architecture="${ARCHITECTURE}" \
  --image_dir=/input

Let’s walk through this from top-to-bottom.

First, we use the official TensorFlow v1.4.0 image (the latest at time of writing) as our base, which comes with TensorFlow preinstalled and configured for us, one of the many benefits of Docker.

Next we define the default training parameters as environment variables, which will allow us to override them at runtime as necessary. We set our image size to 128 pixels even though our images are 28x28, because 128 is the minimum size allowed; our images will simply be upscaled, which is fine since they are all upscaled the same way. We then define the output locations of the trained graph and the labels text file that the training script will generate.

For the ARCHITECTURE you can see we’re using MobileNet with a width multiplier of 0.50 and the image size as the suffix. I’m using MobileNet here to reduce training time and the size of the trained model, though it does sacrifice some accuracy. If you wish to use Inception instead, set ARCHITECTURE to inception_v3. For more on the architecture and what these values mean, I recommend reading Other Model Architectures in the official documentation.

Finally, we indicate 1000 training steps, which lets us train quickly without doing a huge training cycle (the default is 4000), at the potential cost of accuracy. Generally, once you have a good idea that your model will perform well, you’ll want to experiment with increasing the training steps and training for longer to see if the improved accuracy is worth the increased training time.
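To make the ARCHITECTURE string concrete, here is a simplified sketch of how a value like mobilenet_0.50_128 breaks down into a width multiplier and an input size. Note that parse_mobilenet_architecture is a hypothetical helper for illustration, not retrain.py's actual parsing code:

```python
def parse_mobilenet_architecture(architecture):
    """Split an architecture string like 'mobilenet_0.50_128' into its
    parts: a width multiplier and a square input resolution."""
    parts = architecture.split("_")
    if parts[0] != "mobilenet":
        raise ValueError("only mobilenet strings are handled in this sketch")
    width_multiplier = float(parts[1])  # fraction of filters kept per layer
    input_size = int(parts[2])          # input resolution in pixels
    return width_multiplier, input_size

print(parse_mobilenet_architecture("mobilenet_0.50_128"))  # → (0.5, 128)
```

A smaller width multiplier or input size gives a smaller, faster model at the cost of some accuracy, which is exactly the trade-off we are making here.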

Next we mount two volumes, /output and /input. Volumes allow us to share data between the host machine and the Docker container. /output will be used to retrieve the trained model and labels from the Docker container once training is complete, and /input will be used to provide the Docker container with our training images. We’ll come back to these shortly.

Now it’s time to retrieve the retraining script, which we do by downloading it from the official TensorFlow repository on GitHub. We could write our own, but it’s definitely easier to start out with the existing training script until you need to do something custom.

Finally, we define the entrypoint of the Docker container to be the retraining script invoked with our parameters. As you can see we are instructing the retraining script on how many training steps to perform, the output locations, the architecture to retrain, and the location of our training image set.

With the Dockerfile created, we can go ahead and build the image:

$ docker build -t train-classifier train-classifier/

With the image built, it’s time to invoke it. Since the environment variables are all set up to match what we intend, all that we need to do is specify the location of the two volumes, /output and /input:

$ docker run -it \
    -v $(pwd)/output:/output \
    -v $(pwd)/train-images:/input \
    train-classifier

Here we’ve set /output equal to a new output directory in the present working directory, and /input as the path to the train-images/ directory containing the MNIST training image dataset.

If all goes well, you should eventually see output similar to:

INFO:tensorflow:Step 980: Train accuracy = 96.0%
INFO:tensorflow:Step 980: Cross entropy = 0.122227
INFO:tensorflow:Step 980: Validation accuracy = 92.0% (N=100)
INFO:tensorflow:Step 990: Train accuracy = 97.0%
INFO:tensorflow:Step 990: Cross entropy = 0.164350
INFO:tensorflow:Step 990: Validation accuracy = 93.0% (N=100)
INFO:tensorflow:Step 999: Train accuracy = 95.0%
INFO:tensorflow:Step 999: Cross entropy = 0.159146
INFO:tensorflow:Step 999: Validation accuracy = 95.0% (N=100)
INFO:tensorflow:Final test accuracy = 94.1% (N=6060)

Note: It may take 10-20 minutes to perform training, hence why we chose time-friendly defaults (a smaller MobileNet and fewer training steps) in the Dockerfile environment configuration above.
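For intuition on the Cross entropy values in the log above: cross entropy is the negative log of the probability the model assigns to the correct label, so lower is better. A quick check in plain Python (the 0.885 probability is illustrative, not taken from the run above):

```python
import math

def cross_entropy(prob_correct):
    """Negative log-likelihood of the true label."""
    return -math.log(prob_correct)

# A model that puts roughly 88.5% probability on the right digit
# scores about the ~0.12 cross entropy seen in the training log.
print(round(cross_entropy(0.885), 3))
```

As the model grows more confident in the correct labels, this value drives toward zero, which is why it falls as training progresses.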

Using the Retrained Model for Predictions

Now that the model is trained it’s ready to perform some predictions! Again we’ll create a Docker image to bundle the prediction logic which will be invoked against a folder containing images to predict - our test-images/ folder from the MNIST dataset in this case. Since these images were not used for training they will serve as a valid image set to test with.

To start with let’s create a predictor/ folder beside the train-classifier/ folder, as these will be our two Docker images. Inside predictor/ let’s add a new Dockerfile:


FROM tensorflow/tensorflow:1.4.0

ENV IMAGE_SIZE 128
ENV IMAGE_MEAN 128
ENV IMAGE_STD 128
ENV GRAPH_FILE retrained_graph.pb
ENV LABELS_FILE retrained_labels.txt
ENV INPUT_LAYER input
ENV OUTPUT_LAYER final_result

VOLUME /model
VOLUME /input

# Download the official prediction script.
RUN curl -O https://raw.githubusercontent.com/tensorflow/tensorflow/r1.4/tensorflow/examples/label_image/label_image.py

# Add our own wrapper script (predict.sh, shown below).
ADD predict.sh predict.sh

ENTRYPOINT ["/bin/bash", "predict.sh"]

Again, let’s walk through this line by line. We start off the same as before by using the standard TensorFlow Docker image as our base, specifying version 1.4.0, the latest at the time of writing.

Next up we’re going to define the customizable runtime properties as environment variables, using defaults appropriate for this task. You’ll see we use the same image size as in training, and the same graph and label file names that our train-classifier container created for us. What’s changed though is we now have a few new variables, namely IMAGE_MEAN, IMAGE_STD, INPUT_LAYER, and OUTPUT_LAYER. The image mean and std values specified simply match the defaults in the TensorFlow prediction script but allow you to tune them as needed, and the input and output layer names match the layer names in our trained model.
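To make the mean and std parameters concrete: the prediction script rescales each raw 0-255 pixel value as (pixel - input_mean) / input_std before feeding it to the network. A minimal sketch of that normalization in plain Python (normalize_pixels is a hypothetical helper for illustration):

```python
def normalize_pixels(pixels, input_mean=128, input_std=128):
    """Rescale raw 0-255 pixel values: subtract the mean, then divide
    by the std value, mapping 0..255 roughly onto the range -1..1."""
    return [(p - input_mean) / input_std for p in pixels]

print(normalize_pixels([0, 128, 255]))  # black, mid-gray, white
```

Centering the inputs around zero like this matches what the network saw during training, which is why the prediction-time values must agree with the training-time ones.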

Next we’ll mount two volumes just like we did in training, but this time we have /model which maps to the folder containing the trained model and label files, and /input which maps to the folder containing images to predict.

Next up we download the label_image.py prediction script from the official TensorFlow repository and add our own wrapper script that we’ll see in a moment.

Finally we define the entrypoint of the container to be the script below:

echo "-------------------------------------------------"
echo -e "\n\nIMAGE_SIZE: $IMAGE_SIZE, IMAGE_MEAN: $IMAGE_MEAN, IMAGE_STD: $IMAGE_STD"
echo -e "GRAPH_FILE: $GRAPH_FILE, LABELS_FILE: $LABELS_FILE"
echo -e "INPUT_LAYER: $INPUT_LAYER, OUTPUT_LAYER: $OUTPUT_LAYER\n\n"
echo "-------------------------------------------------"
echo -e "\n\nPredicting $@\n\n"
echo "-------------------------------------------------"

python -m label_image \
    --input_width="$IMAGE_SIZE" \
    --input_height="$IMAGE_SIZE" \
    --input_mean="$IMAGE_MEAN" \
    --input_std="$IMAGE_STD" \
    --input_layer="$INPUT_LAYER" \
    --output_layer="$OUTPUT_LAYER" \
    --graph=/model/"$GRAPH_FILE" \
    --labels=/model/"$LABELS_FILE" \
    --image=/input/"$1"
This bash script simply prints the environment variables and the name of the image we’re predicting, and then calls the label_image script with the set configurations. This script is mostly just helpful to see what parameters you’re using and the image being predicted, and you can modify it as you see fit. Alternatively you could place the python script execution in the Dockerfile just like we did in the train-classifier version - it’s all up to you.

Alright, let’s build the predictor image:

$ docker build -t predictor predictor/

In a moment you should have a built predictor image that we can use to perform predictions. To predict, we invoke the predictor with the two required volumes mounted: /model points at our output/tf_files directory, /input points at the folder containing the image to predict, and the final argument is the name of the image itself.

For example, let’s predict test-images/5/1376.jpg:

$ docker run -it \
    -v $(pwd)/output/tf_files:/model \
    -v $(pwd)/test-images/5:/input \
    predictor 1376.jpg

If all goes well you should get a response like so:


Predicting 1376.jpg

5 0.953844
3 0.0283552
8 0.0168694
0 0.000464269
6 0.000233758

What this tells us is that the label 5 was predicted with 95% confidence, and in fact the predictor was correct. Go ahead and try it out on several images to see the results!
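The ranked output above is simply the label scores sorted in descending order and truncated to the top results. A minimal sketch of that ranking step (the scores dict and the top_k helper are made up for illustration):

```python
def top_k(scores, k=5):
    """Return the k (label, score) pairs with the highest scores,
    mirroring how the prediction output above is ranked."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

scores = {"5": 0.9538, "3": 0.0284, "8": 0.0169,
          "0": 0.0005, "6": 0.0002, "1": 0.0001}

for label, score in top_k(scores):
    print(label, score)
```

The scores are probabilities that sum to (roughly) 1 across all labels, so a dominant top score like 0.95 indicates a confident prediction.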


Transfer Learning in Docker turns out to be rather straightforward (assuming you have some Docker experience), and it greatly simplifies the setup process. Installing TensorFlow can be painful at times, so having to do so on your and your team’s machines, as well as in your production and staging environments, is less than ideal. Docker simplifies the process and allows you to reuse the same image on as many machines as you like, with no additional setup.

Check out the project on GitHub at KyleBanks/tensorflow-docker-retrain and let me know what you think!

Let me know if this post was helpful on Twitter @kylewbanks or down below, and follow me to keep up with future posts!