Get Hands On - Tensorflow Example

Dependencies

Before playing around with a Tensorflow implementation, let's make sure that you have the following library dependencies installed.

tensorflow >= 1.3 (imported as tf)
numpy
pillow

You can install them through pip.

pip install numpy
pip install "tensorflow>=1.3"
pip install Pillow

The code was tested on Python 3.4. I have tried to keep it compatible with Python 2.7 as well, so it should hopefully work there too.

Basic Tensorflow Model

We can now play around with the simple architecture that was introduced in the previous section, implemented in Tensorflow. The source code for this tutorial is available in this Github Repo, which you can clone to your computer using git.

git clone https://github.com/ronrest/segmentation_tut_code.git

You can then go into the project directory using:

cd segmentation_tut_code

The important file to look at is train.py. It is here that the architecture of the model is implemented, in a function called model_logits.

The first thing you will notice inside this function is the preprocessing operations to scale the images from integer values between 0-255 to float values between 0-1.

with tf.name_scope("preprocess") as scope:
    x = tf.div(X, 255., name="rescaled_inputs")
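
If you want to check the rescaling arithmetic outside of a Tensorflow graph, here is a minimal numpy sketch of the same idea. The batch shape and the random pixel values are made up for illustration and are not taken from the tutorial code.

```python
import numpy as np

# Hypothetical batch of two 4x4 RGB images with integer pixel values in 0-255.
X = np.random.randint(0, 256, size=(2, 4, 4, 3)).astype(np.uint8)

# Same arithmetic as the preprocessing step: divide by 255 to get floats in 0-1.
x = X.astype(np.float32) / 255.0

print(x.dtype, x.min() >= 0.0, x.max() <= 1.0)
```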

Next, you will see a chunk of code to perform the down-sampling portion of the architecture.

# DOWN CONVOLUTIONS
with tf.contrib.framework.arg_scope(
    [conv],
    padding = "SAME",
    stride = 2,
    activation_fn = relu,
    normalizer_fn = batchnorm,
    normalizer_params = {"is_training": is_training},
    weights_initializer =tf.contrib.layers.xavier_initializer(),
    ):
    with tf.variable_scope("d1") as scope:
        d1 = conv(x, num_outputs=32, kernel_size=3, stride=1, scope="conv1")
        d1 = conv(d1, num_outputs=32, kernel_size=3, scope="conv2")
        d1 = dropout_layer(d1, rate=dropout, name="dropout")
        print("d1", d1.shape.as_list())
    with tf.variable_scope("d2") as scope:
        d2 = conv(d1, num_outputs=64, kernel_size=3, stride=1, scope="conv1")
        d2 = conv(d2, num_outputs=64, kernel_size=3, scope="conv2")
        d2 = dropout_layer(d2, rate=dropout, name="dropout")
        print("d2", d2.shape.as_list())
    with tf.variable_scope("d3") as scope:
        d3 = conv(d2, num_outputs=128, kernel_size=3, stride=1, scope="conv1")
        d3 = conv(d3, num_outputs=128, kernel_size=3, scope="conv2")
        d3 = dropout_layer(d3, rate=dropout, name="dropout")
        print("d3", d3.shape.as_list())
    with tf.variable_scope("d4") as scope:
        d4 = conv(d3, num_outputs=256, kernel_size=3, stride=1, scope="conv1")
        d4 = conv(d4, num_outputs=256, kernel_size=3, scope="conv2")
        d4 = dropout_layer(d4, rate=dropout, name="dropout")
        print("d4", d4.shape.as_list())

The tf.contrib.framework.arg_scope call at the very beginning is a way to override the default argument values of a Tensorflow function, in this case the convolutional layer function. Here we are telling it that every time we call the convolutional layer function, we want it to use "SAME" padding, a stride of 2, a relu activation function, batch normalization, and xavier weight initialization. An argument passed explicitly in a call still takes precedence over these defaults, which is why the conv1 layers can specify stride=1. This keeps the function calls short and neat, since the only things that need to be passed are the things that actually change. If you want more information about argument scopes, please check out this blog post.
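
If argument scopes are new to you, a rough plain-Python analogy is functools.partial. This is only an analogy, not how arg_scope is implemented. The conv function below is a hypothetical stand-in that just reports the settings it received, so that you can see which value wins when a default is overridden.

```python
from functools import partial

def conv(inputs, num_outputs, kernel_size, stride=1, padding="VALID", activation_fn=None):
    """Hypothetical stand-in for a conv layer: returns the settings it was called with."""
    return {
        "num_outputs": num_outputs,
        "kernel_size": kernel_size,
        "stride": stride,
        "padding": padding,
        "activation_fn": activation_fn,
    }

# Set new defaults once, similar in spirit to what the argument scope does.
conv_with_defaults = partial(conv, padding="SAME", stride=2, activation_fn="relu")

# An explicitly passed stride takes precedence over the new default.
a = conv_with_defaults("x", num_outputs=32, kernel_size=3, stride=1)
# Without an explicit stride, the new default of 2 is used.
b = conv_with_defaults("x", num_outputs=32, kernel_size=3)

print(a["stride"], b["stride"], b["padding"])
```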

The next chunk of code is the upsampling portion of the architecture.

# UP CONVOLUTIONS
with tf.contrib.framework.arg_scope([deconv, conv],
    padding = "SAME",
    activation_fn = None,
    normalizer_fn = tf.contrib.layers.batch_norm,
    normalizer_params = {"is_training": is_training},
    weights_initializer = tf.contrib.layers.xavier_initializer(),
    ):
    with tf.variable_scope('u3') as scope:
        u3 = deconv(d4, num_outputs=n_classes, kernel_size=4, stride=2)
        s3 = conv(d3, num_outputs=n_classes, kernel_size=1, stride=1, activation_fn=relu, scope="s")
        u3 = tf.add(u3, s3, name="up")
        print("u3", u3.shape.as_list())
    with tf.variable_scope('u2') as scope:
        u2 = deconv(u3, num_outputs=n_classes, kernel_size=4, stride=2)
        s2 = conv(d2, num_outputs=n_classes, kernel_size=1, stride=1, activation_fn=relu, scope="s")
        u2 = tf.add(u2, s2, name="up")
        print("u2", u2.shape.as_list())
    with tf.variable_scope('u1') as scope:
        u1 = deconv(u2, num_outputs=n_classes, kernel_size=4, stride=2)
        s1 = conv(d1, num_outputs=n_classes, kernel_size=1, stride=1, activation_fn=relu, scope="s")
        u1 = tf.add(u1, s1, name="up")
        print("u1", u1.shape.as_list())
    logits = deconv(u1, num_outputs=n_classes, kernel_size=4, stride=2, activation_fn=None, normalizer_fn=None, scope="logits")

We again specify some default values, but this time for both convolutional and transpose convolutional layers. I have labeled the transpose convolutional layers as deconv simply because it keeps things short and neat. However, you should keep in mind that it is technically incorrect to call it a deconvolution operation.
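
To see how the strides in the two chunks of code above fit together, you can trace the spatial size of the feature maps by hand. The sketch below does this in plain Python. It assumes a square input of 128 pixels, which is a made-up value and not necessarily what the tutorial dataset uses. It relies on the usual "SAME" padding rules: a convolution outputs ceil(size / stride), and a transpose convolution outputs size * stride.

```python
import math

def conv_size(size, stride):
    """Spatial size after a convolution with "SAME" padding."""
    return int(math.ceil(size / float(stride)))

def deconv_size(size, stride):
    """Spatial size after a transpose convolution with "SAME" padding."""
    return size * stride

size = 128  # hypothetical input height/width, not taken from the tutorial code

# Down-sampling: in each block conv1 (stride 1) keeps the size, conv2 (stride 2) halves it.
down = {}
for name in ["d1", "d2", "d3", "d4"]:
    size = conv_size(size, stride=1)
    size = conv_size(size, stride=2)
    down[name] = size

# Up-sampling: each transpose convolution doubles the size, which has to match
# the down-sampling layer that feeds its skip connection.
up = {}
size = down["d4"]
for name, skip in [("u3", "d3"), ("u2", "d2"), ("u1", "d1")]:
    size = deconv_size(size, stride=2)
    assert size == down[skip]
    up[name] = size

logits_size = deconv_size(size, stride=2)
print(down, up, logits_size)
```

With these assumptions the final logits end up at the same height and width as the input, so there is one prediction per pixel.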

The s1, s2, s3 layers are the convolutional operations associated with the skip connections we saw in the diagram previously. You can see in the code that the outputs of the transpose convolutions and the skip connections are added together in an elementwise manner.
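
Here is a rough numpy sketch of what one of these skip connections computes. It is a simplification: it leaves out the batch normalization that the real layers apply, and the number of classes, the feature map sizes, and the random weights are all made up for illustration. A 1x1 convolution is just the same linear map applied to the channels of every pixel, so it can be written as a dot product over the last axis.

```python
import numpy as np

rng = np.random.RandomState(0)
n_classes = 4  # hypothetical number of classes

# Stand-in for the output of a transpose convolution: (batch, height, width, n_classes).
u3 = rng.randn(1, 16, 16, n_classes)
# Stand-in for the matching down-sampling feature map, with 128 channels.
d3 = rng.randn(1, 16, 16, 128)

# 1x1 convolution: map 128 channels to n_classes channels at every pixel, then relu.
W = rng.randn(128, n_classes) * 0.01
s3 = np.maximum(np.dot(d3, W), 0.0)

# Elementwise addition only works because both tensors have the same shape.
up = u3 + s3
print(s3.shape, up.shape)
```

This also shows why the 1x1 convolution is needed in the first place: without it, the 128 channels of d3 could not be added to the n_classes channels of u3.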

Training

Before you use the scripts to begin training, you should go to the top of the train.py file, in the SETTINGS section. Make sure that you specify the file path to the SiSI dataset that you downloaded previously for the variable named data_file.

You can go to the bottom of the file to modify any of the training settings if you wish. By default, it runs for 60 epochs. It only takes about two minutes per epoch on an Intel i7 2.20GHz quad-core processor.

model.train(data, n_epochs=60, alpha=0.0001, batch_size=4, print_every=40)

Running this script creates several subdirectories.

samples : Images that visualize how your model is progressing over time are created after each epoch and saved in this subdirectory. Images prefixed by train_ show predictions your model makes on the same data it has been using for training. Images prefixed by valid_ show predictions your model makes on data it has not seen during training. Some samples of the images will be shown below.
snapshots : Tensorflow checkpoint files are saved here so that you can pick up where you left off last time, and not have to retrain the model from scratch.
tensorboard : This contains the files used by Tensorboard, which allows you to interactively visualize the architecture that was created in a web browser.

Visualizing the progress

The images created in the samples subdirectory allow you to visualize how well the model is doing, and track its progress over time. The images contain several samples in a grid. The columns of the grid contain triplets of images. The top one is the input image, the second one is the correct label, and the third one is the prediction made by the model.
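
If you want to build a similar grid for your own experiments, here is a minimal numpy sketch of the layout described above. It is not the code used by the training script. It assumes a single row of triplets, and that the labels and predictions have already been converted to RGB images of the same size as the inputs. The function name triplet_grid and the dummy images are made up for illustration.

```python
import numpy as np

def triplet_grid(inputs, labels, preds):
    """Stack each (input, label, prediction) triplet vertically,
    then place the triplets side by side as columns."""
    columns = [
        np.concatenate([i, l, p], axis=0)
        for i, l, p in zip(inputs, labels, preds)
    ]
    return np.concatenate(columns, axis=1)

# Hypothetical data: 5 samples of 32x32 RGB images filled with dummy values.
n, h, w = 5, 32, 32
inputs = np.zeros((n, h, w, 3), dtype=np.uint8)
labels = np.full((n, h, w, 3), 128, dtype=np.uint8)
preds = np.full((n, h, w, 3), 255, dtype=np.uint8)

grid = triplet_grid(inputs, labels, preds)
print(grid.shape)  # (96, 160, 3): three images tall, five images wide
```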

Below is an animated gif of the 60 visualization images created by the model. You will notice that in the early stages, it just creates shapeless blobs in the center of the image, but it later starts to learn the outlines of objects and starts to become better at identifying the different types of animals.

Animation of training progress

Tips

Semantic segmentation is a task that doesn't seem to benefit from large batch sizes. Larger batch sizes actually seem to make the models worse. A batch size of about 4 is a good default value to start off with. Feel free to experiment with values in the vicinity to see if you get better results.