Using pre-trained Deep Convolutional Neural Networks as feature extractors.

Objective

The Crux

We are going to use a ResNet-50 model trained on the ImageNet dataset as a feature extractor for a set of input images. The extracted features will then be used by a linear SVM to predict which of two classes an input image belongs to.

More information on ResNet

The practical explanation

The universe under consideration contains images of only two categories: each image is either of an iPhone or of a MacBook.

Our task is to build a classifier that correctly classifies an input image as either an iPhone or a MacBook.

To achieve this, we use a pre-trained deep neural network (ResNet-50) to extract features from our training set and use those features to train an SVM. We then use the SVM to predict the class of an input image.

Step 1: Download pre-trained models

A model often consists of two parts: a .json file specifying the neural network structure and a .params file containing the binary parameters. The naming convention is name-symbol.json and name-epoch.params, where name is the model name and epoch is the epoch number.

Here we download a ResNet 50-layer model pre-trained on ImageNet. Other models are available at http://data.mxnet.io/models/

In [1]:
import os, urllib

# Download a file into the working directory, skipping it if already present
def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.urlretrieve(url, filename)

# Fetch both the symbol definition (.json) and the parameters (.params) of a checkpoint
def get_model(prefix, epoch):
    download(prefix+'-symbol.json')
    download(prefix+'-%04d.params' % (epoch,))

get_model('http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50', 0)
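The same helper works for any model in the zoo. For example, the deeper 152-layer variant could be fetched the same way (a sketch; this assumes the analogous path exists under http://data.mxnet.io/models/):

get_model('http://data.mxnet.io/models/imagenet/resnet/152-layers/resnet-152', 0)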

Step 2: Initialization

We first load the model into memory with load_checkpoint. It returns the symbol definition of the neural network (see symbol.ipynb) and its parameters.

In [2]:
import mxnet as mx
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)

Both the argument parameters and the auxiliary parameters (e.g. the mean/std of batch normalization layers) are stored as dictionaries mapping string names to NDArray values (see ndarray.ipynb). The argument parameters consist of the weights and biases.
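We can take a quick look at these dictionaries; a minimal sketch (the layer names used here appear in the list printed in Step 3 below):

print len(arg_params), len(aux_params)    # number of weight/bias and auxiliary arrays
print arg_params['fc1_weight'].shape      # (1000, 2048): the final FC layer of ResNet-50
print aux_params['bn1_moving_var'].shape  # running variance of the last batch-norm layer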

Next we create an executable module (see module.ipynb) on CPU 0. To use a different device, we just need to change the context, e.g. mx.cpu() for CPU and mx.gpu(2) for the 3rd GPU.

In [3]:
mod = mx.mod.Module(symbol=sym, context=mx.cpu())

ResNet is trained with RGB images of size 224 x 224. The training data is fed in through the variable data. We bind the module with the input shape and specify that it is only for prediction. The 1 added before the image shape (3x224x224) means that we will predict one image at a time. Next we set the loaded parameters. Now the module is ready to run.

In [4]:
mod.bind(for_training=False,
         data_shapes=[('data', (1,3,224,224))])
mod.set_params(arg_params, aux_params)

Step 3: Extract Features

If we extract the internal outputs of a neural network rather than the final predicted probabilities, the neural network serves as a feature extractor for other applications.

By default, a loaded symbol only returns the last layer as its output. But we can reach the internal layers with get_internals, which returns a new symbol outputting all internal layers. The following code prints the names of the last few layers.

In [5]:
all_layers = sym.get_internals()
all_layers.list_outputs()[-10:-1]
Out[5]:
['bn1_moving_var',
 'bn1_output',
 'relu1_output',
 'pool1_output',
 'flatten0_output',
 'fc1_weight',
 'fc1_bias',
 'fc1_output',
 'softmax_label']

Often we want to use the output just before the last fully connected layer, which captures semantic features of the raw image without being fitted too closely to the ImageNet labels. In the ResNet case, this is the flatten layer named flatten0, right before the last fully connected layer. The following code builds a new symbol, sym3, that uses the flatten layer as its last output layer, and initializes a new module with it.

In [6]:
all_layers = sym.get_internals()
sym3 = all_layers['flatten0_output']
mod3 = mx.mod.Module(symbol=sym3, context=mx.cpu())
mod3.bind(for_training=False, data_shapes=[('data', (1,3,224,224))])
mod3.set_params(arg_params, aux_params)

The following are a set of utility functions that will come in handy.

In [7]:
import os
import numpy as np
import cv2
import urllib
from zipfile import ZipFile

# Helper function to read the files from a directory
def read_files_in_path(path):
    for root, dir_names, file_names in os.walk(path):
        for file_name in file_names:
            yield file_name
            
# Getting the image ready
def get_image(filename):
    img = cv2.imread(filename)  # read image in b,g,r order
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # change to r,g,b order
    img = cv2.resize(img, (224, 224))  # resize to 224*224 to fit model
    img = np.swapaxes(img, 0, 2)
    img = np.swapaxes(img, 1, 2)  # change to (channel, height, width)
    img = img[np.newaxis, :]  # extend to (example, channel, height, width)
    return img

# Download training and testing images, if not already downloaded
def download_images(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.urlretrieve(url, filename)
        zip_file = ZipFile(filename)
        zip_file.extractall()

download_images('http://josephkj.in/wp-content/uploads/2017/01/joseph_dataset.zip')
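As a quick sanity check, the preprocessed array should have exactly the shape we bound the module with. A minimal sketch, using one of the test images downloaded above:

img = get_image('./joseph_dataset/testing_set/iphone/iPhone72.jpg')
print img.shape  # (1, 3, 224, 224), i.e. (example, channel, height, width)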

We define an input data structure that is acceptable to MXNet. The field data is used for the input, which is a list of NDArrays.

In [8]:
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])

Here we extract the features of the images in the training set. We append each feature vector and the corresponding label to X and y respectively.

This is where we actually get the features out of the ResNet model by doing a forward propagation: img is the input image and out is the extracted feature vector, which for this model is 2048-dimensional.

mod3.forward(Batch([mx.nd.array(img)]))
out = mod3.get_outputs()[0].asnumpy()
In [9]:
# Loading the training set
X = []
y = []

print 'Loading the training set...'

# Add all the positive (iPhone) images to X, with label 1
base_path = './joseph_dataset/training_set/iphone/'
for file_name in read_files_in_path(base_path):
    img = get_image(base_path + file_name)
    mod3.forward(Batch([mx.nd.array(img)]))
    out = mod3.get_outputs()[0].asnumpy()
    X.append(out[0])
    y.append(1)

# Add all the negative (MacBook) images to X, with label 0
base_path = './joseph_dataset/training_set/mac/'
for file_name in read_files_in_path(base_path):
    img = get_image(base_path + file_name)
    mod3.forward(Batch([mx.nd.array(img)]))
    out = mod3.get_outputs()[0].asnumpy()
    X.append(out[0])
    y.append(0)
print 'Completed loading the training set.'
Loading the training set...
Completed loading the training set.

We do the same for the testing set:

In [10]:
# Loading the testing set

X_test = []
y_test = []

print 'Loading the testing set...'
base_path = './joseph_dataset/testing_set/iphone/'
for file_name in read_files_in_path(base_path):
    img = get_image(base_path + file_name)
    mod3.forward(Batch([mx.nd.array(img)]))
    out = mod3.get_outputs()[0].asnumpy()
    X_test.append(out[0])
    y_test.append(1)
    
base_path = './joseph_dataset/testing_set/mac/'
for file_name in read_files_in_path(base_path):
    img = get_image(base_path + file_name)
    mod3.forward(Batch([mx.nd.array(img)]))
    out = mod3.get_outputs()[0].asnumpy()
    X_test.append(out[0])
    y_test.append(0)
print 'Completed loading the testing set.'
Loading the testing set...
Completed loading the testing set.
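Before training, it is worth confirming the shapes; a minimal sketch (the flatten0 output of ResNet-50 is a 2048-dimensional vector per image):

print len(X), len(X[0])            # (number of training images, 2048)
print len(X_test), len(X_test[0])  # (number of testing images, 2048)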

Step 4: Training an SVM to classify images

This is where we train the SVM with the features that we have extracted.

In [11]:
from sklearn import svm

model = svm.LinearSVC()
model.fit(X, y)
Out[11]:
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)
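The defaults above already work well here, but the regularization strength C could be tuned as well. A minimal sketch using 5-fold cross-validation on the training features (cross_val_score lives in sklearn.cross_validation in the scikit-learn of this era; newer versions moved it to sklearn.model_selection):

import numpy as np
from sklearn.cross_validation import cross_val_score

for C in [0.01, 0.1, 1.0, 10.0]:
    scores = cross_val_score(svm.LinearSVC(C=C), np.array(X), np.array(y), cv=5)
    print 'C=%g: mean CV accuracy %.4f' % (C, scores.mean())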

Viewing the score (mean accuracy) of the SVM on the testing set:

In [12]:
score = model.score(X_test, y_test)
print "Score is: " + str(score*100) +"%"
Score is: 97.5609756098%
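Accuracy alone can hide how each class fares; a per-class breakdown is a one-liner (a sketch using scikit-learn's metrics module; labels 0 and 1 correspond to mac and iphone as assigned above):

from sklearn.metrics import classification_report
print classification_report(y_test, model.predict(X_test), target_names=['mac', 'iphone'])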

The predict function uses the SVM that we have trained to classify an input image:

In [13]:
import matplotlib.pyplot as plt

def predict(path):
    
    # Displaying image
    img_display = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    plt.imshow(img_display)
    plt.axis('off')
    
    # Inferring the type
    img = get_image(path)
    mod3.forward(Batch([mx.nd.array(img)]))
    out = mod3.get_outputs()[0].asnumpy()
    result = model.predict([out[0]])
    if result[0] == 1:
        plt.title('Result: There is an iPhone in the picture.')
    else:
        plt.title('Result: There is a MacBook in the picture.')
    plt.show()
        
def predict_from_url(url):
    filename = url.split("/")[-1]
    urllib.urlretrieve(url, filename)
    predict(filename)

Step 5: Using the model

In [14]:
# Examples from the testing set
In [15]:
predict('./joseph_dataset/testing_set/mac/new-apple-macbook-2015-_-_20.0.jpg')
In [16]:
predict('./joseph_dataset/testing_set/iphone/iPhone72.jpg')
In [17]:
# Example from a URL
In [18]:
predict_from_url('http://www.grifiti.com/sites/default/files/imagecache/product_full/800x800_retro-ergo_deck_15in_06.jpg')