Real-Time Multi-Digit Recognition

Recognizing multi-digit numbers from images of the real world is a significantly more difficult problem than Optical Character Recognition (OCR) of scanned documents due to the complexity of such imagery.

A convolutional neural network was implemented in TensorFlow to perform the end-to-end task of simultaneously recognizing and localizing all the digits of a number.

The model was trained on the Street View House Numbers (SVHN) dataset, which consists of images of house numbers collected from Google Street View. On a test set drawn from SVHN, the model attained the following evaluation metrics:

  • 95.22% accuracy on recognizing entire digit sequences correctly.
  • 98.63% accuracy on recognizing individual digits correctly.
  • 0.83 Intersection over Union (IoU) score for the digit bounding box predictions.
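
For readers unfamiliar with these three metrics, the following is a minimal sketch of how they could be computed. It is not taken from the project's source code. The box layout (x_min, y_min, x_max, y_max), the function names, and the position-by-position definition of per-digit accuracy are all assumptions made for illustration, and the report may define them differently. The box overlap score is computed here as Intersection over Union (IoU): the area shared by the predicted and true boxes divided by the area of their union.

```python
from typing import Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so that disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sequence_accuracy(pred: Sequence[str], true: Sequence[str]) -> float:
    """Fraction of numbers whose whole digit string is predicted exactly."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def digit_accuracy(pred: Sequence[str], true: Sequence[str]) -> float:
    """Fraction of digit positions predicted correctly.

    Assumption: digits are compared position by position, and any missing
    or extra digit counts as an error.
    """
    correct = 0
    total = 0
    for p, t in zip(pred, true):
        total += max(len(p), len(t))
        correct += sum(pc == tc for pc, tc in zip(p, t))
    return correct / total
```

Under these definitions, a prediction of "45" for the house number "46" counts as a wrong sequence but still contributes one correct digit out of two, which is why per-digit accuracy is higher than whole-sequence accuracy.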

A desktop application was built on this trained model. It recognizes and localizes the digits of multi-digit numbers from a video feed (e.g. a webcam) in real time, even on a modest laptop without a GPU.

Below is a video that demonstrates the application running on images fed in through the webcam of a laptop in real time.

The full report and source code can be accessed using the following links.

Credits