SRI International (SRI) has developed ConTEXTract,
a text recognition technology that can find and read text (such as street
signs, name tags, and billboards) in real scenes. This optical character
recognition (OCR) for text within imagery and video requires a more specialized
approach than is provided by off-the-shelf OCR software, which is
designed primarily for recognizing text within documents. ConTEXTract distinguishes
lines of text from other contents in the imagery, processes the lines, and
then sends them to an OCR submodule, which recognizes the text. Any OCR engine
can be integrated into ConTEXTract with minor modifications.
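The three stages described above (find text lines, process them, hand them to an interchangeable OCR engine) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than ConTEXTract's actual interface: the `OcrEngine` protocol, the row-based `find_text_lines` detector, the `binarize` step, and `DummyEngine` are assumptions chosen only to show how an OCR submodule might be swapped without touching the rest of the pipeline.

```python
from dataclasses import dataclass
from typing import List, Protocol

# Assumption: a grayscale image is a list of rows of 0-255 intensities.
Image = List[List[int]]


class OcrEngine(Protocol):
    """Hypothetical plug-in interface: any engine exposing recognize() fits."""

    def recognize(self, line_image: Image) -> str: ...


@dataclass
class TextLine:
    top: int       # first row of the line (inclusive)
    bottom: int    # last row of the line (exclusive)
    pixels: Image  # cropped rows belonging to this line


def find_text_lines(image: Image, ink_threshold: int = 128) -> List[TextLine]:
    """Toy detector: group consecutive rows that contain dark pixels."""
    lines: List[TextLine] = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(p < ink_threshold for p in row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append(TextLine(start, y, image[start:y]))
            start = None
    if start is not None:
        lines.append(TextLine(start, len(image), image[start:]))
    return lines


def binarize(line: TextLine, threshold: int = 128) -> Image:
    """Toy processing step: map each pixel to pure black or pure white."""
    return [[0 if p < threshold else 255 for p in row] for row in line.pixels]


def read_scene_text(image: Image, engine: OcrEngine) -> List[str]:
    """Detect lines, clean them up, and pass each one to the OCR engine."""
    return [engine.recognize(binarize(line)) for line in find_text_lines(image)]


class DummyEngine:
    """Placeholder engine that only reports the size of what it was given."""

    def recognize(self, line_image: Image) -> str:
        return f"<line of {len(line_image)} rows>"
```

Because `read_scene_text` depends only on the `recognize` method, replacing `DummyEngine` with a wrapper around a real OCR engine would not require changes to the detection or processing steps.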
ConTEXTract is a configurable, real-time, end-to-end solution
that provides a scalable architectural framework adaptable to a wide
variety of imagery and processing environments. Unique capabilities developed
by SRI for ConTEXTract include the following:
The ability to distinguish text from non-text in video imagery
The ability to extract text from complex, patterned, and/or colored backgrounds
Character recognition despite oblique viewing angles, low resolution, and other distortions
High accuracy by combining text information that appears over multiple video frames
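As an illustration of the last capability, the sketch below combines several noisy readings of the same text, one per video frame, by majority vote at each character position. This is an assumed, simplified scheme for exposition only; the source does not describe how ConTEXTract actually fuses information across frames, and the function name `combine_readings` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def combine_readings(readings: Sequence[str]) -> str:
    """Fuse per-frame OCR results by per-character majority vote.

    Assumption: readings of the most common length are already aligned
    character by character; readings of any other length are discarded.
    """
    if not readings:
        return ""
    # Pick the most frequent string length and keep only those readings.
    common_len = Counter(len(r) for r in readings).most_common(1)[0][0]
    aligned = [r for r in readings if len(r) == common_len]
    # Vote independently at each character position.
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*aligned)
    )


# Five frames of the same sign, three of them misread in different ways.
frames = ["ST0P", "STOP", "SToP", "STOP", "STO"]
print(combine_readings(frames))  # prints "STOP"
```

Errors that differ from frame to frame tend to be outvoted, which is why a sequence of individually imperfect readings can still yield a correct result.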
This development is an outgrowth of several activities over the years at SRI.
SRI had an R&D program with the United States Postal Service (USPS) for over 10 years to read
addresses on mail pieces, on which text is often located amid graphics and printed on complex
patterned backgrounds. Recently SRI conducted government-sponsored research aimed at extracting
information (including text captions and scene text) from large volumes of video imagery,
including broadcast news.
SRI's ConTEXTract software has been licensed to Virage, Inc., a leading vendor of content-based
video indexing and information extraction systems, for incorporation into its video cataloging
product as a plug-in module.
SRI was funded by ARDA (a predecessor to IARPA) under the Video Analysis and Content Extraction (VACE) program to extend our video text recognition techniques to read scene text written on any planar surface, even if viewed at an oblique angle. For further information on SRI's work in this program, please visit our VACE Project Page, "Recognition of 3-D Scene Text."
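One standard way to handle text on a planar surface seen at an oblique angle is to estimate a homography from the four corners of the text region and use it to map the region to a fronto-parallel rectangle before recognition. The sketch below shows that general technique under stated assumptions; the source does not say this is the method used in the VACE work, and the names `solve_linear`, `estimate_homography`, and `apply_homography` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system a*x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return h11..h32 (with h33 fixed to 1) mapping four src points to dst."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def apply_homography(h: Sequence[float], point: Point) -> Point:
    """Map one image point through the homography."""
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


# Corners of a skewed sign in the image, and the upright rectangle we want.
corners = [(10.0, 20.0), (110.0, 40.0), (100.0, 90.0), (5.0, 70.0)]
target = [(0.0, 0.0), (100.0, 0.0), (100.0, 30.0), (0.0, 30.0)]
h = estimate_homography(corners, target)
```

Applying `apply_homography(h, p)` to every pixel location in the region (or its inverse, when resampling) would produce an upright image of the text that a conventional OCR engine is better suited to read.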
In 2000, SRI implemented ConTEXTract in a wearable system. A user aims a video
or still camera at some text in the scene. The video imagery is captured and processed
on a wearable PC that runs the ConTEXTract software. The recognized text is displayed
on a flat-panel display mounted on the user's forearm. SRI envisions a variety of potential
applications in digital still cameras and portable electronic devices.
Evaluation of SRI's Video Text Recognition Process