Super-Resolution

To improve the recognition of small text, SRI developed a new super-resolution technique that incorporates a constraint based on the fact that a region of text is generally bilevel. Super-resolution techniques use models of the image formation process to combine multiple images into a summary image with higher spatial resolution than any one of the original images. The general process is applicable to imagery of all types of real scenes. For imagery of text, however, we have been able to show that incorporating the bilevel constraint directly in the super-resolution computation yields better results than the sequential two-step process of computing the super-resolved result and then binarizing it.

The algorithm was implemented in C++. Because our goal was better OCR performance and not merely a better-looking image, we evaluated the results quantitatively by running an OCR engine on the super-resolved result. To assess performance under realistic imaging conditions, we evaluated the algorithm on video data taken with a common consumer-grade camera [Donaldson and Myers 2003]. We summarize the results in this section.
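The idea of folding the bilevel constraint into the reconstruction itself can be illustrated with a toy cost function. The sketch below is not SRI's C++ implementation: the 1-D image-formation model, the quartic form of `bimodal_penalty`, the intensity levels 0 and 1, and all function names are assumptions chosen for illustration. It only shows how a candidate high-resolution signal might be scored by its agreement with the observed frames plus a penalty on pixel values that are neither dark nor light.

```python
def bimodal_penalty(x, dark=0.0, light=1.0):
    """Hypothetical quartic penalty: zero at the two text levels, positive elsewhere."""
    return (x - dark) ** 2 * (x - light) ** 2


def simulate_low_res(high_res, shift, factor=2):
    """Toy 1-D image formation (an assumption, not the source's model):
    circularly shift by whole high-res samples, then box-average blocks."""
    n = len(high_res)
    shifted = [high_res[(i + shift) % n] for i in range(n)]
    return [sum(shifted[i:i + factor]) / factor for i in range(0, n, factor)]


def objective(high_res, observations, weight=1.0, factor=2):
    """Squared misfit to every observed frame plus the weighted bimodal penalty."""
    data_term = 0.0
    for shift, low_res in observations:
        predicted = simulate_low_res(high_res, shift, factor)
        data_term += sum((p - o) ** 2 for p, o in zip(predicted, low_res))
    prior_term = sum(bimodal_penalty(x) for x in high_res)
    return data_term + weight * prior_term


# A bilevel "text" signal and two low-resolution frames taken at different shifts.
truth = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0]
frames = [(s, simulate_low_res(truth, s)) for s in (0, 1)]

# Compare the true bilevel signal with a uniformly gray candidate.
blurry = [0.5] * len(truth)
cost_truth = objective(truth, frames)
cost_blurry = objective(blurry, frames)
```

In this toy setting the bilevel signal that generated the frames has zero cost, while the uniformly gray candidate is penalized by both terms. A joint optimizer would search for the candidate that minimizes such a cost, rather than binarizing a separately computed result.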

Figure 1 shows a full frame extracted from a video of text of various sizes from the Gettysburg Address.

Figure 1. Full Video Frame Showing Text from Gettysburg Address

Table 1 shows the results of a test in which we subjected as many as 32 input images of various-size text to our super-resolution process.

Table 1. Output of the Super-Resolution Algorithm for Text Heights of 3.75 to 9 Pixels (Measured by the Height of the Capital Letters), along with the First Frame of the Image Sequence Used to Generate Each Result

Figure 2 shows OCR accuracy (the fraction of characters correctly recognized) as a function of text height for various numbers of low-resolution frames. The curves show the results for 4 to 32 input images, with an additional curve showing the OCR accuracy of the images after expansion via bicubic interpolation. Under conditions of low blur, noise, and distortion, the benefits of our algorithm become apparent when the heights of the capital letters of the observed text are between 3.75 and 6.75 pixels. The benefits disappear at letter heights of about 9 pixels, since at that height the OCR system can recognize the text perfectly with only bicubic interpolation and binarization as preprocessing.

Figure 2. OCR Accuracy vs. Text Height Achieved with the Super-Resolution Algorithm
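The accuracy metric is described only as the fraction of characters correctly recognized; how the OCR output was aligned with the reference text is not stated. One plausible way to compute such a fraction, offered here as an assumption, is to align the two strings with Python's `difflib` and divide the number of matched characters by the length of the reference. The function name `character_accuracy` is hypothetical.

```python
from difflib import SequenceMatcher


def character_accuracy(reference, recognized):
    """Fraction of reference characters that the aligner matches in the OCR output."""
    if not reference:
        raise ValueError("reference text must not be empty")
    # autojunk=False keeps frequent characters (such as spaces) in the alignment.
    matcher = SequenceMatcher(None, reference, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


perfect = character_accuracy("Four score and seven", "Four score and seven")
one_error = character_accuracy("abcd", "abxd")  # one substituted character
```

A score defined this way counts substitutions and deletions against the result but ignores extra characters inserted by the OCR engine; an edit-distance-based score would be an alternative if insertions should also be penalized.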

When the Gettysburg Address video sequence is compressed by an MPEG-1 scheme, the resolution is halved to 320 x 240. The 5.25-pixel-high text is thus reduced to about 2.6 pixels. Figure 3 demonstrates the effectiveness of the algorithm on the compressed video data. One might expect text in the MPEG-1 data to be harder to recognize because of the compression artifacts. However, downsizing the data reduces both the effective blur (measured in pixels) and the CCD noise, so the algorithm is actually able to recognize smaller text in the MPEG-1 data than in the DV data.

Figure 3. OCR Accuracy vs. Text Height of MPEG-1-compressed Gettysburg Text
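The height reduction is simple scaling arithmetic. As a small worked example, the snippet below halves the capital-letter heights mentioned in the text; the names `SCALE` and `compressed_height` are illustrative, and the list contains only the heights quoted above rather than the full set of tested sizes.

```python
SCALE = 0.5  # the MPEG-1 version has half the linear resolution, per the text


def compressed_height(original_height_px, scale=SCALE):
    """Capital-letter height in pixels after the frame is downsized."""
    return original_height_px * scale


# Heights (in pixels) quoted in the text for the original video.
heights = [3.75, 5.25, 6.75, 9.0]
halved = [compressed_height(h) for h in heights]
```

The 5.25-pixel text maps to 2.625 pixels, which the text rounds to 2.6.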

Table 2 shows the results of another test case with MPEG-1 compression. The text was filmed through the window of a moving train. This case is a good example of a "dirty" image: as the street sign moves across the frame of the camera, it is distorted and scaled by a small amount. Both transformations violate the assumptions of our 2-D model of translational camera movement. Even under these conditions, however, the algorithm significantly improves the OCR results; in addition, this example shows that the algorithm performs better with the bimodal constraint than it does without it.

Table 2. Results of Testing the Algorithm on MPEG-1 Compressed Imagery, along with an Original Frame and a Bicubic Interpolated Frame
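To make concrete what a purely translational 2-D motion model assumes, the sketch below estimates a single whole-pixel offset between two frames by brute-force search. This is a hypothetical illustration, not the registration method used in the work described here, which the text does not specify; the function names, the integer-only shifts, and the tiny synthetic image are all assumptions.

```python
def shift_image(image, dx, dy, fill=0.0):
    """Translate a list-of-rows image by (dx, dy) pixels, padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def estimate_translation(reference, frame, max_shift=2):
    """Search for the (dx, dy) that minimizes the mean squared difference
    between `frame` and the shifted `reference` over their overlap."""
    h, w = len(reference), len(reference[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    diff = frame[y][x] - reference[y - dy][x - dx]
                    total += diff * diff
                    count += 1
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]


# Synthetic 6x6 frame with a small L-shaped mark, then a shifted copy of it.
reference = [[0.0] * 6 for _ in range(6)]
for y, x in [(1, 1), (2, 1), (3, 1), (3, 2)]:
    reference[y][x] = 1.0
frame = shift_image(reference, dx=1, dy=2)
estimated = estimate_translation(reference, frame)
```

A model of this kind can explain a sign that slides across the frame, but it has no parameter for a change in scale or shape, which is why the train sequence falls outside its assumptions.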

 

©2011 SRI International 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation.