International Journal of Innovative Engineering Research (E-ISSN: 2349-882X) Vol 7, Issue 2, March 2017
Detection and Recognition of Text from Image using Contrast and Edge Enhanced MSER Segmentation and OCR

Asit Kumar, M.Tech Scholar, Department of Electronic & Communication, OIST, Bhopal ([email protected])
Sumit Gupta, Professor, Department of Electronic & Communication, OIST, Bhopal ([email protected])
Abstract— Text detection and recognition in traffic scene images or natural images has applications in computer vision systems such as registration number plate detection, automatic traffic sign detection, image retrieval and assistance for visually impaired people. Scene text, however, involves complicated backgrounds, image blur, partially occluded text, variations in font styles, image noise and changing illumination, which make scene text recognition a difficult computer vision problem. This work addresses dictionary-driven end-to-end scene text recognition, which is divided into a text detection problem and a text recognition problem. For these reasons, an enhanced algorithm is proposed in which the image is preprocessed before the detection phase using a noise removal technique, namely the Lucy-Richardson algorithm. After noise removal, the text region detection phase uses a contrast-enhanced, edge-enhanced MSER region detection technique, and morphological segmentation is then applied to segment the text regions in the image. After the detection phase, the recognition phase filters text candidates geometrically using properties such as aspect ratio, eccentricity and solidity. A bounding box technique is then used to identify letter candidates and form words from them. Finally, an Optical Character Recognition (OCR) tool is used to extract the text from the image. The presented system outperforms state-of-the-art methods on the traffic text sign dataset obtained from Jaguar Land Rover Research. The results of the text detection phase (precision = 0.96 and F-measure = 0.93) and the text recognition/extraction phase (precision = 0.94 and F-measure = 0.95) show improvement over existing techniques.
Keywords— Text Detection, Lucy-Richardson Algorithm, Contrast and Edge Enhanced Maximally Stable Extremal Regions (MSER), Traffic Text Recognition, Morphological Segmentation, Bounding Box, Optical Character Recognition

I. INTRODUCTION

Visual text is one of the most important means of communication used by human beings and is widely used in everyday life. Hence, interpreting this textual data is of great significance. Human beings inherently acquire the ability to find and recognize the textual content in their surroundings, whereas it is a challenging problem for computer systems. The field of text detection focuses on detecting text embedded in images and videos with the help of computer systems [1]. Researchers have made significant progress in detecting text in images of machine-printed documents; detecting text in natural scenes, on the other hand, is still a comparatively new research topic. Text detection in natural scenes is difficult because of the variation in fonts, backgrounds, lighting conditions and textures occurring in these images. Also, the location of text in natural scenes is highly randomized. Detecting the embedded text information in natural scene images is of great significance for image understanding, content-based image retrieval and navigation. On account of these challenges, text detection in unconstrained images is considered far from solved [2-4, 8]. Methods for scene text localization and recognition aim to find all areas in an image that would be considered text by a human, mark the boundaries of those areas and output a sequence of characters associated with their content. They allow real-world images to be processed and the content of every detected text area to be extracted into a digital text layout that can be further processed by a computer. Scene text localization and recognition (also known as text localization and recognition in real-world images, natural scene OCR or the text-in-the-wild problem) is an open problem, unlike printed document recognition, where state-of-the-art systems are able to recognize more than 99% of characters correctly. Factors contributing to the complexity of the problem include: non-uniform background; the need for compensation of perspective effects (for documents, rotation or rotation and scaling is sufficient); real-world texts are often short snippets written in different fonts and languages; text alignment does not follow the strict rules of printed documents; and many words are proper names, which prevents effective use of a dictionary [3, 9]. Applications of text localization and recognition in real-world images range from automatic annotation of image databases based on their textual content (e.g. Flickr or Google Images) and assisting the visually impaired to reading labels on businesses in map applications (e.g. Google Street View). Text detection and recognition in traffic images is, however, a challenging, unsolved computer vision problem. Scene text has complex backgrounds, image blur, partially occluded text, variations in font styles, changing illumination and image noise, as illustrated in Figure 1. Commercial systems do not work in these settings.
Figure 1: Traffic image containing text

This paper is organized as follows. Section II reviews related work on text detection and recognition, while Section III describes the proposed methodology. Section IV gives the result analysis of the proposed methodology. Section V concludes with the performance of the proposed methodology and outlines enhancements that could be made in future work.

II. RELATED WORK

An intelligent decision support system (IDSS) has been proposed which analyzes online news before its publication and predicts whether an article will become popular. Online news popularity is measured by considering communication between the web and social networks with factors such as the number of shares, likes and comments. The popularity of candidate articles is first estimated, and changes to unpopular news are suggested in an optimization module [1].

Existing text detection methods can be divided into region-based and texture-based methods. Region-based methods rely on image segmentation: pixels are grouped into connected components (CCs), which are the character candidates, and these candidates are further grouped into candidate words and text lines based on geometric features. Texture-based methods distinguish text from non-text based on local features and machine learning techniques.

Fischler et al. proposed a model known as Random Sample Consensus (RANSAC) [1]. It is a model-fitting method capable of smoothing data that contain a significant proportion of errors. First, a subset of the original data is taken; this is called the set of hypothetical inliers. A model is then fitted to these hypothetical inliers, and the remaining data are tested against this model. The points that fit the model become part of the consensus set. This is how the RANSAC algorithm works.
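To make the RANSAC procedure described above concrete, a minimal MATLAB line-fitting sketch is given below; the synthetic data, inlier threshold and iteration count are illustrative assumptions and are not part of the original paper.

% Minimal RANSAC sketch for fitting a line y = a*x + b to noisy points.
% The data, threshold and iteration count are illustrative assumptions.
rng(0);
x = linspace(0, 10, 100)';
y = 2*x + 1 + 0.1*randn(100, 1);
y(1:20) = 20*rand(20, 1);               % gross outliers

bestInliers = [];
for k = 1:200                            % fixed number of hypotheses
    idx = randperm(100, 2);              % hypothetical inliers (minimal sample)
    p = polyfit(x(idx), y(idx), 1);      % fit candidate model
    resid = abs(y - polyval(p, x));      % test the remaining data against it
    inliers = find(resid < 0.5);         % points that agree form the consensus set
    if numel(inliers) > numel(bestInliers)
        bestInliers = inliers;
    end
end
bestModel = polyfit(x(bestInliers), y(bestInliers), 1);   % refit on consensus set
fprintf('Estimated line: y = %.2f*x + %.2f\n', bestModel(1), bestModel(2));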
Chen et al. [4] propose a text detection method using MSER. The outlines of the MSER regions are improved by edge detection techniques such as Canny edge detection, which makes MSER less sensitive to blurred images. Based on geometric cues, these candidate character regions are then grouped into words and text lines.

Neumann et al. [7] propose extremal regions (ERs) for segmenting regions. ERs are extracted from the gradient, HSI and RGB images to recover character candidate regions. Instead of using heuristics for labeling text, as Epshtein et al. [3] do, an AdaBoost classifier based on geometric features is used. Text CCs are then grouped into words.

In [11], Zheng et al. proposed a novel image operator to detect and locate text in scene images. To attain a high recall of character detection, extremal regions are detected as character candidates. Two classifiers are trained to identify characters, and a recursive local search algorithm is proposed to extract characters that are incorrectly identified by the classifiers. An efficient pruning technique, which combines component trees and recognition results, is proposed to prune repeated components. A cascaded technique combines text line entropy with a Convolutional Neural Network model; it is used to verify text candidates and reduce the number of non-text regions. The proposed technique is tested on three public datasets, i.e. the ICDAR2011, ICDAR2013 and ICDAR2015 datasets.

Wang et al. [6] propose HOG features with a Random Ferns classifier to detect and classify text in an end-to-end setting. The multiclass detector is trained on letters, and non-maxima of the detector responses are suppressed. The remaining letters are then combined in a Pictorial Structure framework, where letters are parts of words. For each word in a dictionary, the most plausible character responses are found in the image. Detected words are then rescored based on geometric information, and non-maxima suppression is applied to remove overlapping word responses.

In [10], Greenhalgh et al. proposed a system for the automated detection and recognition of text in traffic signs. Scene structure is used to define search regions within the image, inside which traffic sign candidates are then found. Maximally Stable Extremal Regions (MSERs) and hue, saturation and value color thresholding are used to locate a large number of candidates, which are then reduced by applying constraints based on temporal and structural information. A recognition stage interprets the text contained within the detected candidate regions. Individual text characters are detected as MSERs and classified into lines before being interpreted using optical character recognition (OCR). Recognition accuracy is greatly improved through the temporal fusion of text results across consecutive frames.

III. PROPOSED METHODOLOGY

A novel connecting-character based text recognition and extraction algorithm is designed which uses Maximally Stable Extremal Regions (MSER) for text candidate recognition and extraction from traffic signs.
Despite its promising properties, the MSER method has been reported to be sensitive to blurred images. To allow small letters to be detected in images of limited resolution or in blurred images, the complementary properties of the Lucy-Richardson algorithm and the Canny edge algorithm are used. Further geometric filtering and pairing are applied to efficiently obtain more reliable results. Finally, texts are clustered into lines and additional checks are performed to eliminate false positives. The proposed algorithm, illustrated in Figure 2, is divided into two basic stages, i.e. text area detection and text recognition, as described below. The detection stage exploits knowledge of the structure of the scene, i.e., the size and location of the road in the frame, to determine the regions of the scene in which it should search for traffic text signs. Once a potential traffic sign has been located, the next stage of the algorithm attempts to recognize text within the region. First, the traffic image in which text is to be detected is loaded. Before proceeding to the next step, the algorithm crops the portion of the image that contains text, and the text can additionally be rotated in plane, if required.

Step 1: Noise Removal and De-blurring
Due to imperfections in the image capturing procedure, the recorded image invariably represents a corrupted version of the original image. The degradation results in blurring of the image, which affects identification and retrieval of the essential information in the image frames. It can be the result of relative motion between the camera and the scene, an out-of-focus optical system, atmospheric disturbances or deviations in the optical system. Noise introduced by the medium through which the image is captured can also cause degradation. The degradation of the acquired images results in a severe loss of information, so restoring the corrupted images is an important task in order to expand the uses of the images. In this step the proposed algorithm uses the Lucy-Richardson algorithm for noise removal and de-blurring of the blurred image.

Step 2: Contrast Adjustment and Conversion of RGB Image to Binary Image
Image enhancement techniques are used to improve an image, where "improve" is sometimes defined objectively (e.g., increasing the signal-to-noise ratio) and sometimes subjectively (e.g., making certain features easier to see by modifying the colors or intensity values). Intensity adjustment is an image enhancement technique that maps an image's intensity values to a new range. In this step, the contrast or brightness level of the input image is enhanced. Further, the RGB image is converted into a grayscale image. The rgb2gray function transforms RGB images to grayscale by removing the hue and saturation information while retaining the luminance.

Step 3: Edge Enhancement
In this step, the Canny edge detection algorithm is used for image edge detection.
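As a rough illustration of Steps 1-3, a minimal MATLAB sketch is given below, assuming a Gaussian blur model; the point-spread function, iteration count, contrast limits and file name are illustrative assumptions rather than values specified in the paper.

% Sketch of Steps 1-3: de-blurring, contrast adjustment, grayscale
% conversion and Canny edge detection (Image Processing Toolbox).
% The Gaussian PSF, iteration count and file name are assumed values.
rgbImage  = imread('traffic_sign.jpg');          % hypothetical input frame

psf       = fspecial('gaussian', 5, 1.5);        % assumed blur model
deblurred = deconvlucy(rgbImage, psf, 10);       % Step 1: Lucy-Richardson restoration

adjusted  = imadjust(deblurred, stretchlim(deblurred), []);  % Step 2: contrast stretch
grayImage = rgb2gray(adjusted);                  % Step 2: drop hue/saturation, keep luminance

edges     = edge(grayImage, 'Canny');            % Step 3: edge map for edge enhancement
imshowpair(grayImage, edges, 'montage');

The Canny edge map produced here is the input used to sharpen the MSER boundaries in the following stage.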
Figure 2: Flow chart of the proposed algorithm. Detection phase: input image, noise removal and de-blurring, contrast enhancement and binary conversion, edge detection and enhancement, MSER region detection. Text recognition phase: morphological segmentation, geometric filtering, character connecting, text line formation, word separation.
Step 4: MSER Region Detection
Since the intensity contrast of text to its background is typically significant, and a fairly uniform intensity or color within each text region can be assumed, MSER is a natural choice for text detection.

Step 5: Morphological Segmentation
After detection of the edge-enhanced MSER regions, the text recognition phase starts. In this phase, morphological segmentation is first performed over the edge-enhanced MSER regions.
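The following is a minimal MATLAB sketch of Steps 4 and 5, assuming the Computer Vision Toolbox detectMSERFeatures function is used for region detection; the region-area range, threshold delta and structuring element are assumed values, not parameters reported in the paper.

% Sketch of Steps 4-5: MSER region detection on the pre-processed
% grayscale image, followed by a simple morphological segmentation.
% RegionAreaRange, ThresholdDelta and the structuring element are assumed.
[mserRegions, mserCC] = detectMSERFeatures(grayImage, ...
    'RegionAreaRange', [30 8000], 'ThresholdDelta', 4);

mserMask = false(size(grayImage));               % binary mask of all MSER pixels
mserMask(vertcat(mserCC.PixelIdxList{:})) = true;

% Morphological segmentation: close small gaps and fill holes so each
% character candidate forms one connected component.
segMask = imclose(mserMask, strel('disk', 1));
segMask = imfill(segMask, 'holes');

figure; imshow(segMask); title('Segmented candidate text regions');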
Step 6: Geometric Filtering and Character Connecting
With the extraction of the segmented regions, the geometric filtering phase starts. In this phase the segmented MSER regions are filtered on the basis of features such as aspect ratio, eccentricity and solidity. The filtered candidates are then connected using the bounding box technique.

Step 7: Text Line Formation and Word Separation
This stage of the algorithm subsequently locates lines of text among the detected candidate regions. This allows the total number of CCs to be reduced, removing non-character CCs and therefore raising the probability of higher accuracy. As a final step, text lines are split into individual words by classifying, via OCR, the inter-letter distances into two classes: character spacing and word spacing.
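A possible MATLAB sketch of Steps 6 and 7 is shown below; the geometric thresholds are assumptions, and for simplicity the surviving letter candidates are merged into a single region of interest before being passed to the Computer Vision Toolbox ocr function, rather than being split into individual words.

% Sketch of Steps 6-7: geometric filtering of segmented regions,
% bounding-box grouping and OCR. All thresholds are assumed values.
stats = regionprops(segMask, 'BoundingBox', 'Eccentricity', ...
                    'Solidity', 'Extent');

keep = true(numel(stats), 1);
for i = 1:numel(stats)
    bb = stats(i).BoundingBox;
    aspectRatio = bb(3) / bb(4);
    keep(i) = aspectRatio < 3 && ...              % discard long, thin blobs
              stats(i).Eccentricity < 0.995 && ...
              stats(i).Solidity > 0.3 && ...
              stats(i).Extent > 0.2 && stats(i).Extent < 0.9;
end
stats = stats(keep);

% Merge the remaining letter bounding boxes so neighbouring characters
% fall into one text region of interest.
boxes = vertcat(stats.BoundingBox);
xMin = min(boxes(:,1));  yMin = min(boxes(:,2));
xMax = max(boxes(:,1) + boxes(:,3));  yMax = max(boxes(:,2) + boxes(:,4));
textROI = [xMin, yMin, xMax - xMin, yMax - yMin];

% OCR on the grouped region of the grayscale image.
result = ocr(grayImage, textROI);
disp(result.Text);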
Table 2: Comparative analysis for the text detection stage

Method                     Precision   Recall   F-measure
Reina et al. [12]          0.54        0.64     0.61
Gonzalez et al. [13]       0.54        0.68     0.60
Greenhalgh et al. [10]     0.96        0.90     0.93
Proposed method            0.94        0.98     0.95
Graph 1 Comparative analysis of different text detection techniques
IV. EXPERIMENTAL PARAMETERS AND RESULT ANALYSIS
In order to evaluate the performance of the proposed algorithm, it was simulated with the following configuration: Pentium Core i5-2430M CPU @ 2.40 GHz, 4 GB RAM, 64-bit operating system, MATLAB platform with the Image Processing Toolbox and Computer Vision Toolbox. The traffic text sign data used in the proposed work were obtained from Jaguar Land Rover Research and are available to other researchers at http://www.bris.ac.uk/vilab/projects/roadsign/index.html. These data were captured with a camera for which the full calibration parameters are available. To evaluate the performance of the proposed system, the parameters Precision, Recall and F-measure are used:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = 2 * (Precision * Recall) / (Precision + Recall)

where True Positives (TP) are correctly detected text regions in an image, False Positives (FP) are regions incorrectly identified as text, and False Negatives (FN) are text regions that fail to be detected. The results of the detection phase and recognition phase of the proposed methodology are evaluated on 8 different image frames, and the analysis of these frames is illustrated in Table 1. The performance of the detection and recognition stages, in terms of Precision, Recall and F-measure, is compared with existing algorithms in Tables 2 and 3 as well as in Graphs 1 and 2.
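For illustration, these measures can be computed as in the short MATLAB sketch below; the TP, FP and FN counts are placeholder values, not results from the paper.

% Sketch: precision, recall and F-measure from detection counts.
% TP, FP and FN below are placeholder values, not results from the paper.
TP = 90;    % correctly detected text regions
FP = 6;     % regions incorrectly reported as text
FN = 4;     % text regions that were missed

precision = TP / (TP + FP);
recall    = TP / (TP + FN);
fMeasure  = 2 * (precision * recall) / (precision + recall);

fprintf('Precision = %.2f, Recall = %.2f, F-measure = %.2f\n', ...
        precision, recall, fMeasure);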
Table 3: Comparative analysis for the text recognition stage

Method                      Precision   Recall   F-measure
Standard Tesseract OCR      0.48        0.34     0.40
OCR with shape correction   0.69        0.75     0.72
OCR with temporal fusion    0.87        0.91     0.89
Proposed method             0.96        0.931    0.93
Graph 2 Comparative analysis of different text recognition techniques
Table 1: Result analysis of the proposed method using different image frames

Frame No.   Input Image   Contrast-Edge-Enhanced MSER Region   Extracted Text
0           (image)       (image)                              Cargo 8. Aviation Services
1           (image)       (image)                              LEFT TURN ON GREEN ARROW ONLY
2           (image)       (image)                              Commercial Districts
3           (image)       (image)                              CUVMBERELAAAND HWY Liverpool Canberra
281         (image)       (image)                              Manchester city centre. Didsbury - Q I
282         (image)       (image)                              Ring Rd (W 8. N) Liverpool (M 62) Bolton (M 61) Leeds (M 62) 60
513         (image)       (image)                              Ring Road All other traffic
514         (image)       (image)                              Keresley Radford B4098
V. CONCLUSION
Text extraction from natural scene images is a challenging problem due to variations in color, font size, text alignment, illumination, etc., and it is a technique to identify and isolate the desired text from images. The proposed algorithm presents a new methodology for text detection and recognition from traffic images by introducing a contrast-enhanced, edge-enhanced MSER region based text detection and recognition system.
The existing approaches that deal with the same problem are lacking in accuracy. The proposed contrast-enhanced, edge-enhanced Maximally Stable Extremal Region (EMSER) algorithm works with morphological segmentation to identify the shape of the text objects. After the morphological operation, geometric filtering based on eccentricity, aspect ratio, solidity, etc. is performed, which finds connected components more accurately. Finally, the proposed method is integrated with the bounding box technique to combine connected components into words and extract text using OCR. The newly proposed method not only reported higher Precision, Recall and F-measure values but also reduced the execution time. Both the detection and recognition stages of the system were validated through comparative analysis, achieving an F-measure of 0.93 for detection and 0.95 for recognition. Moreover, we still have to study and understand how tracking information can help build a better user interface for assistive devices. Observing the patterns of movement and context in the surroundings is crucial for deciding when and how to read text back to the user, enabling a more useful interaction experience. We plan to develop these ideas as part of our future work.

REFERENCES

[1] M. Fischler and R. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, no. 6, pp. 381–395, Jun. 1981.
[2] X. Chen and A. L. Yuille, "Detecting and Reading Text in Natural Scenes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 366–373, 2004.
[3] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting Text in Natural Scenes with Stroke Width Transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2963–2970, IEEE, 2010.
[4] H. Chen, S. S. Tsai, G. Schroth, D. M. Chen, R. Grzeszczuk, and B. Girod, "Robust Text Detection in Natural Images with Edge-Enhanced Maximally Stable Extremal Regions," in International Conference on Image Processing, pp. 2609–2612, IEEE, 2011.
[5] Yi-Feng Pan, Xinwen Hou, and Cheng-Lin Liu, "A Hybrid Approach to Detect and Localize Texts in Natural Scene Images," IEEE Transactions on Image Processing, 20(3):800–813, 2011.
[6] K. Wang, B. Babenko, and S. Belongie, "End-to-end scene text recognition," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1457–1464, Barcelona, Spain, 2011.
[7] L. Neumann and J. Matas, "Real-Time Scene Text Localization and Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3538–3545, IEEE Computer Society, 2012.
[8] J. Greenhalgh and M. Mirmehdi, "Real-time detection and recognition of road traffic signs," IEEE Trans. Intell. Transp. Syst., vol. 13, no. 4, pp. 1498–1506, Dec. 2012.
[9] A. González, L. Bergasa, and J. Yebes, "Text detection and recognition on traffic panels from street-level imagery using visual appearance," IEEE Trans. Intell. Transp. Syst., vol. 15, no. 1, pp. 228–238, Feb. 2014.
[10] Jack Greenhalgh and Majid Mirmehdi, "Recognizing Text-Based Traffic Signs," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 3, pp. 1360–1369, June 2015.
[11] Yang Zheng, Qing Li, Jie Liu, Heping Liu, Gen Li, and Shuwu Zhang, "A cascaded method for text detection in natural scene images," Elsevier, 2017.
[12] A. Reina, R. Sastre, S. Arroyo, and P. Jiménez, "Adaptive traffic road sign panels text extraction," in Proc. WSEAS ICSPRA, 2006, pp. 295–300.
[13] A. Gonzalez, L. M. Bergasa, J. J. Yebes, and J. Almazan, "Text recognition on traffic panels from street-level imagery," in Proc. IVS, Jun. 2012, pp. 340–345.