Date of Award
Electrical and Computer Engineering
Engineering, Electronics and Electrical.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
OCR (Optical Character Recognition) systems struggle with degraded document images, such as text overlapping with non-text symbols and touching characters. Unless pre-processing algorithms are applied before segmentation and recognition, the recognition rate for such images becomes unacceptable, or recognition fails completely. The principal objective of this thesis is therefore to develop effective algorithms for tackling these problems in the field of document analysis. We focus on the following three aspects:

1. A morphological approach has been developed to extract text strings from images in which text overlaps a regular, periodic background, since most OCR systems can only read traditional characters: black characters on a uniform white background, or vice versa. The proposed text extraction algorithms accommodate document images that contain various kinds of periodically distributed background symbols. Their underlying strategy is to maximize background component removal while minimizing the shape distortion of text characters by using appropriate morphological operations.

2. Real-world images are frequently degraded by human-induced interference strokes, which makes them unsuitable for processing by document analysis systems. To process document images containing handwritten interference marks, which lack the periodic property, a new algorithm combining a thinning technique with orientation attributes of connected components has been developed to effectively segment handwritten interference strokes. Morphological operations based on the orientation map and skeleton images are used to prevent the "flooding water" effect that conventional morphological operations produce when removing interference strokes.

3. Segmenting a word into its character components is one of the most critical steps in document recognition systems. Any failure or error in this segmentation step can lead to a critical loss of information from documents. In this thesis, we propose new algorithms for resolving the ambiguities in segmenting touching characters. A modified segmentation discrimination function, based on the pixel projection and profile projection, is presented for segmenting touching characters. A dynamic recursive segmentation algorithm has been developed to effectively search for correct cutting points in touching character components.

Based on 12 pages of "NEWSLINE", the University of Windsor's publication, a 99.6% character recognition accuracy has been achieved.

Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1996 .L52. Source: Dissertation Abstracts International, Volume: 59-08, Section: B, page: 4336. Advisers: M. Ahmadi; M. Shridhar. Thesis (Ph.D.)--University of Windsor (Canada), 1996.
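As a rough illustration of the projection-based idea mentioned in the abstract for touching characters, the sketch below computes a column pixel projection of a binary image and recursively searches for cutting points. It is not the thesis's modified segmentation discrimination function: the cut rule here (pick the interior column with the fewest foreground pixels) is a simplified stand-in, and the names `column_projection`, `find_cuts`, and `max_width` are hypothetical.

```python
def column_projection(image):
    """Count foreground (1) pixels in each column of a binary image."""
    if not image:
        return []
    return [sum(row[x] for row in image) for x in range(len(image[0]))]


def find_cuts(projection, max_width, offset=0):
    """Recursively split a run of columns that is wider than max_width.

    Assumption: a component wider than max_width holds touching characters.
    The cut is placed at the interior column with the fewest foreground
    pixels, a simple stand-in rule rather than the thesis's function.
    """
    width = len(projection)
    if width <= max_width or width < 3:
        return []
    # Skip the first and last columns so both halves are non-empty.
    cut = min(range(1, width - 1), key=lambda x: projection[x])
    left = find_cuts(projection[:cut], max_width, offset)
    right = find_cuts(projection[cut:], max_width, offset + cut)
    return left + [offset + cut] + right


# Two box-shaped "characters" joined by a thin bridge in column 3.
image = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
print(find_cuts(column_projection(image), max_width=4))  # → [3]
```

In practice the width threshold would be estimated from the document itself, for example from the widths of isolated characters on the same line, and a recognizer would normally confirm or reject each candidate cut.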
Liang, Su., "Restoration and segmentation of machine printed documents." (1996). Electronic Theses and Dissertations. 3344.