Date of Award


Publication Type

Doctoral Thesis

Degree Name



Electrical and Computer Engineering

First Advisor

Wu, Jonathan (Electrical and Computer Engineering)


Engineering, Electronics and Electrical.




This dissertation proposes a general framework for efficiently identifying objects of interest (OI) in still images, and extends it to human action recognition in videos. The frameworks used for still images and for videos share the same architecture but differ in their content representations. First, global-level analysis extracts distinctive feature sets from the input data. This global analysis uses bidirectional two-dimensional principal component analysis (2D-PCA), which preserves the correlation among neighboring pixels. To cope with the inherent limitations of this holistic approach, local information is then introduced into the framework. Local information about the OI is identified using the FERNS and affine SIFT (ASIFT) approaches for spatial and temporal datasets, respectively. To obtain supportive local information, feature detection is followed by a pruning strategy that divides the detected features into inliers and outliers. The inlier cluster consists of local features that exhibit stable behavior and geometric consistency. Incremental learning is a significant but often overlooked problem in action recognition. The final part of this dissertation proposes a new action recognition algorithm based on sequential learning and an adaptive representation of the human body using Pyramid of Histogram of Oriented Gradients (PHOG) features. The changing shape and appearance of human body parts are tracked under the weak appearance constancy assumption. The constantly changing shape of an OI is maximally covered by small blocks that approximate the body contour of the segmented foreground object. In addition, the analytically determined learning phase guarantees a lower computational burden for classification. The dissertation also explores recognizing an action causally from a minimal number of video frames. Because PHOG features are extracted adaptively from individual frames, an incoming action video can be recognized from a small group of frames, which eliminates the need for a large look-ahead.
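The global analysis step named in the abstract, bidirectional 2D-PCA, can be illustrated with a short sketch. This is a generic textbook-style formulation written with NumPy, not the dissertation's implementation: the function names, the number of retained components, and the random test data are all assumptions made for illustration. The idea is that each image is kept as a matrix and projected from both the row side and the column side, so neighboring-pixel structure is not lost by flattening the image into a vector.

```python
import numpy as np

def bidirectional_2dpca(images, n_row_components, n_col_components):
    """Learn row- and column-direction projections from a stack of images.

    images: array of shape (N, h, w). Returns (U, V, mean) where
    U is (h, n_row_components) and V is (w, n_col_components).
    Hypothetical helper for illustration only.
    """
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    A = X - mean  # centered images, shape (N, h, w)

    # Column-direction scatter (w x w): average of A_i^T A_i.
    G_col = np.einsum("nij,nik->jk", A, A) / len(A)
    # Row-direction scatter (h x h): average of A_i A_i^T.
    G_row = np.einsum("nij,nkj->ik", A, A) / len(A)

    # eigh returns eigenvalues in ascending order, so reverse the
    # eigenvector columns to put the leading directions first.
    _, vec_col = np.linalg.eigh(G_col)
    _, vec_row = np.linalg.eigh(G_row)
    V = vec_col[:, ::-1][:, :n_col_components]
    U = vec_row[:, ::-1][:, :n_row_components]
    return U, V, mean

def project(image, U, V, mean):
    """Reduce one (h, w) image to a small feature matrix U^T (A - mean) V."""
    return U.T @ (np.asarray(image, dtype=float) - mean) @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)            # synthetic data, assumption
    imgs = rng.normal(size=(20, 8, 6))        # 20 images of size 8 x 6
    U, V, mean = bidirectional_2dpca(imgs, 3, 2)
    print(project(imgs[0], U, V, mean).shape)
```

In this sketch an 8 x 6 image is reduced to a 3 x 2 feature matrix. Keeping all components in both directions makes the projection invertible, which is a convenient sanity check on the two eigenvector bases.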