Hierarchical feature representation for unconstrained video analysis
Document Type
Article
Publication Date
10-21-2019
Publication Title
Neurocomputing
Volume
363
First Page
182
Keywords
Data compression, Video analysis, Visual learning
Last Page
194
Abstract
Complex video analysis is a challenging problem due to the long and sophisticated temporal structure of unconstrained videos. This paper introduces a pooled-feature representation (PFR), derived from a double-layer encoding (DLE) framework, to address this problem. Considering that a complex video is composed of a sequence of simple frames, the first layer generates temporal sub-volumes from the video and represents each of them individually. The second layer constructs a pool of features by fusing the vectors represented in the first layer; the pool is compressed and then encoded to produce the video-parts vector (VPV). This framework allows the representation to be distilled and new information to be extracted hierarchically. Compared with recent video encoding approaches, VPV preserves higher-level information through typical encoding in the higher layer. Furthermore, the encoded vectors from both layers of DLE are fused along with a compression stage to develop PFR. Early and late fusion stages are adopted depending on whether the compression stage precedes or follows the concatenation of the represented vectors. To validate the proposed framework, we conduct extensive experiments on four complex action datasets: UCF50, HMDB51, URADL, and Olympic. Experimental results demonstrate that PFR with early fusion achieves state-of-the-art performance by capturing the most prominent features with minimal dimensionality.
DOI
10.1016/j.neucom.2019.06.097
ISSN
0925-2312
E-ISSN
1872-8286
Recommended Citation
Mohammadi, Eman; Jonathan Wu, Q. M.; Saif, Mehrdad; and Yang, Yimin. (2019). Hierarchical feature representation for unconstrained video analysis. Neurocomputing, 363, 182–194.
https://scholar.uwindsor.ca/electricalengpub/254