Hierarchical feature representation for unconstrained video analysis

Document Type

Article

Publication Date

10-21-2019

Publication Title

Neurocomputing

Volume

363

First Page

182

Keywords

Data compression, Video analysis, Visual learning

Last Page

194

Abstract

Complex video analysis is a challenging problem due to the long and sophisticated temporal structure of unconstrained videos. This paper introduces the pooled-feature representation (PFR), which is derived from a double-layer encoding (DLE) framework, to address this problem. Considering that a complex video is composed of a sequence of simple frames, the first layer generates temporal sub-volumes from the video and represents each one individually. The second layer constructs a pool of features by fusing the represented vectors from the first layer; the pool is compressed and then encoded to produce the video-parts vector (VPV). This framework distills the representation and extracts new information hierarchically. Compared with recent video encoding approaches, VPV preserves higher-level information through typical encoding in the higher layer. Furthermore, the encoded vectors from both layers of DLE are fused along with a compression stage to develop PFR; the early and late fusion variants are distinguished by whether compression is applied before or after the concatenation of the represented vectors. To validate the proposed framework, we conduct extensive experiments on four complex action datasets: UCF50, HMDB51, URADL, and Olympic. Experimental results demonstrate that PFR with early fusion achieves state-of-the-art performance by capturing the most prominent features at minimal dimensionality.
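The two-layer pipeline described in the abstract can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: `encode_subvolume` (mean pooling), the SVD-based `compress`, and the max-pooled second-layer encoding are all hypothetical stand-ins for the paper's actual encoders, chosen only to show the layer-1 → pool → compress → encode → fuse flow.

```python
import numpy as np

def encode_subvolume(frames):
    # Layer 1 (stand-in encoder): represent one temporal sub-volume by
    # mean-pooling its per-frame descriptors.
    return frames.mean(axis=0)

def compress(pool, k):
    # Compression stage (stand-in): project the pool onto its top-k
    # principal directions via SVD of the centered pool.
    centered = pool - pool.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return pool @ vt[:k].T

def double_layer_encode(video, subvol_len=8, k=4):
    # Layer 1: split the video into temporal sub-volumes and encode each.
    subvols = [video[i:i + subvol_len]
               for i in range(0, len(video), subvol_len)]
    layer1 = np.stack([encode_subvolume(s) for s in subvols])  # (n_sub, d)
    # Layer 2: compress the pooled layer-1 vectors, then encode them
    # (here: max-pooling) into the video-parts vector (VPV).
    compressed = compress(layer1, k)                           # (n_sub, k)
    vpv = compressed.max(axis=0)                               # (k,)
    # Early fusion: compression precedes concatenation, so the fused
    # pooled-feature representation (PFR) stays low-dimensional.
    pfr = np.concatenate([layer1.mean(axis=0), vpv])
    return vpv, pfr
```

Under early fusion, the compression step runs before concatenation, which is why the resulting PFR keeps the dimensionality low while still combining information from both layers.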

DOI

10.1016/j.neucom.2019.06.097

ISSN

0925-2312

E-ISSN

1872-8286
