Date of Award


Publication Type

Master Thesis

Degree Name



Computer Science

First Advisor

Wu, Dan


Image Retrieval, Information Retrieval, IR, TBIR, Text-based Image Retrieval




With advances in computer technology, the number of digital images being generated has exploded, which makes it important to retrieve images accurately and efficiently. Text-based Image Retrieval (TBIR) methods have been developed over the past decades, and they are popular and practical in a wide range of applications. Because the TBIR process is similar to Information Retrieval (IR), various IR techniques have been adopted to improve the performance of TBIR methods. In this thesis, we focus on three IR techniques: Term Frequency-Inverse Document Frequency (TF-IDF), the Vector Space Model (VSM), and the Cosine Coefficient Similarity (CCS) measure. These techniques have been used in TBIR methods both together and separately, and they can effectively improve TBIR performance. However, to the best of our knowledge, the TBIR methods that use all three techniques together are hybrid approaches, and their authors evaluated only the performance of the hybrid Content-based Image Retrieval (CBIR) and TBIR methods. We therefore investigate the effectiveness of applying these three IR techniques to TBIR by comparing the retrieval results of an experimental TBIR system in two modes. Mode 1 is implemented with all three techniques, and Mode 2 is implemented with TF-IDF only. The experimental results show that the system implemented with the three IR techniques performs relatively well: in most cases, its average precision on the IAPR TC-12 image database is above 80%. We also investigate how repeated index terms affect the performance of TBIR methods by comparing the rankings of the top 5 images retrieved by each of the two modes.
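For reference, one common textbook formulation of the three techniques is given below. Each image's annotation is treated as a document $d$ in a collection of $N$ annotations. $\mathrm{tf}_{t,d}$ is the raw count of index term $t$ in $d$, and $\mathrm{df}_t$ is the number of annotations containing $t$. The exact TF-IDF variant used in the thesis is not stated here, so this particular weighting (raw term frequency, natural-log IDF) is an assumption.

```latex
% TF-IDF weight of term t in annotation d (one common variant; assumed)
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t}

% VSM: annotation d and query q become weight vectors over the index terms.
% CCS: rank annotations by the cosine of the angle between the two vectors.
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\sum_{t} w_{t,q}\, w_{t,d}}
         {\sqrt{\sum_{t} w_{t,q}^{2}}\;\sqrt{\sum_{t} w_{t,d}^{2}}}
```

The denominator normalizes by vector length. As a result, an annotation is not rewarded simply for being long or for repeating a term many times.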
According to the experimental results, adding VSM and the CCS measure to the experimental TBIR system implemented with TF-IDF alone can, in most cases, improve its ranking accuracy when image annotations contain repeated index terms that match the query.
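The following minimal Python sketch makes the two modes concrete. It runs on hypothetical toy annotations, not IAPR TC-12 data. Mode 1 follows the description above: TF-IDF weights are placed in a vector space and annotations are ranked by cosine similarity. How Mode 2 turns TF-IDF weights into a ranking is not specified here. The sketch therefore assumes that Mode 2 sums the TF-IDF weights of the query terms found in each annotation. The names `ANNOTATIONS`, `rank_mode1`, and `rank_mode2` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy annotations (NOT from IAPR TC-12); each is a list of index terms.
ANNOTATIONS = {
    "img_a": ["church", "church", "church", "road", "car", "tree", "sky"],
    "img_b": ["church", "sky"],
    "img_c": ["beach", "sea", "sky"],
}


def idf(term, docs):
    # Inverse document frequency, log(N / df); 0.0 if no annotation contains the term.
    df = sum(1 for terms in docs.values() if term in terms)
    return math.log(len(docs) / df) if df else 0.0


def tfidf_vector(terms, docs):
    # Raw term frequency times IDF, one weight per distinct term (assumed variant).
    counts = Counter(terms)
    return {t: tf * idf(t, docs) for t, tf in counts.items()}


def cosine(u, v):
    # Cosine coefficient between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_mode1(query, docs):
    # Mode 1 sketch: TF-IDF + VSM + cosine similarity; ties broken by image id.
    q = tfidf_vector(query, docs)
    scores = {d: cosine(q, tfidf_vector(terms, docs)) for d, terms in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def rank_mode2(query, docs):
    # Mode 2 sketch (ASSUMPTION): sum of the annotation's TF-IDF weights
    # for the distinct query terms, with no length normalization.
    scores = {}
    for d, terms in docs.items():
        vec = tfidf_vector(terms, docs)
        scores[d] = sum(vec.get(t, 0.0) for t in set(query))
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    print(rank_mode1(["church"], ANNOTATIONS))  # ['img_b', 'img_a', 'img_c']
    print(rank_mode2(["church"], ANNOTATIONS))  # ['img_a', 'img_b', 'img_c']
```

On this toy data, Mode 2 ranks `img_a` first because its annotation repeats `church` three times. Mode 1 normalizes by vector length and ranks the two-term annotation `img_b` first. The sketch only shows that the two scorings can order annotations differently when a query term repeats. It does not reproduce the thesis's experiments or precision figures.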