Date of Award

10-4-2023

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Conditional Transformers;Information Retrieval;Personalized Query Reformulation

Supervisor

Hossein Fani

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

The foremost means of information retrieval, search engines, have difficulty searching knowledge repositories, e.g., the web, because they are not tailored to users' differing information needs. User queries are, more often than not, under-specified or contain ambiguous terms that also retrieve irrelevant documents. Query refinement is the process of transforming users' queries into refined versions, without semantic drift, to enhance the relevance of search results. Prior query refiners have been benchmarked on ad-hoc web retrieval datasets under the weak assumption that users' input queries improve gradually within a search session. Existing methods have also employed additional metadata, such as session history or users' click-throughs, to enrich the query context. However, one crucial contextual cue has been overlooked: the user context. Moreover, personalized query refinement remains largely unexplored in light of recent advancements in transformers and large language models in general. To overcome these problems, (i) we contribute RePair, an open-source configurable toolkit that generates large-scale gold-standard benchmark datasets from a variety of domains for the task of query refinement. RePair takes a dataset of queries and their relevance judgements (e.g., msmarco or aol), a sparse or dense information retrieval method (e.g., bm25, colbert), and an evaluation metric (e.g., map), and outputs refined versions of the queries, each with a guaranteed relevance improvement under the retrieval method in terms of the evaluation metric. RePair leverages the text-to-text transfer transformer (t5) to generate gold-standard datasets for any input query set and is designed with extensibility in mind. Out of the box, we have generated and publicly shared gold-standard datasets for aol and msmarco.passage, benchmarked these datasets with state-of-the-art supervised query suggestion models, and explored t5 as an alternative model for query suggestion. (ii) We propose leveraging t5 to incorporate user context by adding a user-tailored pretext to the input sequence as a prior condition to generate personalized reformulations of queries in the output sequence. Our experiments on the aol query log demonstrated the effectiveness of t5 in personalized query reformulation, without loss of generality to other conditional transformers. Our codebase is publicly available at https://github.com/fani-lab/RePair.
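
The selection step described in the abstract (keep only candidate queries that improve the evaluation metric under the chosen retriever) can be illustrated with a minimal sketch. This is not RePair's code or API: `toy_retrieve` is a hypothetical term-overlap stand-in for bm25 or colbert, `select_refinements` and `with_user_pretext` are illustrative names, the candidate list is written by hand rather than generated by t5, and the `user: ... query: ...` pretext format is an assumption, since the abstract does not specify it.

```python
from typing import Callable, Dict, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list of document ids."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def toy_retrieve(query: str, docs: Dict[str, str]) -> List[str]:
    """Hypothetical retriever: rank documents by number of shared terms."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.lower().split())), doc_id)
              for doc_id, text in docs.items()]
    # Highest overlap first; ties broken by doc id so the order is deterministic.
    return [d for s, d in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


def select_refinements(
    query: str,
    candidates: List[str],
    relevant: Set[str],
    retrieve: Callable[[str], List[str]],
    metric: Callable[[List[str], Set[str]], float],
) -> List[Tuple[str, float]]:
    """Keep candidates that score strictly higher than the original query."""
    base = metric(retrieve(query), relevant)
    scored = [(c, metric(retrieve(c), relevant)) for c in candidates]
    return sorted([(c, s) for c, s in scored if s > base], key=lambda p: -p[1])


def with_user_pretext(user_id: str, query: str) -> str:
    """Assumed input format: prepend a user-specific pretext to the query."""
    return f"user: {user_id} query: {query}"


docs = {
    "d1": "jaguar car dealership prices",
    "d2": "jaguar big cat habitat",
    "d3": "python snake habitat",
}
relevant = {"d2"}
candidates = ["jaguar car", "jaguar cat habitat"]  # hand-written, not model output

refined = select_refinements(
    "jaguar", candidates, relevant,
    retrieve=lambda q: toy_retrieve(q, docs),
    metric=average_precision,
)
print(refined)  # [('jaguar cat habitat', 1.0)]
print(with_user_pretext("u42", "jaguar"))
```

In this toy setup the original query "jaguar" reaches an average precision of 0.5, so "jaguar car" (also 0.5) is discarded and only "jaguar cat habitat" (1.0) survives as a refinement.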
