Date of Award

9-12-2024

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Backtranslation;Natural Language Processing;Query Refinement;Query Reformulation;RAG Fusion

Supervisor

Hossein Fani

Abstract

Query refinement aims to enhance the relevance of search results by modifying users' original queries into refined versions. State-of-the-art query refinement models have been trained on web query logs, which are predisposed to topic drifts. To fill the gap, few works have been proposed to generate benchmark datasets of (query → refined query) pairs through an overwhelming application of unsupervised or supervised modifications to the original query while controlling topic drifts. In this paper, however, we propose leveraging natural language backtranslation, a round-trip translation of a query from a source language via target languages, as a simple yet effective unsupervised approach to scale up the generation of gold-standard benchmark datasets. Backtranslation can (1) uncover terms that are omitted in a query for being commonly understood in a source language, but may not be known in a target language (e.g., figs → (Tamil) அத்திமரங்கள் → the fig trees), (2) augment a query with context-aware synonyms in a target language (e.g., italian nobel prize winners → (Farsi) برنده های ایتالیایی جایزه نوبل → italian nobel laureates), and (3) help with the semantic disambiguation of polysemous terms and collocations (e.g., custer's last stand → (Malay) pertahan terakhir custer → custer's last defence). Our experiments across 5 query sets with different query lengths and topics, and 10 languages from 7 language families, using 2 neural machine translators validated the effectiveness of query backtranslation in generating a more extensive gold-standard dataset for query refinement. We have open-sourced our research at https://github.com/fani-lab/RePair/tree/nqlb.
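
The round-trip idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Translator` signature, the function names `backtranslate` and `build_pairs`, the language codes, and the rule of discarding backtranslations identical to the original query are all assumptions made for illustration, and `toy_translate` is a lookup table standing in for a neural machine translator, seeded only with two examples quoted in the abstract.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical translator interface: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def backtranslate(query: str, translate: Translator, src: str, tgt: str) -> str:
    """Translate the query from src to tgt, then back from tgt to src."""
    forward = translate(query, src, tgt)
    return translate(forward, tgt, src)


def build_pairs(
    queries: List[str], translate: Translator, src: str, targets: List[str]
) -> List[Tuple[str, str, str]]:
    """Collect (query, target_lang, refined_query) candidates.

    Assumption: a backtranslation identical to the original query adds
    nothing, so it is skipped here.
    """
    pairs = []
    for query in queries:
        for tgt in targets:
            refined = backtranslate(query, translate, src, tgt)
            if refined.strip().lower() != query.strip().lower():
                pairs.append((query, tgt, refined))
    return pairs


# Toy stand-in for a machine translator, using examples from the abstract.
_TOY: Dict[Tuple[str, str, str], str] = {
    ("figs", "en", "ta"): "அத்திமரங்கள்",
    ("அத்திமரங்கள்", "ta", "en"): "the fig trees",
    ("custer's last stand", "en", "ms"): "pertahan terakhir custer",
    ("pertahan terakhir custer", "ms", "en"): "custer's last defence",
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Unknown inputs are returned unchanged, so their round trip is a no-op.
    return _TOY.get((text, src, tgt), text)


pairs = build_pairs(["figs", "custer's last stand"], toy_translate, "en", ["ta", "ms"])
for query, tgt, refined in pairs:
    print(f"{query} -> ({tgt}) -> {refined}")
```

In practice `toy_translate` would be replaced by a call to a real translation model, and the candidate pairs would still need to be evaluated for retrieval effectiveness before being treated as refinements.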
