Date of Award

2016

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

Supervisor

Chen, Jessica

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

Deep web crawling refers to the process of collecting documents that have been organized into a data source and can only be retrieved via a search interface. This is often achieved by sending different queries to the search interface. Because selecting a suitable set of queries is difficult, the crawling process can be implemented with stepwise refinement: documents are retrieved step by step, and in each step the query selection is adapted to the knowledge accumulated from the documents downloaded in the previous steps. However, downloading the documents and learning from the resulting sample in order to improve the query selection takes considerable time and effort. Here we propose a cost-effective, data-driven method for setting the steps in adaptive crawling of the deep web. Through empirical study, we explore criteria for setting the lengths of the steps that best balance the trade-off between the cost of updating the sample and the improved quality of the selected queries. Derived from four existing data sets typically used for deep web crawling, these criteria provide practical guidelines for cost-effective stepwise refinement in iterative document retrieval.
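
The stepwise refinement described in the abstract can be illustrated with a minimal sketch. It is not the method of the thesis: the toy in-memory corpus, the `search` and `stepwise_crawl` functions, and the rule of picking the unused terms with the highest document frequency in the current sample are all assumptions made for illustration. The sketch only shows how the step length controls how often the sample is re-analysed.

```python
from collections import Counter


def search(corpus, term):
    """Toy search interface (hypothetical): ids of documents containing the term."""
    return {doc_id for doc_id, text in corpus.items() if term in text.split()}


def stepwise_crawl(corpus, seed_term, step_len, max_queries):
    """Issue queries in steps of `step_len`, refreshing term statistics between steps."""
    downloaded = set()   # ids of documents retrieved so far (the sample)
    issued = set()       # terms already sent as queries
    queue = [seed_term]  # queries selected for the current step
    updates = 0          # how many times the sample was re-analysed

    while queue and len(issued) < max_queries:
        # Issue the queries chosen for this step without re-learning in between.
        for term in queue:
            if len(issued) >= max_queries:
                break
            issued.add(term)
            downloaded |= search(corpus, term)

        # Step boundary: recompute document frequencies on the sample so far.
        updates += 1
        df = Counter()
        for doc_id in downloaded:
            df.update(set(corpus[doc_id].split()))

        # Assumed selection rule: unused terms, most frequent in the sample first,
        # ties broken alphabetically so that runs are deterministic.
        candidates = sorted((t for t in df if t not in issued),
                            key=lambda t: (-df[t], t))
        queue = candidates[:step_len]

    return downloaded, len(issued), updates


if __name__ == "__main__":
    corpus = {
        1: "deep web crawling",
        2: "web search interface",
        3: "query selection interface",
        4: "sample query cost",
        5: "cost quality trade",
    }
    for step_len in (1, 3):
        docs, n_queries, n_updates = stepwise_crawl(corpus, "deep", step_len, 10)
        print(step_len, len(docs), n_queries, n_updates)
```

In this sketch a longer step means fewer sample updates for the same query budget, at the price of selecting queries from older statistics. That is the trade-off the abstract refers to; the criteria for choosing the step length are the subject of the thesis and are not reproduced here.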

Recommended Citation

Sun, Xu, "Practical Guides for Data Retrieval in Deep Web Crawling" (2016). Electronic Theses and Dissertations. 5869.
https://scholar.uwindsor.ca/etd/5869
