Date of Award

2009

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Computer Science.

Supervisor

Lu, Jianguo (School of Computer Science)

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Abstract

The deep web is a part of the web that can only be accessed via query interfaces. Discovering the size of a deep web data source has been an important and challenging problem ever since the web emerged. The size plays an important role in crawling and extracting a deep web data source. The thesis proposes a new estimation method based on coverage to estimate the size. This method relies on the construction of a query pool that can cover most of the data source. We propose two approaches to constructing a query pool so that document frequency variance is small and most of the documents can be covered. Our experiments on four data collections show that using a query pool built from a sample of the collection will result in lower bias and variance. We compared the new method with three existing methods based on the corpora collected by us.

Recommended Citation

Liang, Jie, "Discovering the size of a deep web data source by coverage" (2009). Electronic Theses and Dissertations. 326.
https://scholar.uwindsor.ca/etd/326

Download

COinS

Scholarship at UWindsor

Electronic Theses and Dissertations

Discovering the size of a deep web data source by coverage

Date of Award

Publication Type

Degree Name

Department

Keywords

Supervisor

Rights

Creative Commons License

Abstract

Recommended Citation

Search

Browse

Author Corner

Scholarship at UWindsor

Electronic Theses and Dissertations

Discovering the size of a deep web data source by coverage

Author

Date of Award

Publication Type

Degree Name

Department

Keywords

Supervisor

Rights

Creative Commons License

Abstract

Recommended Citation

Share

Search

Browse

Author Corner