Date of Award

2011

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Computer Science.

Supervisor

Lu, Jianguo (School of Computer Science)

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Abstract

Deep web data is generally stored in databases that are accessible only through web query forms or web service interfaces. One challenge of deep web crawling is how to select meaningful queries to acquire the data. There is substantial research on query selection, such as approaches that model it as a set covering problem and solve it with a greedy algorithm or one of its variations. These methods have not been studied extensively in the context of real web services, which may pose new challenges for deep web crawling. This thesis studies several query selection methods on Microsoft's Bing web service, with particular attention to the impact of result ranking in real data sources. Our results show that for unranked data sources, the weighted method performed slightly better than the unweighted set covering algorithm. For ranked data sources, document frequency estimation is necessary to harvest data more efficiently.
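The set covering view of query selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `greedy_query_selection`, the `query_results` mapping, and the per-query `costs` are all assumptions made for illustration, and the abstract does not say how the weighted method assigns its weights. The sketch repeatedly picks the query that covers the most not-yet-retrieved documents per unit cost; with no costs supplied it reduces to the plain unweighted greedy heuristic.

```python
from typing import Dict, List, Optional, Set


def greedy_query_selection(
    query_results: Dict[str, Set[int]],
    costs: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Greedy set-cover heuristic for choosing queries (illustrative only).

    query_results: hypothetical map from a candidate query to the set of
        document ids that query returns.
    costs: hypothetical positive cost per query; if None, every query
        costs 1, giving the unweighted greedy algorithm.
    """
    # Every document reachable by at least one candidate query.
    uncovered: Set[int] = set().union(*query_results.values())
    selected: List[str] = []

    def gain_per_cost(query: str) -> float:
        cost = 1.0 if costs is None else costs[query]
        return len(query_results[query] & uncovered) / cost

    while uncovered:
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(query_results), key=gain_per_cost)
        selected.append(best)
        uncovered -= query_results[best]
    return selected


if __name__ == "__main__":
    toy = {
        "apple": {1, 2, 3},
        "banana": {3, 4},
        "cherry": {4, 5, 6},
        "date": {1, 6},
    }
    print(greedy_query_selection(toy))
    print(greedy_query_selection(
        toy, {"apple": 3.0, "banana": 1.0, "cherry": 3.0, "date": 1.0}
    ))
```

On the toy data, the unweighted run needs only two queries, while the weighted run prefers the cheap queries first and ends up issuing all four. Which behaviour is preferable depends on what the cost is taken to represent, a choice the sketch leaves open.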

Recommended Citation

Fu, Chong, "Downloading Deep Web Data from Real Web Services" (2011). Electronic Theses and Dissertations. 322.
https://scholar.uwindsor.ca/etd/322
