Date of Award
2011
Publication Type
Master Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
Computer Science.
Supervisor
Lu, Jianguo (School of Computer Science)
Rights
info:eu-repo/semantics/openAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
Deep web data is generally stored in databases that are accessible only through web query forms or web service interfaces. One challenge of deep web crawling is selecting meaningful queries to acquire the data. There is substantial research on query selection, such as approaches based on the set covering problem, in which a greedy algorithm or one of its variations is used. These methods have not been studied extensively in the context of real web services, which may pose new challenges for deep web crawling. This thesis studies several query selection methods on Microsoft’s Bing web service, in particular the impact of the ranking of returned results in real data sources. Our results show that for unranked data sources, the weighted method performs slightly better than the unweighted set covering algorithm. For ranked data sources, document frequency estimation is necessary to harvest data more efficiently.
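As a rough illustration of the set covering view of query selection mentioned in the abstract, the sketch below implements the textbook greedy heuristic in Python. It is not taken from the thesis: the function name, the toy term-to-document mapping, and the optional per-query cost (assumed here to be the number of documents a query returns) are all hypothetical, and the thesis's own weighted method may be defined differently.

```python
def greedy_query_selection(query_matches, costs=None):
    """Pick queries until every known document is covered.

    query_matches: dict mapping a query term to the set of document ids it returns.
    costs: optional dict mapping a query term to a positive cost (assumption:
           a stand-in for download cost); when omitted, every query costs 1.
    """
    universe = set()
    for docs in query_matches.values():
        universe |= docs

    covered = set()
    selected = []
    while covered != universe:
        best_query, best_score = None, 0.0
        # Iterate in sorted order so ties are broken deterministically.
        for query in sorted(query_matches):
            new_docs = len(query_matches[query] - covered)
            cost = costs[query] if costs else 1.0
            score = new_docs / cost  # new documents gained per unit cost
            if score > best_score:
                best_query, best_score = query, score
        if best_query is None:  # no query adds anything new
            break
        selected.append(best_query)
        covered |= query_matches[best_query]
    return selected


# Toy data (hypothetical): which documents each query term retrieves.
matches = {
    "web": {1, 2, 3},
    "data": {3, 4},
    "crawl": {4, 5},
    "deep": {1, 5},
}

unweighted = greedy_query_selection(matches)
weighted = greedy_query_selection(
    matches, costs={q: len(d) for q, d in matches.items()}
)
```

With unit costs the heuristic simply takes the query that adds the most unseen documents at each step; supplying costs changes the ranking to new documents per unit cost. Neither variant models result ranking or return limits, which the abstract identifies as the complication in real data sources.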
Recommended Citation
Fu, Chong, "Downloading Deep Web Data from Real Web Services" (2011). Electronic Theses and Dissertations. 322.
https://scholar.uwindsor.ca/etd/322