Computer Science Publications

Modeling web documents as objects for automatic web content extraction: Object-oriented web data model

E. Annoni
C. I. Ezeife, University of Windsor

Document Type

Conference Paper

Publication Date

2009

Publication Title

ICEIS 2009 - 11th International Conference on Enterprise Information Systems

First Page

91

Last Page

100

Abstract

Traditionally, mining web page content involves modeling that content to discover the underlying knowledge. Existing data extraction proposals represent web data in a formal structure, such as a database structure specific to an application domain. These models fail to capture the full diversity of web data structures, which can combine different types of content and can also be unstructured. With these proposals, it is not possible to focus on a given type of content, to work on data of different structures, or to mine data from different application domains, all of which are required to mine a given content type or web documents from different domains efficiently. In addition, since web pages are designed to be understood by users, this paper considers modeling the presentation of web documents, as expressed through HTML tag attributes, to be useful for efficient web content mining. Hence, this paper provides a general framework composed of an object-oriented web data model based on HTML tags, together with algorithms for extracting web content objects and web presentation objects from any given web document. From the HTML code of a web document, web objects are extracted for mining, regardless of the domain.
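The abstract does not give the paper's actual data model or algorithms, so the following is only a minimal illustrative sketch of the general idea: walking the HTML tags of a document and collecting content objects and presentation objects separately, without any domain-specific schema. The class names (`ContentObject`, `PresentationObject`, `WebObjectExtractor`), the function `extract_web_objects`, the three content kinds, and the set of attributes treated as presentational are all assumptions made for this example; they are not taken from the paper. It uses only Python's standard `html.parser` module.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class ContentObject:
    kind: str   # "text", "image", or "link" (illustrative categories only)
    value: str  # the text itself, an image src, or a link href
    tag: str    # HTML tag the object was found in


@dataclass
class PresentationObject:
    tag: str          # HTML tag carrying the presentation attributes
    attributes: dict  # presentation-related attributes found on that tag


class WebObjectExtractor(HTMLParser):
    # Assumption: which attributes count as "presentation" is a choice made here.
    PRESENTATION_ATTRS = {"style", "class", "align", "bgcolor", "color", "width", "height"}
    # Void elements have no end tag, so they are never pushed on the stack.
    VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}
    # Raw-text elements whose data is code or CSS, not page content.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.content = []
        self.presentation = []
        self._stack = []  # currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag not in self.VOID_TAGS:
            self._stack.append(tag)
        if tag == "img" and attr_map.get("src"):
            self.content.append(ContentObject("image", attr_map["src"], tag))
        elif tag == "a" and attr_map.get("href"):
            self.content.append(ContentObject("link", attr_map["href"], tag))
        styling = {k: v for k, v in attr_map.items() if k in self.PRESENTATION_ATTRS}
        if styling:
            self.presentation.append(PresentationObject(tag, styling))

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; ignore stray end tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        current = self._stack[-1] if self._stack else ""
        if text and current not in self.SKIPPED_TAGS:
            self.content.append(ContentObject("text", text, current))


def extract_web_objects(html):
    """Return (content_objects, presentation_objects) for an HTML string."""
    parser = WebObjectExtractor()
    parser.feed(html)
    parser.close()
    return parser.content, parser.presentation


if __name__ == "__main__":
    page = '<body bgcolor="white"><h1 class="title">News</h1><img src="a.png" width="10"></body>'
    content, presentation = extract_web_objects(page)
    for obj in content + presentation:
        print(obj)
```

Keeping content and presentation in separate lists mirrors the distinction the abstract draws between web content objects and web presentation objects; a fuller model would presumably also record the nesting between objects, which this sketch omits.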

Recommended Citation

Annoni, E. and Ezeife, C. I. (2009). Modeling web documents as objects for automatic web content extraction: Object-oriented web data model. ICEIS 2009 - 11th International Conference on Enterprise Information Systems, 91-100.
https://scholar.uwindsor.ca/computersciencepub/32

Scholarship at UWindsor
