Computer Science Publications

Towards Comparative Mining of Web Document Objects with NFA: WebOMiner System

C. I. Ezeife, University of Windsor
T. Mutsuddy

Document Type

Article

Publication Date

2012

Publication Title

International Journal of Data Warehousing and Mining (IJDWM)

Volume

8

Issue

4

First Page

1

Last Page

21

Abstract

Extracting comparative, heterogeneous web content data, both derived and historical, from related web pages is still in its infancy and remains undeveloped. Discovering potentially useful and previously unknown information or knowledge from web contents, for example answering a query such as “list all articles on ‘Sequential Pattern Mining’ written between 2007 and 2011, including title, authors, volume, abstract, paper, citation, and year of publication”, would require finding the schemas of web documents from different web pages, integrating the web content data, and building a virtual or physical data warehouse before the web content can be extracted and mined from the database. This paper proposes the WebOMiner system, a technique for automatic web content data extraction that models web sites of a specific domain, such as Business to Customer (B2C) web sites, as object-oriented database schemas. Non-deterministic finite state automata (NFA) based wrappers for recognizing content types from this domain are then built and used to extract related contents from data blocks into an integrated database for future second-level mining and deep knowledge discovery.
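The abstract does not describe the internals of the NFA-based wrappers, so the following is only a minimal Python sketch, under stated assumptions, of how an NFA can recognize the content type of a data block from its sequence of token types. The token names (`img`, `title`, `brand`, `price`, `link`), the product and list patterns, and the helpers `NFA`, `build_product_nfa`, `build_list_nfa` and `classify` are hypothetical illustrations, not the WebOMiner implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NFA:
    """A nondeterministic finite automaton over content-type tokens (illustrative)."""
    start: int
    accepting: frozenset[int]
    # (state, symbol) -> set of next states; symbol None marks an epsilon move
    delta: dict[tuple[int, str | None], set[int]] = field(default_factory=dict)

    def add(self, src: int, symbol: str | None, dst: int) -> None:
        self.delta.setdefault((src, symbol), set()).add(dst)

    def _closure(self, states: set[int]) -> set[int]:
        """Return every state reachable from `states` using only epsilon moves."""
        stack = list(states)
        seen = set(states)
        while stack:
            s = stack.pop()
            for nxt in self.delta.get((s, None), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def accepts(self, tokens: list[str]) -> bool:
        """Simulate the NFA by tracking the set of all active states."""
        current = self._closure({self.start})
        for tok in tokens:
            step: set[int] = set()
            for s in current:
                step |= self.delta.get((s, tok), set())
            current = self._closure(step)
            if not current:  # no live states left: reject early
                return False
        return bool(current & self.accepting)


def build_product_nfa() -> NFA:
    """Hypothetical product pattern: img title [brand] price [price]."""
    nfa = NFA(start=0, accepting=frozenset({4, 5}))
    nfa.add(0, "img", 1)
    nfa.add(1, "title", 2)
    nfa.add(2, "brand", 3)
    nfa.add(2, None, 3)     # epsilon move: the brand is optional
    nfa.add(3, "price", 4)
    nfa.add(4, "price", 5)  # optional second price (e.g. a sale price)
    return nfa


def build_list_nfa() -> NFA:
    """Hypothetical list pattern: one or more links."""
    nfa = NFA(start=0, accepting=frozenset({1}))
    nfa.add(0, "link", 1)
    nfa.add(1, "link", 1)
    return nfa


# Recognizers are tried in insertion order; the first match wins.
RECOGNIZERS = {"product": build_product_nfa(), "list": build_list_nfa()}


def classify(block_tokens: list[str]) -> str:
    """Label a data block by the first content-type NFA that accepts it."""
    for label, nfa in RECOGNIZERS.items():
        if nfa.accepts(block_tokens):
            return label
    return "noise"
```

Simulating the automaton directly on sets of active states avoids building an equivalent DFA, and optional fields such as the brand become epsilon moves. For example, `classify(["img", "title", "price"])` returns `"product"`, `classify(["link", "link"])` returns `"list"`, and `classify(["title", "price"])` falls through to `"noise"`.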

DOI

10.4018/jdwm.2012100101

Recommended Citation

Ezeife, C. I. and Mutsuddy, T. (2012). Towards Comparative Mining of Web Document Objects with NFA: WebOMiner System. International Journal of Data Warehousing and Mining (IJDWM), 8 (4), 1-21.
https://scholar.uwindsor.ca/computersciencepub/44

Scholarship at UWindsor
