EconPapers    
Economics at your fingertips  
 

Towards Comparative Mining of Web Document Objects with NFA: WebOMiner System

C. I. Ezeife and Titas Mutsuddy
Additional contact information
C. I. Ezeife: School of Computer Science, University of Windsor, Windsor, ON, Canada
Titas Mutsuddy: School of Computer Science, University of Windsor, Windsor, ON, Canada

International Journal of Data Warehousing and Mining (IJDWM), 2012, vol. 8, issue 4, 1-21

Abstract: The process of extracting comparative heterogeneous web content data which are derived and historical from related web pages is still at its infancy and not developed. Discovering potentially useful and previously unknown information or knowledge from web contents such as “list all articles on ’Sequential Pattern Mining’ written between 2007 and 2011 including title, authors, volume, abstract, paper, citation, year of publication,” would require finding the schema of web documents from different web pages, performing web content data integration, building their virtual or physical data warehouse before web content extraction and mining from the database. This paper proposes a technique for automatic web content data extraction, the WebOMiner system, which models web sites of a specific domain like Business to Customer (B2C) web sites, as object oriented database schemas. Then, non-deterministic finite state automata (NFA) based wrappers for recognizing content types from this domain are built and used for extraction of related contents from data blocks into an integrated database for future second level mining for deep knowledge discovery.

Date: 2012
References: Add references at CitEc
Citations:

Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 4018/jdwm.2012100101 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:igg:jdwm00:v:8:y:2012:i:4:p:1-21

Access Statistics for this article

International Journal of Data Warehousing and Mining (IJDWM) is currently edited by Eric Pardede

More articles in International Journal of Data Warehousing and Mining (IJDWM) from IGI Global
Bibliographic data for series maintained by Journal Editor ().

 
Page updated 2025-03-19
Handle: RePEc:igg:jdwm00:v:8:y:2012:i:4:p:1-21