Process Model for Content Extraction from Weblogs

Andreas Schieber and Andreas Hilbert
Additional contact information
Andreas Schieber: University of Technology Dresden, Dresden, Germany
Andreas Hilbert: University of Technology Dresden, Dresden, Germany

International Journal of Intelligent Information Technologies (IJIIT), 2014, vol. 10, issue 2, 20-36

Abstract: This paper develops and evaluates a BPMN-based process model that identifies and extracts blog content from the web and stores its textual data in a data warehouse for further analysis. Depending on the characteristics of the technologies used to create the weblogs, the process has to perform specific tasks in order to extract blog content correctly. The paper describes three phases, extraction, transformation and loading of data into a repository, specifically adapted for blog content extraction. It highlights the objectives that must be achieved in these phases to ensure correct extraction. The authors integrate the described process into a previously developed framework for blog mining. The authors' process model closes the conceptual gap in this framework as well as the gap in current research on blog mining process models. Furthermore, it can easily be adapted for other web extraction purposes.
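
The three phases named in the abstract (extraction, transformation, loading) can be illustrated with a minimal sketch. This is not the authors' BPMN model or implementation. It assumes, purely for illustration, that a post's content sits inside an `<article>` element and that a single SQLite table stands in for the data warehouse. The names `PostTextParser`, `extract`, `transform`, `load`, `run_pipeline` and the `posts` table are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class PostTextParser(HTMLParser):
    """Collects text found inside <article> elements.

    Assumption: the blog engine wraps post content in <article>;
    real weblogs differ, which is why the paper adapts tasks per technology.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside <article>
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract(html):
    """Extraction phase: pull the raw post text out of the page markup."""
    parser = PostTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def transform(raw_text):
    """Transformation phase: collapse runs of whitespace into single spaces."""
    return " ".join(raw_text.split())


def load(conn, url, text):
    """Loading phase: store the cleaned text, keyed by the post URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (url TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO posts (url, body) VALUES (?, ?)", (url, text)
    )
    conn.commit()


def run_pipeline(conn, url, html):
    """Runs the three phases in order and returns the stored text."""
    text = transform(extract(html))
    load(conn, url, text)
    return text
```

Navigation and footer text outside the assumed `<article>` container is ignored, which mirrors the abstract's point that extraction has to be tailored to how the weblog was built.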

Date: 2014

Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/ijiit.2014040102 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:igg:jiit00:v:10:y:2014:i:2:p:20-36

International Journal of Intelligent Information Technologies (IJIIT) is currently edited by Vijayan Sugumaran

More articles in International Journal of Intelligent Information Technologies (IJIIT) from IGI Global

 
Page updated 2025-03-19
Handle: RePEc:igg:jiit00:v:10:y:2014:i:2:p:20-36