User-Friendly and Extensible Web Data Extraction
T. Novella and I. Holubová
Additional contact information
T. Novella: Charles University
I. Holubová: Charles University
A chapter in Advances in Information Systems Development, 2018, pp 225-241 from Springer
Abstract:
Creation of web wrappers is a subject of study in the field of web data extraction. Designing a domain-specific language for a web wrapper is a challenging task because it introduces tradeoffs between the expressiveness of the wrapper’s language and its safety. In addition, little attention has been paid to the execution of a wrapper in a restricted environment. In this paper we present a new wrapping language, Serrano, that has three goals: (1) the ability to run in a restricted environment, such as a browser extension, (2) extensibility, to balance the tradeoffs between the expressiveness of the command set and safety, and (3) processing capabilities that eliminate the need for additional programs to clean the extracted data. Serrano has been successfully deployed in a number of projects and provided competitive results.
Keywords: Web data extraction; Safe execution; Restricted environment; Web browser extension
Date: 2018
Persistent link: https://EconPapers.repec.org/RePEc:spr:lnichp:978-3-319-74817-7_14
Ordering information: This item can be ordered from
http://www.springer.com/9783319748177
DOI: 10.1007/978-3-319-74817-7_14
More chapters in Lecture Notes in Information Systems and Organization from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.