EconPapers    
Economics at your fingertips  
 

The ARCOMEM Architecture for Social- and Semantic-Driven Web Archiving

Thomas Risse, Elena Demidova, Stefan Dietze, Wim Peters, Nikolaos Papailiou, Katerina Doka, Yannis Stavrakas, Vassilis Plachouras, Pierre Senellart, Florent Carpentier, Amin Mantrach, Bogdan Cautis, Patrick Siehndel and Dimitris Spiliotopoulos
Additional contact information
Thomas Risse: L3S Research Center, Leibniz Universität Hannover, Hannover 30167, Germany
Elena Demidova: L3S Research Center, Leibniz Universität Hannover, Hannover 30167, Germany
Stefan Dietze: L3S Research Center, Leibniz Universität Hannover, Hannover 30167, Germany
Wim Peters: NLP Group, Department of Computer Science, University of Sheffield, S1 4DP Sheffield, UK
Nikolaos Papailiou: ATHENA - Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Maroussi, Athens, Greece
Katerina Doka: ATHENA - Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Maroussi, Athens, Greece
Yannis Stavrakas: ATHENA - Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Maroussi, Athens, Greece
Vassilis Plachouras: ATHENA - Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Maroussi, Athens, Greece
Pierre Senellart: CNRS LTCI, Institut Mines-Télécom, Télécom ParisTech, 75634 Paris Cedex 13, France
Florent Carpentier: Internet Memory Foundation, 45 ter rue de la Révolution, 93100 Montreuil, France
Amin Mantrach: Yahoo Research, 08018 Barcelona, Spain
Bogdan Cautis: CNRS LTCI, Institut Mines-Télécom, Télécom ParisTech, 75634 Paris Cedex 13, France
Patrick Siehndel: L3S Research Center, Leibniz Universität Hannover, Hannover 30167, Germany
Dimitris Spiliotopoulos: Athens Technology Center (ATC), 15233 Halandri Athens, Greece

Future Internet, 2014, vol. 6, issue 4, 688-716

Abstract: The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into “community memories” that aim at building a better understanding of the public view on, e.g., celebrities, court decisions and other events. Due to the size of the Web, the traditional “collect-all” strategy is in many cases not the best method to build Web archives. In this paper, we present the ARCOMEM (From Collect-All Archives to Community Memories) architecture and implementation, which uses semantic information, such as entities, topics and events, complemented with information from the Social Web to guide a novel Web crawler. The resulting archives are automatically enriched with semantic meta-information to ease access and allow retrieval based on conditions that involve high-level concepts.

Keywords: web archiving; web crawler; architecture; text analysis; social Web
JEL-codes: O3
Date: 2014

Downloads: (external link)
https://www.mdpi.com/1999-5903/6/4/688/pdf (application/pdf)
https://www.mdpi.com/1999-5903/6/4/688/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:6:y:2014:i:4:p:688-716:d:41976

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jftint:v:6:y:2014:i:4:p:688-716:d:41976