Migrating 120,000 Legacy Publications from Several Systems into a Current Research Information System Using Advanced Data Wrangling Techniques
Yrjö Lappalainen,
Matti Lassila,
Tanja Heikkilä,
Jani Nieminen and
Tapani Lehtilä
Additional contact information
Yrjö Lappalainen: Library and Learning Commons, Zayed University, Dubai P.O. Box 19282, United Arab Emirates
Matti Lassila: Tampere University Library, Tampere University, 33014 Tampere, Finland
Tanja Heikkilä: Finnish Geospatial Research Institute (FGI), National Land Survey of Finland (NLS), 02150 Espoo, Finland
Jani Nieminen: Tampere University Library, Tampere University, 33014 Tampere, Finland
Tapani Lehtilä: Tampere University Library, Tampere University, 33014 Tampere, Finland
Publications, 2023, vol. 11, issue 4, 1-16
Abstract:
This article describes a complex CRIS (current research information system) implementation project involving the migration of around 120,000 legacy publication records from three different systems. The project, undertaken by Tampere University, encountered several challenges in data diversity, data quality, and resource allocation. To handle the extensive and heterogeneous dataset, innovative approaches such as machine learning techniques and various data wrangling tools were used to process data, correct errors, and merge information from different sources. Despite significant delays and unforeseen obstacles, the project was ultimately successful in achieving its goals. The project served as a valuable learning experience, highlighting the importance of data quality and standardized practices, and the need for dedicated resources in handling complex data migration projects in research organizations. This study stands out for its comprehensive documentation of the data wrangling and migration process, which has been less explored in the context of CRIS literature.
Keywords: current research information system (CRIS); research information; data migration; legacy data; data quality; machine learning; data wrangling; natural language processing (NLP)
JEL-codes: A2 D83 L82
Date: 2023
Downloads:
https://www.mdpi.com/2304-6775/11/4/49/pdf (application/pdf)
https://www.mdpi.com/2304-6775/11/4/49/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jpubli:v:11:y:2023:i:4:p:49-:d:1279756
Publications is currently edited by Ms. Jennifer Zhang
Bibliographic data for series maintained by MDPI Indexing Manager.