Document Type
Article
Source of Publication
Publications
Publication Date
11-14-2023
Abstract
This article describes a complex CRIS (current research information system) implementation project involving the migration of around 120,000 legacy publication records from three different systems. The project, undertaken by Tampere University, encountered several challenges in data diversity, data quality, and resource allocation. To handle the extensive and heterogeneous dataset, innovative approaches such as machine learning techniques and various data wrangling tools were used to process data, correct errors, and merge information from different sources. Despite significant delays and unforeseen obstacles, the project was ultimately successful in achieving its goals. The project served as a valuable learning experience, highlighting the importance of data quality and standardized practices, and the need for dedicated resources in handling complex data migration projects in research organizations. This study stands out for its comprehensive documentation of the data wrangling and migration process, which has been less explored in the context of CRIS literature.
DOI Link
ISSN
Publisher
MDPI AG
Volume
11
Issue
4
First Page
49
Last Page
49
Disciplines
Computer Sciences | Library and Information Science
Keywords
current research information system (CRIS), research information, data migration, legacy data, data quality, machine learning, data wrangling, natural language processing (NLP)
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Lappalainen, Yrjö; Lassila, Matti; Heikkilä, Tanja; Nieminen, Jani; and Lehtilä, Tapani, "Migrating 120,000 Legacy Publications from Several Systems into a Current Research Information System Using Advanced Data Wrangling Techniques" (2023). All Works. 6230.
https://zuscholars.zu.ac.ae/works/6230
Indexed in Scopus
no
Open Access
yes
Open Access Type
Gold: This publication is openly available in an open access journal/series