This article describes a complex CRIS (current research information system) implementation project that involved migrating around 120,000 legacy publication records from three different systems. The project, undertaken by Tampere University, faced challenges in data diversity, data quality, and resource allocation. To handle the large and heterogeneous dataset, the team used innovative approaches, such as machine learning techniques and various data wrangling tools, to process the data, correct errors, and merge information from different sources. Despite significant delays and unforeseen obstacles, the project ultimately achieved its goals. It also served as a valuable learning experience, highlighting the importance of data quality and standardized practices, and the need for dedicated resources in complex data migration projects in research organizations. This study stands out for its comprehensive documentation of the data wrangling and migration process, which has been less explored in the CRIS literature.
Computer Sciences | Library and Information Science
current research information system (CRIS), research information, data migration, legacy data, data quality, machine learning, data wrangling, natural language processing (NLP)
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Lappalainen, Yrjö; Lassila, Matti; Heikkilä, Tanja; Nieminen, Jani; and Lehtilä, Tapani, "Migrating 120,000 Legacy Publications from Several Systems into a Current Research Information System Using Advanced Data Wrangling Techniques" (2023). All Works. 6230.
Indexed in Scopus
Open Access Type
Gold: This publication is openly available in an open access journal/series