Closing the Loop on Speech to Music Translation: Automatically Generating Synthetic Percussive Sequences on the Mridangam from Konnakol

Document Type

Conference Proceeding

Source of Publication

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW 2025), Workshop Proceedings

Publication Date

5-27-2025

Abstract

This paper presents a pipeline that converts spoken Konnakol sequences (Konnakol is a South Indian vocal percussion language) into synthetic rhythmic sequences performed on the mridangam. We fine-tune the Whisper speech-to-text model on Konnakol data, enabling accurate transcription of spoken sequences despite the small size of our dataset (approximately 15 minutes). The transcriptions are rhythmically encoded in a format compatible with the Konnakol Typewriter, a web application that converts these sequences into mridangam audio. The transcriptions also serve as input to a Markov model, which generates new rhythmic sequences that can likewise be rendered as mridangam audio through the Konnakol Typewriter. Whisper achieves very low error rates on this task, making it well suited to it. Beyond transcribing Konnakol, the pipeline opens possibilities for educational tools, cultural heritage preservation, and data generation for rhythm-based applications. Future work will focus on refining the process to improve accuracy and versatility.
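The abstract does not specify the order, state definition, or token representation of the Markov model, so the following is only a minimal sketch of the general idea, assuming a first-order chain over individual Konnakol syllables. The syllable lists in TRANSCRIPTIONS, the START/END markers, and the function names are hypothetical illustrations, not the paper's data or code, and converting the generated syllables into the Konnakol Typewriter's input format is not shown.

```python
import random
from collections import defaultdict

# Hypothetical training transcriptions (assumption): each is a list of Konnakol
# syllables, standing in for the paper's Whisper transcriptions.
TRANSCRIPTIONS = [
    ["ta", "ka", "di", "mi", "ta", "ka", "jo", "nu"],
    ["ta", "ki", "ta", "ta", "ka", "di", "mi"],
    ["tha", "ka", "dhi", "mi", "ta", "ka", "jo", "nu"],
]

# Sentinel states marking where a sequence begins and ends.
START, END = "<s>", "</s>"


def fit_transitions(sequences):
    """Count first-order syllable transitions, padding each sequence with START/END."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        padded = [START] + seq + [END]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    # Convert to plain dicts so lookups never silently create new states.
    return {state: dict(successors) for state, successors in counts.items()}


def generate(counts, rng, max_len=32):
    """Walk the chain from START, sampling successors in proportion to their counts.

    Stops at END or after max_len syllables, since the chain may contain cycles.
    """
    out = []
    state = START
    while len(out) < max_len:
        successors = counts[state]
        tokens = list(successors)
        weights = [successors[t] for t in tokens]
        state = rng.choices(tokens, weights=weights, k=1)[0]
        if state == END:
            break
        out.append(state)
    return out


if __name__ == "__main__":
    chain = fit_transitions(TRANSCRIPTIONS)
    rng = random.Random(42)  # seeded for reproducible output
    for _ in range(3):
        print(" ".join(generate(chain, rng)))
```

Every transition in a generated sequence was observed at least once in the training transcriptions. A first-order chain therefore recombines existing phrases rather than inventing new transitions. Higher-order states, such as pairs of syllables, would preserve longer rhythmic patterns at the cost of needing more data.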

ISBN

9798331519315

Publisher

IEEE

Disciplines

Computer Sciences

Keywords

Automatic Speech Recognition (ASR), Carnatic Music, Konnakol Transcription, Machine Learning, Markov Chain Generation

Scopus ID

105007802991

Indexed in Scopus

yes

Open Access

no
