Closing the Loop on Speech to Music Translation: Automatically Generating Synthetic Percussive Sequences on the Mridangam from Konnakol

Document Type

Conference Proceeding

Source of Publication

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW 2025), Workshop Proceedings

Publication Date

5-27-2025

Abstract

This paper presents a pipeline to convert sequences spoken in Konnakol, a South Indian vocal percussion language, into synthetic rhythmic sequences performed on the mridangam. We fine-tune the Whisper speech-to-text model on Konnakol data, enabling accurate transcription of spoken sequences despite the small size of our dataset (approximately 15 minutes). The transcriptions are rhythmically encoded in a format that is compatible with the Konnakol Typewriter, a web application that converts these sequences into mridangam audio. Additionally, these transcriptions serve as input for a Markov model, which generates new rhythmic sequences that can also be processed through the Konnakol Typewriter to produce mridangam audio. Whisper achieves very low error rates, making it an ideal tool for this task. This pipeline not only facilitates the transcription of Konnakol but also opens possibilities for creating educational tools, preserving cultural heritage, and generating data for rhythm-based applications. Future work will focus on refining the process to improve accuracy and versatility.
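The generation step described in the abstract can be illustrated with a minimal sketch of a first-order Markov chain over transcribed syllables. This is not the authors' implementation: the abstract does not state the model order or the encoding, and the syllable lists, the function names `build_transitions` and `generate`, and the fixed random seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_transitions(sequences):
    """Record, for each syllable, every syllable observed directly after it."""
    transitions = defaultdict(list)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            transitions[current].append(following)
    return transitions


def generate(transitions, start, length, rng):
    """Sample up to `length` syllables, beginning with `start`."""
    out = [start]
    while len(out) < length:
        choices = transitions.get(out[-1])
        if not choices:  # no observed successor, so stop early
            break
        # Repeated entries in `choices` make frequent transitions more likely.
        out.append(rng.choice(choices))
    return out


# Hypothetical transcriptions (illustrative syllables, not the paper's data).
transcripts = [
    ["ta", "ka", "di", "mi", "ta", "ka", "ju", "nu"],
    ["ta", "ka", "di", "mi", "ta", "ki", "ta"],
]
transitions = build_transitions(transcripts)
sequence = generate(transitions, "ta", 8, random.Random(0))
print(" ".join(sequence))
```

In a pipeline like the one described, the generated token list would then need to be written in whatever notation the Konnakol Typewriter accepts, which this sketch does not attempt to reproduce.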

DOI Link

10.1109/icasspw65056.2025.11011256

ISBN

9798331519315

Publisher

IEEE

Disciplines

Computer Sciences

Keywords

Automatic Speech Recognition (ASR), Carnatic Music, Konnakol Transcription, Machine Learning, Markov Chain Generation

Scopus ID

105007802991

Recommended Citation

Krishnan, Gopika; Drabek, Julia; Anantapadmanabhan, Akshay; Ganguli, Kaustuv Kanti; and Guedes, Carlos, "Closing the Loop on Speech to Music Translation: Automatically Generating Synthetic Percussive Sequences on the Mridangam from Konnakol" (2025). All Works. 7434.
https://zuscholars.zu.ac.ae/works/7434

Indexed in Scopus

yes

Open Access

no
