RyanSpeech – Mohammad H. Mahoor, PhD

RyanSpeech Corpus

RyanSpeech is a new speech corpus for research on automated text-to-speech (TTS) systems. Publicly available TTS corpora are often noisy, recorded with multiple speakers, or lacking in high-quality male speech data. To meet the need for a high-quality, publicly available male speech corpus within the field of speech recognition, we designed and created RyanSpeech. Its textual materials are derived from real-world conversational settings, and the corpus contains over 10 hours of speech from a professional male voice actor, recorded at 44.1 kHz. Both the design and the creation pipeline of the corpus make RyanSpeech well suited to developing TTS systems for real-world applications. To provide a baseline for future research, protocols, and benchmarks, we trained four state-of-the-art speech models and a vocoder on RyanSpeech. Our best model achieves a mean opinion score (MOS) of 3.36. We have made the trained models publicly available for download.
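For readers unfamiliar with the metric, a mean opinion score is the arithmetic mean of listener ratings, conventionally given on a 1 to 5 scale. The following is a minimal sketch of that calculation. The function name, the ratings, and the normal-approximation confidence interval are illustrative assumptions and do not reproduce the evaluation procedure used for RyanSpeech.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, ci95_half_width) for listener ratings on a 1-5 scale."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    mos = statistics.mean(ratings)
    # Normal-approximation 95% interval: an assumption made for this sketch,
    # not the procedure reported for RyanSpeech.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for illustration only (not data from the corpus).
ratings = [3, 4, 3, 4, 3, 3, 4, 3]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a difference between two models is larger than the rating noise.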

Please cite the following paper in any manuscript that makes use of the database:

@inproceedings{Zandie2021RyanSpeechAC,
  title={RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis},
  author={Rohola Zandie and Mohammad H. Mahoor and Julia Madsen and Eshrat S. Emamian},
  booktitle={Interspeech},
  year={2021}
}

To download the RyanSpeech corpus, please fill out the request form.
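After downloading, it can be useful to confirm that the audio matches the stated 44.1 kHz sampling rate before training. The sketch below uses Python's standard `wave` module and assumes the recordings are uncompressed PCM WAV files. The file layout and the helper name are hypothetical.

```python
import wave

EXPECTED_RATE = 44100  # 44.1 kHz, as stated in the corpus description


def sample_rate_matches(path, expected=EXPECTED_RATE):
    """Return True if the PCM WAV file at `path` has the expected sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected
```

A typical use would be to loop over the downloaded files and list any for which `sample_rate_matches` returns `False`, since a mismatched rate usually means the file was resampled somewhere along the way.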

Here are sample sentences from the dataset:


Sentence	Ground Truth	Conformer Model
Is there a specific Cuisine that you like?
Hi. I’m looking to order some takeout food. I’d like burgers.
Depression is not a sensation that I can really experience, but I sympathize with those who do.
He was carefully examining the foolscap, upon which the words were pasted, holding it only an inch or two from his eyes.
My birthday was yesterday and I very much appreciate it!