Towards Cross-Lingual Voice Adaptation for Conversational Speech
MS Thesis
For the cross-lingual samples, the generated audio is the Hindi translation of the target audio.
Read Speech TTS versus Conversational Speech TTS
Examples | Read Speech TTS | Conversational TTS
---|---|---
Example 1 | ![]() | ![]()
Example 2 | ![]() | ![]()
Example 3 | ![]() | ![]()
Example 4 | ![]() | ![]()
Example 5 | ![]() | ![]()
Read Speech TTS is trained using read speech data. Conversational TTS is trained using classroom lecture data.
Voice Adaptation Experiments
Preliminary Experiment 1 : Source Content Disentanglement
Examples | Source | Target | Generated
---|---|---|---
Example 1 | ![]() | ![]() | ![]()
Preliminary Experiment 2 - Adaptation : Read Speech VC versus Conversational Speech VC
System | Original | Adaptation data (7 min)
---|---|---
Read Speech VC | ![]() | ![]()
Conversational Speech VC | ![]() | ![]()
Experiment 1 : Manually Curated Data
Monolingual
Examples | Original | E2E | HTS
---|---|---|---
Example 1 | ![]() | ![]() | ![]()
Example 2 | ![]() | ![]() | ![]()
Example 3 | ![]() | ![]() | ![]()
Example 4 | ![]() | ![]() | ![]()
Example 5 | ![]() | ![]() | ![]()
Cross-Lingual
Examples | Original | E2E | HTS | HTS + Cycle_Gan
---|---|---|---|---
Example 1 | ![]() | ![]() | ![]() | ![]()
Example 2 | ![]() | ![]() | ![]() | ![]()
Example 3 | ![]() | ![]() | ![]() | ![]()
Example 4 | ![]() | ![]() | ![]() | ![]()
Example 5 | ![]() | ![]() | ![]() | ![]()
We adapt from a bilingual (Hindi + English) speaker to the target speaker.
Experiment 2 - Pruning
Speaker 1
Examples | Original | English | Hindi | Kannada
---|---|---|---|---
Example 1 | ![]() | ![]() | ![]() | ![]()
Example 2 | ![]() | ![]() | ![]() | ![]()
Example 3 | ![]() | ![]() | ![]() | ![]()
Example 4 | ![]() | ![]() | ![]() | ![]()
Example 5 | ![]() | ![]() | ![]() | ![]()
Speaker 2
Examples | Original | English | Hindi | Kannada
---|---|---|---|---
Example 1 | ![]() | ![]() | ![]() | ![]()
Example 2 | ![]() | ![]() | ![]() | ![]()
Example 3 | ![]() | ![]() | ![]() | ![]()
Example 4 | ![]() | ![]() | ![]() | ![]()
Example 5 | ![]() | ![]() | ![]() | ![]()
Speaker 3
Examples | Original | English
---|---|---
Example 1 | ![]() | ![]()
Example 2 | ![]() | ![]()
Example 3 | ![]() | ![]()
Example 4 | ![]() | ![]()
Example 5 | ![]() | ![]()
Videos play better when opened in the Firefox browser.