Goal: build a high-quality, efficient Sinhala text-to-speech (TTS) system trained on native speaker data, using deep learning and deployable on mobile and edge devices.
Prepare a diverse, phonetically rich Sinhala sentence corpus, normalized into standard Unicode with a parallel Latin transliteration. Collaborate with a native Sinhala voice artist in a professional studio to record clean, high-quality audio clips.
Why it matters:
This forms the foundational dataset that directly influences the model’s clarity, pronunciation, and expressiveness.
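The normalization step above can be sketched in plain Python. The mapping below covers only a handful of letters and uses an illustrative romanization scheme (not a standard one); a real pipeline needs the full Sinhala Unicode block (U+0D80–U+0DFF) and a consistent, documented scheme.

```python
# Minimal sketch of Sinhala-to-Latin transliteration for corpus
# normalization. Only a few letters are mapped; the romanization
# scheme here is illustrative, not a standard.
import unicodedata

VOWELS = {"අ": "a", "ආ": "aa", "ඉ": "i", "උ": "u", "එ": "e", "ඔ": "o"}
CONSONANTS = {"ක": "k", "ග": "g", "ම": "m", "ත": "t", "න": "n", "ල": "l"}
VOWEL_SIGNS = {"ා": "aa", "ි": "i", "ු": "u", "ෙ": "e", "ො": "o"}
HAL = "\u0DCA"  # hal kirima (virama): suppresses the inherent vowel

def transliterate(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # canonical Unicode form
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            base = CONSONANTS[ch]
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == HAL:              # bare consonant, no vowel
                out.append(base)
                i += 2
            elif nxt in VOWEL_SIGNS:    # explicit vowel sign
                out.append(base + VOWEL_SIGNS[nxt])
                i += 2
            else:                       # inherent 'a'
                out.append(base + "a")
                i += 1
        elif ch in VOWELS:
            out.append(VOWELS[ch])
            i += 1
        else:
            out.append(ch)              # spaces, punctuation, digits
            i += 1
    return "".join(out)

print(transliterate("අම්මා"))  # → ammaa
```

Because consonants carry an inherent vowel that signs and the hal kirima modify, a character-by-character lookup is not enough; the scanner has to look one code point ahead, as above.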
Segment, normalize, and align the recorded audio with transcripts. Convert to standard formats (e.g., WAV + metadata.csv) and ensure sentence-level alignment for model training.
Why it matters:
Clean, well-labeled data ensures the model learns accurate pronunciation and rhythm, especially important in phonetically rich languages like Sinhala.
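The metadata step can be sketched as below. The pipe-delimited `id|raw|normalized` layout follows the common LJSpeech convention that Coqui TTS and most other toolkits can ingest directly; the clip IDs and transcripts are hypothetical.

```python
# Sketch: assemble an LJSpeech-style metadata.csv pairing each clip ID
# with its raw and normalized transcripts. Clip IDs and text below are
# hypothetical examples.
def build_metadata(rows):
    """rows: iterable of (clip_id, raw_text, normalized_text) tuples."""
    lines = []
    for clip_id, raw, norm in rows:
        for field in (clip_id, raw, norm):
            if "|" in field or "\n" in field:
                raise ValueError(f"illegal character in field: {field!r}")
        lines.append(f"{clip_id}|{raw.strip()}|{norm.strip()}")
    return "\n".join(lines) + "\n"

rows = [
    ("sin_0001", "අම්මා", "ammaa"),  # hypothetical clip
]
with open("metadata.csv", "w", encoding="utf-8") as f:
    f.write(build_metadata(rows))
```

Validating fields up front (no delimiter or newline inside a transcript) catches alignment bugs before training rather than mid-epoch.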
Use Coqui TTS with the FastSpeech 2 model to train on the Latinized Sinhala corpus. Leverage pitch and duration prediction for improved naturalness and prepare for future integration with vocoders like HiFi-GAN.
Why it matters:
Because it is non-autoregressive, FastSpeech 2 offers a fast, robust, and mobile-deployable path, well suited to multilingual and phoneme-level experimentation.
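A training setup along these lines can be sketched with Coqui TTS's recipe pattern. Class and field names below follow recent Coqui TTS releases but shift between versions, and the dataset paths are placeholders, so treat this as a starting point rather than a drop-in script.

```python
# Sketch of a Coqui TTS FastSpeech 2 training config. Paths are
# placeholders; exact field names vary across Coqui TTS releases.
# FastSpeech 2 needs pitch (f0) and energy targets for its predictors,
# which Coqui computes and caches when the flags below are enabled.
from TTS.tts.configs.fastspeech2_config import Fastspeech2Config
from TTS.tts.configs.shared_configs import BaseDatasetConfig

dataset = BaseDatasetConfig(
    formatter="ljspeech",            # metadata.csv in LJSpeech layout
    meta_file_train="metadata.csv",
    path="data/sinhala_tts/",        # placeholder dataset root
)

config = Fastspeech2Config(
    run_name="sinhala_fastspeech2",
    datasets=[dataset],
    use_phonemes=False,              # start from Latinized characters
    compute_f0=True,                 # pitch targets for the pitch predictor
    f0_cache_path="cache/f0/",
    compute_energy=True,             # energy targets
    energy_cache_path="cache/energy/",
    batch_size=32,
    epochs=1000,
    output_path="runs/",
)
```

From here the usual Coqui flow applies: load samples with the configured formatter, build the model from the config, and hand both to the trainer.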
Convert model-generated mel-spectrograms into natural audio using a lightweight vocoder like HiFi-GAN or Delightful TTS vocoder, tuned for edge deployment.
Why it matters:
A great spectrogram still needs a quality vocoder. This step is what makes the model sound natural and usable in real-world apps.
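Once both checkpoints exist, pairing the acoustic model with a HiFi-GAN vocoder can be done through the Coqui `tts` command-line tool; all paths below are placeholders for your own checkpoints and configs.

```shell
# Sketch: synthesize speech with a trained acoustic model plus a
# HiFi-GAN vocoder via the Coqui `tts` CLI. Paths are placeholders.
tts --text "ammaa" \
    --model_path runs/sinhala_fastspeech2/best_model.pth \
    --config_path runs/sinhala_fastspeech2/config.json \
    --vocoder_path runs/hifigan/best_model.pth \
    --vocoder_config_path runs/hifigan/config.json \
    --out_path demo.wav
```

Keeping the vocoder swappable at the CLI level makes it easy to A/B different vocoders on identical spectrograms during evaluation.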
Evaluate audio quality using subjective and objective metrics (MOS, intelligibility). Collect feedback from native Sinhala speakers and refine models with further fine-tuning or data augmentation.
Why it matters:
User feedback ensures the TTS sounds right to native ears, and guides improvements in clarity, accent handling, and intonation.
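The MOS part of the evaluation reduces to simple statistics: average the 1–5 listener ratings per system and report a confidence interval so small listening panels are not over-interpreted. A minimal sketch, with hypothetical ratings:

```python
# Sketch: aggregate listener ratings into a Mean Opinion Score (MOS)
# with an approximate 95% confidence interval. Ratings are on the
# usual 1-5 scale; the values below are hypothetical.
from math import sqrt
from statistics import mean, stdev

def mos(ratings):
    m = mean(ratings)
    if len(ratings) < 2:
        return m, 0.0
    ci = 1.96 * stdev(ratings) / sqrt(len(ratings))  # normal approx.
    return m, ci

scores = [4, 5, 4, 3, 4, 5, 4, 4]   # hypothetical panel ratings
m, ci = mos(scores)
print(f"MOS = {m:.2f} ± {ci:.2f}")
```

Reporting the interval alongside the mean makes clear whether a fine-tuned model's gain is real or within listener noise.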
Optimize the model for mobile/edge devices (e.g., ONNX or PyTorch Mobile), integrate into a consumer-facing app or API, and prepare demos for outreach, pitching, or open-source release.
Why it matters:
Deployment bridges research and impact. Making Sinhala TTS accessible supports language technology growth and inclusion.
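The ONNX export path can be sketched with `torch.onnx.export`. The tiny stand-in module below is a placeholder so the snippet is self-contained; a real FastSpeech 2 checkpoint exposes its own input signature (token IDs plus lengths), so the dummy inputs and axes must be adapted to it.

```python
# Sketch: export a PyTorch acoustic model to ONNX for edge deployment.
# TinyAcoustic is a stand-in placeholder, not the trained model.
import torch
import torch.nn as nn

class TinyAcoustic(nn.Module):
    """Placeholder for the trained text-to-mel model."""
    def __init__(self, vocab=100, mel_bins=80):
        super().__init__()
        self.emb = nn.Embedding(vocab, 128)
        self.proj = nn.Linear(128, mel_bins)

    def forward(self, tokens):
        # (batch, time) token IDs -> (batch, mel_bins, time) spectrogram
        return self.proj(self.emb(tokens)).transpose(1, 2)

model = TinyAcoustic().eval()
dummy = torch.randint(0, 100, (1, 50), dtype=torch.long)

torch.onnx.export(
    model,
    (dummy,),
    "sinhala_tts.onnx",
    input_names=["tokens"],
    output_names=["mel"],
    dynamic_axes={"tokens": {1: "time"}, "mel": {2: "frames"}},
)
```

Marking the time axes dynamic lets one exported graph serve sentences of any length, which matters for an on-device runtime.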