Goal: build a high-quality, efficient Sinhala text-to-speech (TTS) system trained on native speaker data, using deep learning and deployable on mobile and edge devices.
Prepare a diverse, phonetically rich Sinhala sentence corpus, normalized into standard Unicode with a parallel Latin transliteration. Collaborate with a native Sinhala voice artist in a professional studio to record clean, high-quality audio clips.
Why it matters:
This forms the foundational dataset that directly influences the model’s clarity, pronunciation, and expressiveness.
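The normalization step above can be sketched in plain Python. The mapping below covers only a handful of letters and uses an illustrative romanization scheme (not a standard one); a real pipeline needs the full Sinhala Unicode block (U+0D80–U+0DFF) and a consistent, documented scheme.

```python
# Minimal sketch of Sinhala-to-Latin transliteration for corpus
# normalization. Only a few letters are mapped; the romanization
# scheme here is illustrative, not a standard.
import unicodedata

VOWELS = {"අ": "a", "ආ": "aa", "ඉ": "i", "උ": "u", "එ": "e", "ඔ": "o"}
CONSONANTS = {"ක": "k", "ග": "g", "ම": "m", "ත": "t", "න": "n", "ල": "l"}
VOWEL_SIGNS = {"ා": "aa", "ි": "i", "ු": "u", "ෙ": "e", "ො": "o"}
HAL = "\u0DCA"  # hal kirima (virama): suppresses the inherent vowel

def transliterate(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # canonical Unicode form
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            base = CONSONANTS[ch]
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == HAL:              # bare consonant, no vowel
                out.append(base)
                i += 2
            elif nxt in VOWEL_SIGNS:    # explicit vowel sign
                out.append(base + VOWEL_SIGNS[nxt])
                i += 2
            else:                       # inherent 'a'
                out.append(base + "a")
                i += 1
        elif ch in VOWELS:
            out.append(VOWELS[ch])
            i += 1
        else:
            out.append(ch)              # spaces, punctuation, digits
            i += 1
    return "".join(out)

print(transliterate("අම්මා"))  # → ammaa
```

Because consonants carry an inherent vowel that signs and the hal kirima modify, a character-by-character lookup is not enough; the scanner has to look one code point ahead, as above.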
Segment, normalize, and align the recorded audio with transcripts. Convert to standard formats (e.g., WAV + metadata.csv) and ensure sentence-level alignment for model training.
Why it matters:
Clean, well-labeled data ensures the model learns accurate pronunciation and rhythm, especially important in phonetically rich languages like Sinhala.
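The metadata step can be sketched as below. The pipe-delimited `id|raw|normalized` layout follows the common LJSpeech convention that Coqui TTS and most other toolkits can ingest directly; the clip IDs and transcripts are hypothetical.

```python
# Sketch: assemble an LJSpeech-style metadata.csv pairing each clip ID
# with its raw and normalized transcripts. Clip IDs and text below are
# hypothetical examples.
def build_metadata(rows):
    """rows: iterable of (clip_id, raw_text, normalized_text) tuples."""
    lines = []
    for clip_id, raw, norm in rows:
        for field in (clip_id, raw, norm):
            if "|" in field or "\n" in field:
                raise ValueError(f"illegal character in field: {field!r}")
        lines.append(f"{clip_id}|{raw.strip()}|{norm.strip()}")
    return "\n".join(lines) + "\n"

rows = [
    ("sin_0001", "අම්මා", "ammaa"),  # hypothetical clip
]
with open("metadata.csv", "w", encoding="utf-8") as f:
    f.write(build_metadata(rows))
```

Validating fields up front (no delimiter or newline inside a transcript) catches alignment bugs before training rather than mid-epoch.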
Use Coqui TTS with the FastSpeech 2 model to train on the Latinized Sinhala corpus. Leverage pitch and duration prediction for improved naturalness and prepare for future integration with vocoders like HiFi-GAN.
Why it matters:
Because it is non-autoregressive, FastSpeech 2 offers a fast, robust, and mobile-deployable path, well suited to multilingual and phoneme-level experimentation.
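A training setup along these lines can be sketched with Coqui TTS's recipe pattern. Class and field names below follow recent Coqui TTS releases but shift between versions, and the dataset paths are placeholders, so treat this as a starting point rather than a drop-in script.

```python
# Sketch of a Coqui TTS FastSpeech 2 training config. Paths are
# placeholders; exact field names vary across Coqui TTS releases.
# FastSpeech 2 needs pitch (f0) and energy targets for its predictors,
# which Coqui computes and caches when the flags below are enabled.
from TTS.tts.configs.fastspeech2_config import Fastspeech2Config
from TTS.tts.configs.shared_configs import BaseDatasetConfig

dataset = BaseDatasetConfig(
    formatter="ljspeech",            # metadata.csv in LJSpeech layout
    meta_file_train="metadata.csv",
    path="data/sinhala_tts/",        # placeholder dataset root
)

config = Fastspeech2Config(
    run_name="sinhala_fastspeech2",
    datasets=[dataset],
    use_phonemes=False,              # start from Latinized characters
    compute_f0=True,                 # pitch targets for the pitch predictor
    f0_cache_path="cache/f0/",
    compute_energy=True,             # energy targets
    energy_cache_path="cache/energy/",
    batch_size=32,
    epochs=1000,
    output_path="runs/",
)
```

From here the usual Coqui flow applies: load samples with the configured formatter, build the model from the config, and hand both to the trainer.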
Convert model-generated mel-spectrograms into natural audio using a lightweight vocoder like HiFi-GAN or Delightful TTS vocoder, tuned for edge deployment.
Why it matters:
A great spectrogram still needs a quality vocoder. This step is what makes the model sound natural and usable in real-world apps.
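Once both checkpoints exist, pairing the acoustic model with a HiFi-GAN vocoder can be done through the Coqui `tts` command-line tool; all paths below are placeholders for your own checkpoints and configs.

```shell
# Sketch: synthesize speech with a trained acoustic model plus a
# HiFi-GAN vocoder via the Coqui `tts` CLI. Paths are placeholders.
tts --text "ammaa" \
    --model_path runs/sinhala_fastspeech2/best_model.pth \
    --config_path runs/sinhala_fastspeech2/config.json \
    --vocoder_path runs/hifigan/best_model.pth \
    --vocoder_config_path runs/hifigan/config.json \
    --out_path demo.wav
```

Keeping the vocoder swappable at the CLI level makes it easy to A/B different vocoders on identical spectrograms during evaluation.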
Evaluate audio quality using subjective and objective metrics (MOS, intelligibility). Collect feedback from native Sinhala speakers and refine models with further fine-tuning or data augmentation.
Why it matters:
User feedback ensures the TTS sounds right to native ears, and guides improvements in clarity, accent handling, and intonation.
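The MOS part of the evaluation reduces to simple statistics: average the 1–5 listener ratings per system and report a confidence interval so small listening panels are not over-interpreted. A minimal sketch, with hypothetical ratings:

```python
# Sketch: aggregate listener ratings into a Mean Opinion Score (MOS)
# with an approximate 95% confidence interval. Ratings are on the
# usual 1-5 scale; the values below are hypothetical.
from math import sqrt
from statistics import mean, stdev

def mos(ratings):
    m = mean(ratings)
    if len(ratings) < 2:
        return m, 0.0
    ci = 1.96 * stdev(ratings) / sqrt(len(ratings))  # normal approx.
    return m, ci

scores = [4, 5, 4, 3, 4, 5, 4, 4]   # hypothetical panel ratings
m, ci = mos(scores)
print(f"MOS = {m:.2f} ± {ci:.2f}")
```

Reporting the interval alongside the mean makes clear whether a fine-tuned model's gain is real or within listener noise.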
Optimize the model for mobile/edge devices (e.g., ONNX or PyTorch Mobile), integrate into a consumer-facing app or API, and prepare demos for outreach, pitching, or open-source release.
Why it matters:
Deployment bridges research and impact. Making Sinhala TTS accessible supports language technology growth and inclusion.
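The ONNX export path can be sketched with `torch.onnx.export`. The tiny stand-in module below is a placeholder so the snippet is self-contained; a real FastSpeech 2 checkpoint exposes its own input signature (token IDs plus lengths), so the dummy inputs and axes must be adapted to it.

```python
# Sketch: export a PyTorch acoustic model to ONNX for edge deployment.
# TinyAcoustic is a stand-in placeholder, not the trained model.
import torch
import torch.nn as nn

class TinyAcoustic(nn.Module):
    """Placeholder for the trained text-to-mel model."""
    def __init__(self, vocab=100, mel_bins=80):
        super().__init__()
        self.emb = nn.Embedding(vocab, 128)
        self.proj = nn.Linear(128, mel_bins)

    def forward(self, tokens):
        # (batch, time) token IDs -> (batch, mel_bins, time) spectrogram
        return self.proj(self.emb(tokens)).transpose(1, 2)

model = TinyAcoustic().eval()
dummy = torch.randint(0, 100, (1, 50), dtype=torch.long)

torch.onnx.export(
    model,
    (dummy,),
    "sinhala_tts.onnx",
    input_names=["tokens"],
    output_names=["mel"],
    dynamic_axes={"tokens": {1: "time"}, "mel": {2: "frames"}},
)
```

Marking the time axes dynamic lets one exported graph serve sentences of any length, which matters for an on-device runtime.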