Speech Synthesis
The olaverse.speech module provides a Text-to-Speech (TTS) architecture built from an end-to-end pipeline class and two pluggable model interfaces: acoustic models and vocoders.
TTS Pipeline
The TTSPipeline coordinates the entire flow from raw text to output waveform. It handles text normalization and diacritic restoration automatically, then passes the result to your acoustic model and vocoder.
olaverse.speech.TTSPipeline
End-to-end Text-to-Speech pipeline.
Orchestrates the entire flow:

1. Text Normalization (numbers, abbreviations)
2. Tone Restoration (diacritization)
3. Acoustic Modeling (text -> Mel-spectrogram)
4. Vocoding (Mel-spectrogram -> audio waveform)
Functions
synthesize(text)
Synthesize raw text into an audio waveform.
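The four stages that TTSPipeline orchestrates can be illustrated with a minimal, self-contained sketch. It does not use olaverse itself: every function below (`normalize_text`, `restore_tones`, `acoustic_model`, `vocoder`, and the free function `synthesize`) is a hypothetical toy stand-in, and the real pipeline's constructor and internals may differ. The sketch only shows how data might pass from one stage to the next.

```python
import math
import unicodedata
from typing import List

# All names here are hypothetical stand-ins, not part of olaverse.


def normalize_text(text: str) -> str:
    """Stage 1 (toy): spell out a few digits and collapse whitespace."""
    digits = {"1": "one", "2": "two", "3": "three"}
    return " ".join(digits.get(word, word) for word in text.split())


def restore_tones(text: str) -> str:
    """Stage 2 (placeholder): a real system would restore diacritics here.

    This toy version only applies Unicode NFC normalization.
    """
    return unicodedata.normalize("NFC", text)


def acoustic_model(text: str, n_mels: int = 4) -> List[List[float]]:
    """Stage 3 (toy): emit one fake Mel frame of n_mels bins per character."""
    return [[(ord(ch) % 32) / 32.0] * n_mels for ch in text]


def vocoder(mel: List[List[float]], hop_length: int = 8) -> List[float]:
    """Stage 4 (toy): emit hop_length sine samples per frame, scaled by frame mean."""
    waveform = []
    for frame in mel:
        amplitude = sum(frame) / len(frame)
        for n in range(hop_length):
            waveform.append(amplitude * math.sin(2 * math.pi * n / hop_length))
    return waveform


def synthesize(text: str) -> List[float]:
    """Chain the four stages in the documented order."""
    normalized = normalize_text(text)
    diacritized = restore_tones(normalized)
    mel = acoustic_model(diacritized)
    return vocoder(mel)


waveform = synthesize("call 2 people")
print(len(waveform))  # one hop of samples per character of the normalized text
```

Keeping each stage as a separate callable mirrors the documented separation between text processing, acoustic modeling, and vocoding, so any stage can be swapped without touching the others.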
Model Interfaces
If you are training or integrating custom acoustic models and vocoders, ensure they inherit from these base classes and implement their abstract methods.
olaverse.speech.BaseAcousticModel
Bases: ABC
Abstract base class for Acoustic Models (e.g., FastSpeech, Tacotron). These models convert normalized/diacritized phonetic text into acoustic features like Mel-spectrograms.
Functions
load_weights(path)
abstractmethod
Load PyTorch/ONNX model weights from the specified path.
forward(text)
abstractmethod
Convert text into acoustic features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Phonetically normalized text. | required |
Returns: Acoustic features (e.g., a Mel-spectrogram tensor).
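As a rough illustration of implementing this interface, the sketch below subclasses a locally defined stand-in for `BaseAcousticModel` so that it runs without olaverse installed; in real code you would import the base class from `olaverse.speech` instead. The class `ToyAcousticModel`, its `n_mels` parameter, the weights path, and the list-of-lists feature format are all assumptions made for the example, not part of the documented API.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseAcousticModel(ABC):
    """Local stand-in mirroring the documented interface (assumption).

    In a real project, import this from olaverse.speech instead.
    """

    @abstractmethod
    def load_weights(self, path):
        ...

    @abstractmethod
    def forward(self, text):
        ...


class ToyAcousticModel(BaseAcousticModel):
    """Hypothetical model that maps each character to one fake Mel frame."""

    def __init__(self, n_mels: int = 4) -> None:
        self.n_mels = n_mels
        self.weights_path: Optional[str] = None

    def load_weights(self, path: str) -> None:
        # A real implementation would deserialize PyTorch/ONNX weights here;
        # this toy version only records the path.
        self.weights_path = path

    def forward(self, text: str) -> List[List[float]]:
        # One frame per character, each with n_mels identical bins in [0, 1).
        return [[(ord(ch) % 32) / 32.0] * self.n_mels for ch in text]


model = ToyAcousticModel(n_mels=4)
model.load_weights("weights/toy.bin")  # hypothetical path
mel = model.forward("bawo")
print(len(mel), len(mel[0]))  # frames, bins per frame
```

Because both methods are abstract, Python refuses to instantiate a subclass that omits either one, which surfaces an incomplete integration at construction time rather than in the middle of synthesis.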
olaverse.speech.BaseVocoder
Bases: ABC
Abstract base class for Vocoders (e.g., HiFi-GAN, WaveGlow). These models convert acoustic features (like Mel-spectrograms) into raw audio waveforms.
Functions
load_weights(path)
abstractmethod
Load PyTorch/ONNX model weights from the specified path.
generate(acoustic_features)
abstractmethod
Convert acoustic features into a raw audio waveform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `acoustic_features` | | Output from an Acoustic Model. | required |
Returns: Audio waveform array/tensor.
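A custom vocoder could follow the same pattern. The sketch below again defines a local stand-in for `BaseVocoder` so it runs on its own; real code would import the base class from `olaverse.speech`. `ToyVocoder`, its `hop_length` parameter, and the assumption that acoustic features arrive as a list of frames are all hypothetical choices for illustration.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence


class BaseVocoder(ABC):
    """Local stand-in mirroring the documented interface (assumption).

    In a real project, import this from olaverse.speech instead.
    """

    @abstractmethod
    def load_weights(self, path):
        ...

    @abstractmethod
    def generate(self, acoustic_features):
        ...


class ToyVocoder(BaseVocoder):
    """Hypothetical vocoder: each frame becomes one sine cycle scaled by its mean."""

    def __init__(self, hop_length: int = 8) -> None:
        self.hop_length = hop_length
        self.weights_path: Optional[str] = None

    def load_weights(self, path: str) -> None:
        # A real implementation would load PyTorch/ONNX weights here.
        self.weights_path = path

    def generate(self, acoustic_features: Sequence[Sequence[float]]) -> List[float]:
        waveform: List[float] = []
        for frame in acoustic_features:
            amplitude = sum(frame) / len(frame)
            for n in range(self.hop_length):
                phase = 2 * math.pi * n / self.hop_length
                waveform.append(amplitude * math.sin(phase))
        return waveform


vocoder = ToyVocoder(hop_length=8)
frames = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]  # three fake Mel frames
audio = vocoder.generate(frames)
print(len(audio))  # hop_length samples per input frame
```

The output length is the number of frames times `hop_length`, which matches the usual relationship between spectrogram frames and audio samples in frame-based vocoders.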