Speech Synthesis (Roadmap / Experimental)

Experimental — No Trained Model Available Yet

The olaverse.speech module provides a TTS pipeline architecture, but olaverse does not yet ship a trained acoustic model or vocoder.

Production-ready: text normalisation (TTSNormalizer) and diacritisation (Diacritizer) — these are fully functional and available in olaverse.nlp.
Experimental: acoustic synthesis (text → Mel-spectrogram → audio waveform). The architecture is here; trained weights are on the roadmap.

Using speech classes will emit an ExperimentalWarning. To silence it:

import warnings
from olaverse import ExperimentalWarning
warnings.filterwarnings("ignore", category=ExperimentalWarning)
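The filter above stays in effect for the rest of the process. To silence the warning only around a specific block, you can use the standard library's `warnings.catch_warnings()` context manager, which restores the previous filters when the block exits. So that this sketch runs without olaverse installed, it defines a local stand-in `ExperimentalWarning` and a hypothetical `build_speech_object()` helper. Making the stand-in subclass `UserWarning` is an assumption. In real code, import `ExperimentalWarning` from `olaverse` instead.

```python
import warnings


# Stand-in so this sketch runs on its own. In real code use:
#     from olaverse import ExperimentalWarning
# (Subclassing UserWarning here is an assumption.)
class ExperimentalWarning(UserWarning):
    pass


def build_speech_object():
    """Hypothetical helper that warns the way the speech classes are described to."""
    warnings.warn("olaverse.speech is experimental", ExperimentalWarning)
    return object()


# Silence ExperimentalWarning only inside this block; the filter list is
# restored when the block exits.
with warnings.catch_warnings(record=True) as inside:
    warnings.simplefilter("ignore", ExperimentalWarning)
    build_speech_object()

# For comparison: without the "ignore" filter the warning is recorded.
# ("always" bypasses the once-per-location registry, so this is deterministic.)
with warnings.catch_warnings(record=True) as outside:
    warnings.simplefilter("always", ExperimentalWarning)
    build_speech_object()

print(len(inside), len(outside))  # 0 1
```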

The diacritizers are the most valuable asset in this otherwise unfinished module. Restoring tones from plain Yoruba text is the hardest front-end step of any Yoruba TTS system, and that part is done.
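Tone restoration is hard because one undiacritised word can correspond to several toned forms, and the correct form depends on the neighbouring words. The model name `diacnet-yor-viterbi` hints that Viterbi decoding is involved, but this page does not describe DiacNet's internals. The sketch below is therefore only a toy illustration of the general technique, not olaverse's implementation. It runs Viterbi over per-word candidate lists using a hypothetical bigram scorer, `toy_logprob`, whose scores are invented. The point is that the decoder chooses the best sequence as a whole, rather than the best form for each word in isolation.

```python
from typing import Callable, Dict, List, Tuple

START = "<s>"  # sentence-start marker for the first transition


def viterbi_restore(
    words: List[str],
    candidates: Dict[str, List[str]],
    bigram_logprob: Callable[[str, str], float],
) -> List[str]:
    """Return the highest-scoring sequence of toned forms for `words`.

    `candidates` maps each plain word to its possible toned forms (words not
    listed pass through unchanged); `bigram_logprob(prev, cur)` scores
    choosing `cur` immediately after `prev`.
    """
    # For each form chosen at the current position: (best score, best path).
    best: Dict[str, Tuple[float, List[str]]] = {START: (0.0, [])}
    for word in words:
        step: Dict[str, Tuple[float, List[str]]] = {}
        for cur in candidates.get(word, [word]):
            # Extend every surviving path by `cur` and keep only the best one.
            score, path = max(
                ((s + bigram_logprob(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            step[cur] = (score, path + [cur])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Invented scores for illustration only; a real model would learn these.
TOY_BIGRAMS = {(START, "Adé"): -0.1, ("Adé", "ló"): -0.2, ("ló", "sí"): -0.1}


def toy_logprob(prev: str, cur: str) -> float:
    return TOY_BIGRAMS.get((prev, cur), -5.0)  # unseen pairs are penalised


candidates = {"Ade": ["Ade", "Adé"], "lo": ["lo", "ló", "lọ"], "si": ["si", "sí"]}
print(" ".join(viterbi_restore(["Ade", "lo", "si"], candidates, toy_logprob)))
# Adé ló sí
```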

What works today

Use TTSNormalizer and Diacritizer directly for the NLP front-end of a TTS pipeline:

from olaverse.nlp import TTSNormalizer, Diacritizer

normalizer = TTSNormalizer(lang="yo")
diacritizer = Diacritizer(model="diacnet-yor-viterbi")

text = "Dr. Ade lo si oja lana"
normalized = normalizer.normalize(text)      # "Dọ́kítà Ade lo si oja lana"
diacritized = diacritizer.restore(normalized) # "Dọ́kítà Adé ló sí ọjà lànà"
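Abbreviation expansion is the easiest part of normalisation to picture. The sketch below is a hypothetical, dictionary-based stand-in, not `TTSNormalizer`'s actual implementation. Its two table entries come from the examples on this page. It only replaces whole whitespace-separated tokens and does no number expansion or punctuation handling.

```python
# Hypothetical abbreviation table; both entries come from this page's examples.
ABBREVIATIONS = {
    "Dr.": "Dọ́kítà",
    "Mr.": "Míṣìtà",
}


def expand_abbreviations(text: str) -> str:
    """Replace whole-token abbreviations with their spoken Yoruba form."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in text.split())


print(expand_abbreviations("Dr. Ade lo si oja lana"))
# Dọ́kítà Ade lo si oja lana
```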

TTS Pipeline Architecture

TTSPipeline wires together all four steps (text normalisation, tone restoration, acoustic model, vocoder) and produces audio once you supply an acoustic model and a vocoder. Steps 1 and 2 work now; steps 3 and 4 require your own models.

from olaverse import TTSPipeline

# Steps 1 & 2 (normalisation + diacritisation) work without custom models
pipeline = TTSPipeline(lang="yo")
result = pipeline.synthesize("Mr. Ade lo si oja lana")

print(result["normalized_text"])   # "Míṣìtà Ade lo si oja lana"
print(result["diacritized_text"])  # "Míṣìtà Adé ló sí ọjà lànà"
print(result["audio"])             # None — no acoustic model provided
print(result["status"])            # "Acoustic model or Vocoder not provided."

Injecting your own models

If you have a trained acoustic model and vocoder, subclass the base classes, implement their abstract methods, and pass instances to TTSPipeline:

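from olaverse import TTSPipeline, BaseAcousticModel, BaseVocoder

class MyAcousticModel(BaseAcousticModel):
    def load_weights(self, path: str):
        ...  # load your PyTorch/ONNX checkpoint from `path`

    def forward(self, text: str):
        ...  # return a Mel-spectrogram for the normalised, diacritised text

class MyVocoder(BaseVocoder):
    def load_weights(self, path: str):
        ...  # load your vocoder checkpoint from `path`

    def generate(self, acoustic_features):
        ...  # return an audio waveform for the Mel-spectrogram

acoustic_model = MyAcousticModel()
acoustic_model.load_weights("path/to/acoustic_model")  # placeholder path
vocoder = MyVocoder()
vocoder.load_weights("path/to/vocoder")                # placeholder path

pipeline = TTSPipeline(
    lang="yo",
    acoustic_model=acoustic_model,
    vocoder=vocoder,
)
result = pipeline.synthesize("Ẹ káàárọ̀")
# Once forward() and generate() return real data, result["audio"] holds the waveform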
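Once `result["audio"]` holds a waveform, you will usually want to save it. The waveform's format depends on your vocoder. This sketch assumes a 1-D sequence of mono float samples in [-1.0, 1.0] at 22,050 Hz; both are assumptions, so adjust them to your model and convert tensors or arrays to a plain list first if needed. `save_wav` is a hypothetical helper, not part of olaverse. It uses only the standard library's `wave` and `struct` modules and writes 16-bit PCM.

```python
import io
import math
import struct
import wave
from typing import BinaryIO, Sequence, Union

SAMPLE_RATE = 22_050  # assumption: use your vocoder's actual output rate


def save_wav(
    samples: Sequence[float],
    target: Union[str, BinaryIO],
    sample_rate: int = SAMPLE_RATE,
) -> None:
    """Write mono float samples in [-1.0, 1.0] to `target` as 16-bit PCM WAV."""
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# A synthetic 0.1 s, 440 Hz tone stands in for result["audio"] here.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 10)]
buffer = io.BytesIO()
save_wav(tone, buffer)
```

With a real pipeline result, the call would be `save_wav(result["audio"], "output.wav")`.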

Model Interfaces

olaverse.speech.TTSPipeline

TTSPipeline(lang: str = 'yo', acoustic_model: BaseAcousticModel = None, vocoder: BaseVocoder = None, diacritizer_model: str = 'diacnet-yor-viterbi')

End-to-end Text-to-Speech pipeline architecture.

Orchestrates: Text Normalisation → Tone Restoration → Acoustic Model → Vocoder.

Warning: Experimental. No trained acoustic model or vocoder is available yet. Steps 1 (normalisation) and 2 (diacritisation) are fully functional. Steps 3 and 4 require you to inject your own acoustic model and vocoder. End-to-end audio synthesis from olaverse is on the roadmap.

To silence this warning:
    import warnings
    from olaverse import ExperimentalWarning
    warnings.filterwarnings("ignore", category=ExperimentalWarning)
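The four-stage flow listed above can be sketched as a chain of plain callables. The sketch below is a hypothetical `MiniTTSPipeline`, not olaverse's source. It takes each stage as a function rather than as a `BaseAcousticModel` or `BaseVocoder` instance. It returns the four keys documented for `synthesize` and reuses the status strings shown on this page. In the usage lines, `str.strip` and the identity lambda are placeholders for a real normaliser and diacritiser.

```python
from typing import Any, Callable, Dict, Optional


class MiniTTSPipeline:
    """Hypothetical stand-in mirroring the documented four-stage flow."""

    def __init__(
        self,
        normalize: Callable[[str], str],
        diacritize: Callable[[str], str],
        acoustic_model: Optional[Callable[[str], Any]] = None,
        vocoder: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.normalize = normalize
        self.diacritize = diacritize
        self.acoustic_model = acoustic_model
        self.vocoder = vocoder

    def synthesize(self, text: str) -> Dict[str, Any]:
        # Steps 1 and 2 always run: they do not depend on injected models.
        normalized = self.normalize(text)
        diacritized = self.diacritize(normalized)
        result: Dict[str, Any] = {
            "normalized_text": normalized,
            "diacritized_text": diacritized,
            "audio": None,
            "status": "Acoustic model or Vocoder not provided.",
        }
        # Steps 3 and 4 run only when both models were supplied.
        if self.acoustic_model is not None and self.vocoder is not None:
            features = self.acoustic_model(diacritized)  # text -> Mel-spectrogram
            result["audio"] = self.vocoder(features)     # Mel-spectrogram -> waveform
            result["status"] = "Success"
        return result


# Front end only: placeholder callables, no acoustic model or vocoder.
front_end_only = MiniTTSPipeline(normalize=str.strip, diacritize=lambda t: t)
print(front_end_only.synthesize("  Ade lo si oja  ")["status"])
# Acoustic model or Vocoder not provided.
```

Keeping steps 3 and 4 optional means the front end stays usable on its own, which matches how the real pipeline is described above.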

Functions

synthesize

synthesize(text: str)

Synthesise raw text into an audio waveform.

Returns a dict with the following keys:

normalized_text: text after abbreviation/number expansion
diacritized_text: text after tone restoration
audio: waveform array/tensor, or None if no acoustic model or vocoder was provided
status: "Success" or a message explaining what is missing
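Because `audio` is `None` whenever a model is missing, callers should check it before using the waveform. This minimal sketch assumes only the four keys documented above. `SynthesisResult` and `require_audio` are hypothetical helpers for type-checking and for failing loudly; they are not part of olaverse.

```python
from typing import Any, Optional, TypedDict


class SynthesisResult(TypedDict):
    """Hypothetical type for the dict returned by synthesize() (keys as documented)."""
    normalized_text: str
    diacritized_text: str
    audio: Optional[Any]  # waveform array/tensor, or None
    status: str


def require_audio(result: SynthesisResult) -> Any:
    """Return the waveform, or raise using the pipeline's own status message."""
    if result["audio"] is None:
        raise RuntimeError(f"No audio synthesised: {result['status']}")
    return result["audio"]


# The front-end-only result from the usage example earlier on this page.
front_end_result: SynthesisResult = {
    "normalized_text": "Míṣìtà Ade lo si oja lana",
    "diacritized_text": "Míṣìtà Adé ló sí ọjà lànà",
    "audio": None,
    "status": "Acoustic model or Vocoder not provided.",
}

message = ""
try:
    require_audio(front_end_result)
except RuntimeError as err:
    message = str(err)
print(message)
# No audio synthesised: Acoustic model or Vocoder not provided.
```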

olaverse.speech.BaseAcousticModel

Bases: ABC

Abstract base class for acoustic models (e.g. FastSpeech, Tacotron). Converts normalised/diacritised phonetic text into Mel-spectrograms.

Warning: Experimental. No trained model is available yet. Subclassing this is fine for custom integrations, but olaverse does not yet ship a trained acoustic model. This is on the roadmap.

Functions

load_weights `abstractmethod`

load_weights(path: str)

Load PyTorch/ONNX model weights from the specified path.

forward `abstractmethod`

forward(text: str)

Convert text into acoustic features (e.g. a Mel-spectrogram tensor).

Parameters:

Name	Type	Description	Default
`text`	`str`	Phonetically normalised and diacritised text.	required

Returns: Acoustic features tensor.
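Before you have trained weights, a placeholder acoustic model that satisfies this interface can help you test the wiring. So that the sketch runs without olaverse, it re-declares a local stand-in for `BaseAcousticModel` with the two documented abstract methods. In real code, subclass `olaverse.BaseAcousticModel` instead. `SilentAcousticModel`, the 80 Mel bins, and the five-frames-per-character rate are all illustrative assumptions, and the model returns all-zero frames rather than a real spectrogram. The sketch also shows that `ABC` refuses to instantiate a subclass that leaves an abstract method unimplemented.

```python
from abc import ABC, abstractmethod
from typing import List

N_MELS = 80          # assumption: match the Mel-bin count your vocoder expects
FRAMES_PER_CHAR = 5  # assumption: crude fixed duration per input character


# Local stand-in with the two documented abstract methods of
# olaverse.BaseAcousticModel; subclass the real class in your own code.
class BaseAcousticModel(ABC):
    @abstractmethod
    def load_weights(self, path: str): ...

    @abstractmethod
    def forward(self, text: str): ...


class SilentAcousticModel(BaseAcousticModel):
    """Placeholder that returns an all-zero 'spectrogram' of plausible shape."""

    def load_weights(self, path: str) -> None:
        pass  # a placeholder has no weights to load

    def forward(self, text: str) -> List[List[float]]:
        n_frames = FRAMES_PER_CHAR * len(text)
        # Layout (frames, Mel bins) is an assumption; some models transpose it.
        return [[0.0] * N_MELS for _ in range(n_frames)]


class IncompleteModel(BaseAcousticModel):
    def load_weights(self, path: str) -> None:
        pass  # forward() is not implemented, so this class stays abstract


mel = SilentAcousticModel().forward("Ade")
print(len(mel), len(mel[0]))  # 15 80

try:
    IncompleteModel()
    incomplete_rejected = False
except TypeError:  # ABC refuses to instantiate a class with abstract methods left
    incomplete_rejected = True
print(incomplete_rejected)  # True
```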

olaverse.speech.BaseVocoder

Bases: ABC

Abstract base class for vocoders (e.g. HiFi-GAN, WaveGlow). Converts Mel-spectrograms into raw audio waveforms.

Warning: Experimental. No trained model is available yet. Subclassing this is fine for custom integrations, but olaverse does not yet ship a trained vocoder. This is on the roadmap.

Functions

load_weights `abstractmethod`

load_weights(path: str)

Load PyTorch/ONNX model weights from the specified path.

generate `abstractmethod`

generate(acoustic_features: object) -> object

Convert acoustic features into a raw audio waveform.

Parameters:

Name	Type	Description	Default
`acoustic_features`	`object`	Output from a BaseAcousticModel.	required

Returns: Audio waveform array/tensor.
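A matching placeholder vocoder lets you exercise steps 3 and 4 end to end. This sketch also uses a local stand-in, here for `BaseVocoder`. `SilentVocoder` and the hop length of 256 samples per frame are assumptions for illustration. The placeholder turns each acoustic frame into `HOP_LENGTH` samples of silence. As a result, output length scales with the number of input frames, much as frame-based Mel vocoders typically emit a fixed number of samples per frame.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

HOP_LENGTH = 256  # assumption: audio samples produced per acoustic frame


# Local stand-in with the two documented abstract methods of
# olaverse.BaseVocoder; subclass the real class in your own code.
class BaseVocoder(ABC):
    @abstractmethod
    def load_weights(self, path: str): ...

    @abstractmethod
    def generate(self, acoustic_features: object) -> object: ...


class SilentVocoder(BaseVocoder):
    """Placeholder that turns every acoustic frame into HOP_LENGTH zero samples."""

    def load_weights(self, path: str) -> None:
        pass  # a placeholder has no weights to load

    def generate(self, acoustic_features: Sequence[Sequence[float]]) -> List[float]:
        return [0.0] * (HOP_LENGTH * len(acoustic_features))


features = [[0.0] * 80 for _ in range(10)]  # 10 frames of 80 Mel bins
waveform = SilentVocoder().generate(features)
print(len(waveform))  # 2560
```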
