Language Models¶

The olaverse.llm module provides clean interfaces for running transformer-based language models — with correct generation defaults, stop tokens, and endpoint flexibility built in, so you don't have to figure them out yourself.

MIST — General-Purpose Model Family¶

The MIST family is olaverse's flagship LLM series, built by merging Llama 3.1-based models with DARE+TIES and Frankenmerge techniques.

Model Cards: MIST-Mini-8B · MIST-1-70B · MIST-1-140B · MIST-1-140B-4bit · MIST-Mini-8B-Thinking

Model Variants¶

`size=`	Model	Params	Speed	Best for
`"8b"` / `"mini"`	MIST-Mini-8B	8B	~63 tok/s	Fast everyday use
`"70b"`	MIST-1-70B	70B	~23 tok/s	Structured, detailed output
`"140b"`	MIST-1-140B	140B	~8 tok/s	Deepest reasoning
`"140b-4bit"`	MIST-1-140B-4bit	140B (4-bit)	~8 tok/s	Single H100/H200 (70GB VRAM)
`"thinking"`	MIST-Mini-8B-Thinking	8B	~55 tok/s	Step-by-step reasoning with `<think>`

Why use the wrapper?¶

A bare from_pretrained call on MIST tends to produce rambling or cut-off output. The wrapper takes care of three things:

Stop tokens differ per variant. MIST-8B/Thinking inherited the ChatML <|im_end|> token (id 128040) from its DARE+TIES parents alongside Llama 3.1's native stop tokens. If it is omitted from the stop list, generation does not stop cleanly. MIST-70B/140B use a different set with no ChatML token.
repetition_penalty and min_p are required. Without them, the model repeats itself and fails to terminate. The wrapper applies verified defaults, which differ per variant.
Endpoints are interchangeable. The same .generate() / .chat() API works whether you run locally or via Featherless, Modal, or your own vLLM server.
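
To make the stop-token point concrete, here is a minimal sketch of how a missing stop id lets output run on. The helper names are hypothetical and not part of olaverse. The ids 128001 and 128009 are Llama 3.1's standard <|end_of_text|> and <|eot_id|> tokens, 128040 is the ChatML id quoted above, and treating the 70B/140B set as exactly the Llama 3.1 pair is an assumption made only for illustration.

```python
# Hypothetical helpers for illustration; not part of the olaverse API.
LLAMA31_STOP_IDS = frozenset({128001, 128009})  # <|end_of_text|>, <|eot_id|>
CHATML_IM_END_ID = 128040                       # <|im_end|>, as quoted above


def stop_ids_for(variant: str) -> frozenset:
    """Return the ids that should end generation for a variant (assumed sets)."""
    if variant in {"8b", "mini", "thinking"}:
        return LLAMA31_STOP_IDS | {CHATML_IM_END_ID}
    return LLAMA31_STOP_IDS


def truncate_at_stop(token_ids, stop_ids):
    """Keep tokens up to, but not including, the first stop id."""
    kept = []
    for token_id in token_ids:
        if token_id in stop_ids:
            break
        kept.append(token_id)
    return kept


# A made-up token stream in which <|im_end|> appears and output continues after it.
stream = [9906, 1917, CHATML_IM_END_ID, 40, 1097, 2103]
with_chatml = truncate_at_stop(stream, stop_ids_for("8b"))
without_chatml = truncate_at_stop(stream, LLAMA31_STOP_IDS)
print(len(with_chatml), len(without_chatml))
```

With the ChatML id in the stop set the sequence ends at the intended point; without it, everything after <|im_end|> is kept, which is the rambling behaviour described above.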

Installation¶

Local inference

pip install olaverse[deeplearning]
# requires GPU (CUDA or MPS)

Hosted inference

pip install olaverse[hosted]
# works on any machine — no GPU needed

Usage — Local¶

from olaverse import MIST

model = MIST(size="8b")
model.load()  # downloads from Hugging Face, cached after first run

print(model.generate("Explain what makes Yoruba a tonal language."))

4-bit quantization — runs MIST-8B on a 6 GB GPU:

model = MIST(size="8b", quantize=True)
model.load()
print(model.generate("Write a Python retry decorator with exponential backoff."))

Usage — Hosted (Featherless)¶

No GPU required. Create a free API key at featherless.ai.

import os
from olaverse import MIST

model = MIST(
    size="70b",
    endpoint="featherless",
    api_key=os.environ["FEATHERLESS_API_KEY"],
)
print(model.generate("Summarise the key differences between 70B and 140B MIST models."))

Usage — Hosted (Modal / custom vLLM)¶

from olaverse import MIST

model = MIST(
    size="140b",
    endpoint="https://your-modal-endpoint.modal.run",
)
print(model.generate("Solve step by step: If 3x + 7 = 22, find x."))
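
For orientation, a custom URL is expected to be an OpenAI-compatible server. The sketch below builds, but does not send, the kind of request such a server accepts. The function name, the "/v1/chat/completions" route appended to the base URL, and the model name are assumptions for illustration; how the wrapper itself constructs requests is not documented here.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, max_tokens=1024, api_key=None):
    """Build (but do not send) an OpenAI-style chat completion request."""
    url = base_url.rstrip("/") + "/v1/chat/completions"  # assumed route
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


request = build_chat_request(
    "https://your-modal-endpoint.modal.run",
    model="MIST-1-140B",  # placeholder; the served model name depends on your deployment
    messages=[{"role": "user", "content": "Solve step by step: If 3x + 7 = 22, find x."}],
)
print(request.full_url)
print(json.loads(request.data)["messages"][0]["role"])
```

Inspecting the request this way can help when debugging a self-hosted vLLM or Modal deployment before pointing the wrapper at it.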

Multi-turn Chat¶

messages = [
    {"role": "user",      "content": "What is the capital of Nigeria?"},
    {"role": "assistant", "content": "The capital of Nigeria is Abuja."},
    {"role": "user",      "content": "What languages are spoken there?"},
]
print(model.chat(messages))

Streaming (hosted only)¶

model = MIST(size="8b", endpoint="featherless", api_key="...")
for chunk in model.generate("Tell me about Lagos.", stream=True):
    print(chunk, end="", flush=True)
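
If you also need the complete text after streaming, one option is to accumulate the chunks as they are printed. The sketch below uses a stand-in generator instead of a real model call, and it assumes each chunk is an incremental piece of text rather than the cumulative output so far.

```python
def collect_stream(chunks, echo=print):
    """Echo each partial string as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        echo(chunk, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)


def fake_stream():
    """Stand-in for model.generate(..., stream=True); yields partial strings."""
    yield from ["Lagos ", "is ", "a ", "coastal ", "city."]


full_text = collect_stream(fake_stream())
```

In real use you would pass the generator returned by model.generate(..., stream=True) in place of fake_stream().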

Reasoning Variant¶

MIST-Mini-8B-Thinking was trained with 4 phases of GRPO reinforcement learning to show its reasoning before answering. The system prompt is set automatically.

model = MIST(size="thinking")
model.load()

# Default system prompt already instructs the model to use <think> tags
response = model.generate("If a train travels 120 miles in 2 hours, what is its speed?")
# Response shows <think>...</think> then the final answer
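
When only the final answer is needed, the reasoning block can be separated from the rest of the response. This is a minimal sketch that assumes the response contains at most one well-formed <think>...</think> pair followed by the answer; split_reasoning is a hypothetical helper, not an olaverse function, and the sample string is made up.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str):
    """Return (reasoning, answer); reasoning is None when no <think> block exists."""
    match = THINK_RE.search(response)
    if match is None:
        return None, response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer


sample = "<think>speed = distance / time = 120 / 2 = 60</think>\nThe speed is 60 mph."
reasoning, answer = split_reasoning(sample)
print(answer)
```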

Hardware Requirements¶

Variant	Precision	VRAM
8B / Thinking	bfloat16	16 GB (RTX 3090/4090)
8B / Thinking	4-bit NF4	6 GB (RTX 3060+)
70B	bfloat16	140 GB (1× H200 or 2× H100)
70B	4-bit NF4	40 GB (1× A100/H100)
140B	bfloat16	280 GB (2× H200)
140B	4-bit NF4	70 GB (1× H200)
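
The bfloat16 figures follow from simple arithmetic: parameters times bytes per parameter. The sketch below estimates memory for the weights alone; it ignores activations, the KV cache, and quantization metadata, so treat its results as lower bounds rather than a sizing guide.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param


for name, params in [("8B", 8), ("70B", 70), ("140B", 140)]:
    bf16 = weight_memory_gb(params, 16)
    nf4 = weight_memory_gb(params, 4)
    print(f"{name}: bfloat16 ~{bf16:.0f} GB, 4-bit ~{nf4:.0f} GB (weights only)")
```

The bfloat16 rows in the table match this estimate exactly. The 4-bit rows for 8B and 70B sit above the weights-only figures (4 GB and 35 GB), presumably to leave headroom for runtime overhead.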

olaverse.llm.MIST ¶

MIST(size: str = '8b', endpoint: str = 'local', api_key: str = None, quantize: bool = False, system_prompt: str = None, max_retries: int = 3, retry_delay: float = 5.0)

Unified interface for the MIST model family by olaverse.

Handles correct stop tokens, verified sampling defaults, and a local/hosted endpoint switch — all things a bare from_pretrained call gets wrong.

Models (size=):
"8b" / "mini" — MIST-Mini-8B (8B, ~63 tok/s, fast everyday use)
"70b" — MIST-1-70B (70B, ~23 tok/s, structured, detailed)
"140b" — MIST-1-140B (140B, ~8 tok/s, deepest reasoning)
"140b-4bit" — MIST-1-140B-4bit (140B quantized, single H100/H200)
"thinking" — MIST-Mini-8B-Thinking (8B reasoning, shows steps)

Endpoints (endpoint=):
"local" — transformers local inference (pip install olaverse[deeplearning])
"featherless" — Featherless.ai hosted API (pip install olaverse[hosted])
Any URL — OpenAI-compatible endpoint (Modal/vLLM, etc.)

Quick start — local:
>>> model = MIST(size="8b")
>>> model.load()
>>> print(model.generate("Explain DARE+TIES merging in one paragraph."))

Quick start — hosted:
>>> model = MIST(size="70b", endpoint="featherless", api_key="your-key")
>>> print(model.generate("Write a Python retry decorator."))

Multi-turn chat:
>>> messages = [
...     {"role": "user", "content": "What is MIST?"},
...     {"role": "assistant", "content": "MIST is a merged model family..."},
...     {"role": "user", "content": "How large is the 140B version?"},
... ]
>>> print(model.chat(messages))

Parameters:

Name	Type	Description	Default
`size`	`str`	Model variant. One of "8b", "mini", "70b", "140b", "140b-4bit", "thinking". Also accepts a full Hugging Face model ID.	`'8b'`
`endpoint`	`str`	"local", "featherless", or a custom base URL (e.g. your Modal deployment).	`'local'`
`api_key`	`str`	API key for hosted endpoints. Falls back to FEATHERLESS_API_KEY env var.	`None`
`quantize`	`bool`	If True and endpoint="local", loads in 4-bit NF4 (requires bitsandbytes).	`False`
`system_prompt`	`str`	Override the default system prompt for all calls.	`None`
`max_retries`	`int`	Maximum number of attempts on capacity/server errors (hosted only). Set to 1 to disable retries.	`3`
`retry_delay`	`float`	Base delay in seconds between retries. Each attempt waits `retry_delay * attempt` seconds.	`5.0`
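
To illustrate the `retry_delay * attempt` schedule, here is a minimal sketch of a linear-backoff loop. It is not the library's implementation: call_with_retries is a hypothetical helper, RuntimeError stands in for a capacity/server error, and it assumes max_retries counts total attempts (consistent with 1 disabling retries).

```python
import time


def call_with_retries(fn, max_retries=3, retry_delay=5.0, sleep=time.sleep):
    """Call fn up to max_retries times, waiting retry_delay * attempt between tries."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except RuntimeError:  # stand-in for a capacity/server error
            if attempt == max_retries:
                raise
            sleep(retry_delay * attempt)


# Demo: fail twice, then succeed, recording the waits instead of sleeping.
waits = []
outcomes = iter([RuntimeError("busy"), RuntimeError("busy"), "ok"])


def flaky():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_retries(flaky, sleep=waits.append)
print(result, waits)
```

Under these assumptions the default settings wait 5 seconds after the first failure and 10 seconds after the second, then give up if the third attempt also fails.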

Functions¶

load ¶

load()

Load the model. Required before generate()/chat() when endpoint='local'. For hosted endpoints this initialises the API client instead. Safe to call multiple times — no-op after the first load.

generate ¶

generate(prompt: str, system: str = None, max_new_tokens: int = 1024, stream: bool = False, **kwargs: float) -> str

Single-turn generation from a plain string prompt.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	User message.	required
`system`	`str`	Per-call system prompt override.	`None`
`max_new_tokens`	`int`	Maximum tokens to generate.	`1024`
`stream`	`bool`	Return a generator of partial strings instead of a full string. Only supported for hosted endpoints.	`False`
`**kwargs`	`float`	Override any default generation param (temperature, top_p, min_p, repetition_penalty).	`{}`

Returns:

Type	Description
`str`	str, or generator[str] when stream=True.

chat ¶

chat(messages: list, max_new_tokens: int = 1024, stream: bool = False, **kwargs: float) -> str

Multi-turn generation from a messages list.

Parameters:

Name	Type	Description	Default
`messages`	`list`	List of {"role": ..., "content": ...} dicts. A system message is prepended automatically if not present.	required
`max_new_tokens`	`int`	Maximum tokens to generate.	`1024`
`stream`	`bool`	Return a generator of partial strings (hosted endpoints only).	`False`
`**kwargs`	`float`	Override generation parameters.	`{}`

Returns:

Type	Description
`str`	str, or generator[str] when stream=True.
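
The automatic system message described for `messages` can be pictured as follows. This is a sketch of the behaviour, not the library's code: with_system_message and the prompt text are hypothetical.

```python
def with_system_message(messages, system_prompt):
    """Return a new list that starts with a system message if none is present."""
    if any(m.get("role") == "system" for m in messages):
        return list(messages)
    return [{"role": "system", "content": system_prompt}] + list(messages)


history = [{"role": "user", "content": "What is the capital of Nigeria?"}]
prepared = with_system_message(history, "You are a helpful assistant.")  # placeholder prompt
print([m["role"] for m in prepared])
```

The input list is left unmodified, so the same history can be reused across calls.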

LegalPeace — Legal Contract Reasoning¶

Beta Model

LegalPeace is a research/beta model. Always verify outputs with a qualified legal professional. Trained primarily on U.S. legal data.

LegalPeace is a fine-tuned Mistral-7B-v0.3 for contract analysis and legal reasoning, loaded via unsloth for fast 4-bit quantized inference.

Model Card: olaverse/legal-peace-v1.0

Property	Value
Base Model	Mistral-7B-v0.3
Parameters	7B
Quantization	4-bit (via unsloth)
Training	SFT (4,800 cases) + DPO (419 examples)
License	Apache 2.0

Performance vs Base Mistral-7B¶

Benchmark	Improvement
Inference Speed	⚡ 10.3% faster
Contract Analysis	📋 32.6% faster
Case Predictions	⚖️ 14.0% faster

Installation¶

pip install olaverse[legal]
# or: pip install unsloth

Usage¶

from olaverse import LegalPeace

model = LegalPeace()
model.load()  # requires GPU + unsloth

clause = """
Analyze this clause: 'All disputes shall be resolved through binding
arbitration in Delaware.' What are the key implications?
"""
print(model.generate(clause, max_new_tokens=300))

Supported Use Cases¶

Contract clause analysis and risk flagging
Legal research assistance
Evidence evaluation
Case outcome prediction
Legal Q&A

olaverse.llm.LegalPeace ¶

LegalPeace(model_name='olaverse/legal-peace-v1.0', max_seq_length=2048, load_in_4bit=True)

Interface for the LegalPeace model family (Beta).

Base Model: Mistral-7B-v0.3 (via unsloth 4-bit quantization)
Fine-tuned for: Contract Analysis & Legal Reasoning

Warning

This is a beta model. Outputs should always be reviewed by a qualified legal professional. Not recommended for production use.

Functions¶

load ¶

load()

Load the model and tokenizer using unsloth.

generate ¶

generate(prompt: str, max_new_tokens: int = 300, temperature: float = 0.7, **kwargs) -> str

Generate legal analysis or reasoning for a prompt.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	Prompt or clause to analyze.	required
`max_new_tokens`	`int`	Maximum new tokens to generate.	`300`
`temperature`	`float`	Temperature for generation.	`0.7`

Returns:

Type	Description
`str`	Decoded generation response.

LIDNeural5 — Neural Language Identification¶

Better imported from olaverse.nlp

LIDNeural5 is a sequence classifier, not an LLM — its natural home is olaverse.nlp. Both from olaverse.nlp import LIDNeural5 and from olaverse.llm import LIDNeural5 work.

See the NLP & Tokenization page for full documentation and examples.

olaverse.llm.LIDNeural5 ¶

LIDNeural5(model_name='olaverse/lid-neural-5')

Bases: _HFSequenceClassifierLID

High-accuracy transformer-based language identifier for 5 Nigerian languages.

Base Model: castorini/afriberta_large (XLM-RoBERTa, 125M parameters)
Fine-tuned on: Yoruba ('yor'), Hausa ('hau'), Igbo ('ibo'), Pidgin ('pcm'), English ('eng')
Validation accuracy: 98.96% macro-F1

Requires: pip install olaverse[deeplearning]

On the Hub, not yet wrapped by the SDK¶

Two small olaverse models don't have a dedicated SDK class yet — use them directly via transformers in the meantime:

mist-tg-0.3b — generates short chat titles from a user's first message. ByT5-based (~300M), English-trained, works reasonably on other Latin-script languages.
mist-qg-1.5b — multilingual question generation from a passage, across 25 languages including several African languages. Qwen2.5-1.5B-based, structured JSON output.

Both follow the standard AutoTokenizer / AutoModelForCausalLM (or T5ForConditionalGeneration for mist-tg-0.3b) loading pattern — see each model card for exact usage.
