
Large Language Models

The olaverse.llm module provides clean interfaces for running computationally intensive transformer-based models, including large language models and deep neural language detectors.


Warning: Beta Model

LegalPeace is a research/beta model and is not recommended for production use. Always verify all outputs with a qualified legal professional. The model was trained primarily on U.S. legal data.

LegalPeace is a fine-tuned Mistral-7B-v0.3 model optimized for contract analysis and legal reasoning. It is loaded via unsloth for fast, memory-efficient 4-bit quantized inference on GPU.

Model Card: olaverse/legal-peace-v1.0

Model Details

Property Value
Base Model Mistral-7B-v0.3
Parameters 7B
Quantization 4-bit (via unsloth)
Training SFT (4,800 cases) + DPO (419 examples)
License Apache 2.0
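
As a rough illustration of why 4-bit quantization matters for a model of this size, the arithmetic below estimates weight memory only. It assumes exactly 7e9 parameters (the nominal "7B" from the table) and ignores activations, the KV cache, and quantization metadata, so real GPU usage will be higher. The helper name `weight_memory_gb` is hypothetical and not part of olaverse or unsloth.

```python
# Back-of-the-envelope estimate of weight memory only.
# Assumption: exactly 7e9 parameters (the nominal "7B" figure).
# Activations, KV cache, and quantization metadata are NOT included.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 7e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # half-precision baseline
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")  # ~14.0 GB
print(f"4-bit weights:  ~{int4_gb:.1f} GB")  # ~3.5 GB
```

Under these assumptions, the quantized weights take roughly a quarter of the half-precision footprint.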

Performance vs Base Mistral-7B

Benchmark Improvement
Inference Speed ⚡ 10.3% faster
Contract Analysis 📋 32.6% faster
Case Predictions ⚖️ 14.0% faster

Installation

pip install unsloth
# or install the olaverse[legal] extras (quoted so shells such as zsh do not treat the brackets as a glob):
pip install "olaverse[legal]"

Usage via Olaverse

from olaverse.llm import LegalPeace

model = LegalPeace()  # defaults to "olaverse/legal-peace-v1.0"
model.load()          # downloads & loads with unsloth (requires GPU + unsloth)

prompt = "Analyze this clause: 'All disputes shall be resolved through binding arbitration in Delaware.' What are the key implications?"
response = model.generate(prompt, max_new_tokens=300, temperature=0.7)
print(response)
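
When several clauses need reviewing, the single call above can be wrapped in a small loop. The sketch below is one possible way to do that, assuming only the `generate(prompt, max_new_tokens, temperature)` signature documented on this page. Both `analyze_clauses` and the `EchoModel` stand-in are hypothetical names used so the example runs without a GPU; in practice a loaded `LegalPeace` instance would be passed instead.

```python
from typing import Dict, Iterable


class EchoModel:
    """Hypothetical stand-in with the same generate() signature (no GPU needed)."""

    def generate(self, prompt: str, max_new_tokens: int = 300,
                 temperature: float = 0.7) -> str:
        return f"[stub analysis of a {len(prompt)}-character prompt]"


def analyze_clauses(model, clauses: Iterable[str], max_new_tokens: int = 300,
                    temperature: float = 0.7) -> Dict[str, str]:
    """Run one generate() call per clause and map each clause to its response."""
    results = {}
    for clause in clauses:
        prompt = f"Analyze this clause: '{clause}' What are the key implications?"
        results[clause] = model.generate(
            prompt, max_new_tokens=max_new_tokens, temperature=temperature
        )
    return results


clauses = [
    "All disputes shall be resolved through binding arbitration in Delaware.",
    "Either party may terminate this agreement with 30 days written notice.",
]
results = analyze_clauses(EchoModel(), clauses)
for clause, analysis in results.items():
    print(clause, "->", analysis)
```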

Usage via Hugging Face (direct)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="olaverse/legal-peace-v1.0",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

prompt = "Analyze this clause: 'All disputes shall be resolved through binding arbitration in Delaware.'"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
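# do_sample=True is required for temperature to take effect; greedy decoding ignores it
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)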
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Supported Use Cases

  • ✅ Contract clause analysis and review
  • ✅ Legal research assistance
  • ✅ Evidence evaluation
  • ✅ Case outcome prediction
  • ✅ Legal Q&A

olaverse.llm.LegalPeace

Interface for the LegalPeace model family (Beta). Base Model: Mistral-7B-v0.3 (via unsloth 4-bit quantization). Fine-tuned for Contract Analysis & Legal Reasoning.

Warning

This is a beta model. Outputs should always be reviewed by a qualified legal professional. Not recommended for production use.

Functions

load()

Load the model and tokenizer using unsloth.

generate(prompt, max_new_tokens=300, temperature=0.7, **kwargs)

Generate legal analysis or reasoning for a prompt.

Parameters:

Name Type Description Default
prompt str Prompt or clause to analyze. required
max_new_tokens int Maximum new tokens to generate. 300
temperature float Temperature for generation. 0.7

Returns:

Type Description
str Decoded generation response.


LIDNeural5: Neural Language Identification

LIDNeural5 is a high-accuracy transformer sequence classifier for identifying 5 major Nigerian languages. It is fine-tuned on top of castorini/afriberta_large, a multilingual XLM-RoBERTa transformer pre-trained specifically on African languages.

Model Card: olaverse/lid-neural-5

Model Details

Property Value
Base Model castorini/afriberta_large (XLM-RoBERTa)
Parameters 125M
Architecture Transformer Sequence Classification
Model Size 484 MB
License Apache 2.0

Accuracy

Language Precision Recall F1-Score
Yoruba (yor) 99.60% 99.60% 99.60%
Hausa (hau) 99.60% 99.20% 99.40%
Igbo (ibo) 98.79% 98.20% 98.50%
Nigerian Pidgin (pcm) 99.20% 98.80% 99.00%
English (eng) 97.63% 99.00% 98.31%
Overall (Macro) 98.96%

Average latency: ~13.30 ms/sentence (CPU or GPU)
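
The Overall (Macro) row can be checked against the per-language rows. The short calculation below takes the unweighted mean of each column as reported in the table. This is a plausible reading of "macro", not a statement of how the authors computed the figure. All three column means round to 98.96%.

```python
# Per-language scores copied from the accuracy table (percentages).
scores = {
    "yor": {"precision": 99.60, "recall": 99.60, "f1": 99.60},
    "hau": {"precision": 99.60, "recall": 99.20, "f1": 99.40},
    "ibo": {"precision": 98.79, "recall": 98.20, "f1": 98.50},
    "pcm": {"precision": 99.20, "recall": 98.80, "f1": 99.00},
    "eng": {"precision": 97.63, "recall": 99.00, "f1": 98.31},
}


def macro_average(metric: str) -> float:
    """Unweighted mean of one metric across all languages."""
    return sum(row[metric] for row in scores.values()) / len(scores)


macro = {m: macro_average(m) for m in ("precision", "recall", "f1")}
for metric, value in macro.items():
    print(f"Macro {metric}: {value:.2f}%")
```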

Installation

pip install "olaverse[deeplearning]"
# installs: torch, transformers

Usage via Olaverse

from olaverse import LIDNeural5

detector = LIDNeural5()
detector.load()  # downloads olaverse/lid-neural-5 from Hugging Face

# Predict dominant language
lang = detector.predict("Kedu ka ị mere today?")
print(lang)  # → 'ibo'

# Get probability distribution over all 5 classes
probs = detector.predict_proba("How far, wetin dey happen?")
print(probs)
# → {'eng': 0.002, 'hau': 0.001, 'ibo': 0.003, 'pcm': 0.991, 'yor': 0.003}
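
For short or code-switched inputs, the probability distribution may be more useful than the single top label. The sketch below shows one way to reject low-confidence predictions, assuming `predict_proba` returns a plain dict of label to probability as in the printed example above. The helper name `top_language` and the 0.8 threshold are illustrative assumptions, not part of the olaverse API.

```python
from typing import Dict, Optional


def top_language(probs: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the most probable label, or None if it falls below the threshold."""
    label, score = max(probs.items(), key=lambda item: item[1])
    return label if score >= threshold else None


# Distribution taken from the example output above.
confident = {"eng": 0.002, "hau": 0.001, "ibo": 0.003, "pcm": 0.991, "yor": 0.003}
print(top_language(confident))  # pcm

# Made-up distribution where no class clearly dominates.
ambiguous = {"eng": 0.48, "hau": 0.01, "ibo": 0.01, "pcm": 0.47, "yor": 0.03}
print(top_language(ambiguous))  # None
```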

Usage via Hugging Face (direct)

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("olaverse/lid-neural-5")
model = AutoModelForSequenceClassification.from_pretrained("olaverse/lid-neural-5")

lid = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = lid("Bawo ni, se daadaa ni?")
print(result)  # → [{'label': 'yor', 'score': 0.9987}]

olaverse.llm.LIDNeural5

Interface for the LIDNeural5 transformer-based language detection model. Base Model: castorini/afriberta_large (XLM-RoBERTa, 125M parameters). Fine-tuned on 5 languages: Yoruba ('yor'), Hausa ('hau'), Igbo ('ibo'), Pidgin ('pcm'), and English ('eng'). Test accuracy: 98.96% (macro-averaged).

Functions

load()

Load the model and tokenizer using transformers.

predict_proba(text)

Predict the language probabilities for the text.

predict(text)

Predict the language of the text. Returns: 'yor', 'hau', 'ibo', 'pcm', or 'eng'.
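
Because `predict` returns one of five fixed codes, downstream code can map them to readable names and route texts accordingly. The following is a minimal sketch that assumes only a callable taking a string and returning one of those codes. `group_by_language` and `fake_predict` are hypothetical names, and the fake predictor exists solely so the example runs without downloading the model. In practice `detector.predict` would be passed in its place.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Code-to-name mapping taken from the language list on this page.
LANGUAGE_NAMES = {
    "yor": "Yoruba",
    "hau": "Hausa",
    "ibo": "Igbo",
    "pcm": "Nigerian Pidgin",
    "eng": "English",
}


def group_by_language(predict: Callable[[str], str],
                      texts: Iterable[str]) -> Dict[str, List[str]]:
    """Group texts under the readable name of their predicted language."""
    groups = defaultdict(list)
    for text in texts:
        code = predict(text)
        groups[LANGUAGE_NAMES.get(code, code)].append(text)
    return dict(groups)


def fake_predict(text: str) -> str:
    """Hypothetical stand-in for detector.predict (keyword rule, not a model)."""
    return "pcm" if "wetin" in text else "eng"


groups = group_by_language(
    fake_predict, ["How far, wetin dey happen?", "Good morning."]
)
print(groups)
```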