Ottema

Open, specialized AI models for Brazilian Portuguese and reliable AI systems.

We build open-source models that solve concrete production problems in Brazilian Portuguese. Our work focuses on open-vocabulary information extraction, structured-output recovery for agents, speech recognition, and operational AI for real-world workflows.

Based in Brazil. Open research, reproducible benchmarks, production-oriented models.

Brazilian Portuguese Extraction

Open-vocabulary NER and evidence extraction for real PT-BR text — including noisy operational domains where standard models fail.

| Model | What it's for | Result |
|---|---|---|
| ottema/gliner2-ptbr-harem (v0.12b) | NER on journalistic/formal PT-BR. Best entity F1 among compared models on HAREM. | entity F1 = 0.4749 (macro) / 0.4501 (micro), 4x faster than BERT-CRF |
| ottema/gliner2-ptbr (v0.4) | Generalist NER for informal PT-BR (chat, customer service, support). | entity F1 = 0.9976 on synthetic benchmark |
| ottema/gliner2-ptbr-ontoevidence (v0.18) | Ontology-guided evidence extraction with hard-negative rejection. First model to break the GLiNER2 "yes-man" failure mode. | F1 = 0.32 on OE test, avg 4.4 pred/text |
| ottema/gliner2-ptbr-ontoevidence-data | Dataset: 2268 samples, 3 splits, multi-label spans + hard negatives. | Apache-2.0 |

💡 Also published as ottema/gliner2-ptbr-v23 (same weights, 90+ downloads).

Browse the full Ottema Open Models collection: huggingface.co/collections/ottema/ottema-open-models-6a3600e6cc0bdc9c01dd68c8
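
The results above report entity F1 in both macro and micro form. As a rough illustration of how the two averages differ, here is a minimal sketch that scores predicted spans against gold spans. It assumes exact-match scoring on `(start, end, label)` tuples; the function name `entity_f1` and the input layout are hypothetical and may not match the evaluation protocol actually used for HAREM.

```python
from collections import defaultdict


def entity_f1(gold, pred):
    """Exact-match entity F1 (assumed protocol, for illustration only).

    gold, pred: lists with one set per document, each set holding
    (start, end, label) tuples.
    """
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for gold_spans, pred_spans in zip(gold, pred):
        for span in pred_spans:
            if span in gold_spans:
                tp[span[2]] += 1  # correct boundaries and label
            else:
                fp[span[2]] += 1  # predicted span with no exact gold match
        for span in gold_spans:
            if span not in pred_spans:
                fn[span[2]] += 1  # gold span the model missed

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    labels = set(tp) | set(fp) | set(fn)
    per_label = {lab: f1(tp[lab], fp[lab], fn[lab]) for lab in labels}
    # Macro: unweighted mean of per-label F1, so rare labels count equally.
    macro = sum(per_label.values()) / len(per_label) if per_label else 0.0
    # Micro: pool all counts first, so frequent labels dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return {"macro": macro, "micro": micro, "per_label": per_label}
```

Under this scheme a single boundary error on a rare label pulls the macro score down much more than the micro score, which is one reason the two numbers can diverge.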

Reliable Agents

Small specialized models that recover structured output when the LLM fails — JSON, tool calls, schema-constrained generation.

| Model | What it does |
|---|---|
| ottema/structfix-codet5p-220m | Repairs broken JSON/tool-call output from upstream LLMs against a target schema. 220M params, fast, deterministic. |
| ottema/structfix-bench | Benchmark: 250k examples of schema-guided generation with controlled noise and constraint coverage. |
| ottema/constraint-dsl | Compact DSL for declaring typed constraints over JSON outputs. |
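
The recovery pattern described in this section can be sketched as a parse, validate, repair loop. The example below is a minimal illustration under stated assumptions, not the StructFix implementation: `rule_based_repair` is a hypothetical stand-in for a learned repair model, and the `schema` argument is a toy mapping from field name to Python type, not the constraint-dsl syntax.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence that LLMs often wrap around JSON


def validate(obj, schema):
    """Return a list of problems; an empty list means obj fits the toy schema."""
    if not isinstance(obj, dict):
        return ["top-level value is not an object"]
    problems = []
    for key, expected_type in schema.items():
        if key not in obj:
            problems.append(f"missing field: {key}")
        elif not isinstance(obj[key], expected_type):
            problems.append(f"wrong type for field: {key}")
    return problems


def rule_based_repair(raw):
    """Hypothetical stand-in for a repair model: strips fences and trailing commas."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence on.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    # Naive: would also rewrite a ",}" sequence inside a string value.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return text.strip()


def parse_with_recovery(raw, schema, repair=rule_based_repair):
    """Try the raw output first; fall back to the repair step only on failure."""
    stages = [("direct", lambda: raw), ("repaired", lambda: repair(raw))]
    for stage, get_text in stages:
        try:
            obj = json.loads(get_text())
        except json.JSONDecodeError:
            continue
        if not validate(obj, schema):
            return obj, stage
    raise ValueError("could not recover schema-valid output")
```

Keeping the repair step behind a plain callable means the rule-based fallback here could be swapped for a model call without changing the calling code.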

Speech & Edge AI

Lightweight ASR and small models for resource-constrained deployment.

| Model | What it's for |
|---|---|
| ottema/stt_pt_quartznet15x5_ctc_small | Research baseline / lightweight CPU ASR reference for PT-BR. See Nemotron full-stack for current SOTA. |
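
The model name indicates a CTC output layer. For readers unfamiliar with CTC, the sketch below shows greedy decoding in its simplest form: pick the best symbol per frame, collapse consecutive repeats, then drop blanks. The function name, the input layout, and the position of the blank index are assumptions made for illustration; this is not the model's actual inference code.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id):
    """Greedy CTC decoding over per-frame scores (illustrative sketch).

    frame_scores: list of frames, each a list of scores with one entry
                  per vocabulary symbol plus one for the blank.
    vocab:        list of output symbols, indexed by class id.
    blank_id:     class id of the CTC blank (assumed, passed in by the caller).
    """
    # Best class per frame (argmax without external dependencies).
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    decoded = []
    previous = None
    for class_id in best_ids:
        # Emit only when the class changes and is not the blank.
        if class_id != previous and class_id != blank_id:
            decoded.append(vocab[class_id])
        previous = class_id
    return "".join(decoded)
```

A blank between two identical symbols is what lets CTC emit a doubled letter: the frame sequence `l, blank, l` decodes to `ll`, while `l, l` decodes to a single `l`.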

How we work

Try it

Credits

Our models build on:

License

All models and datasets are released under Apache-2.0 unless otherwise noted.

Contact

Organization: huggingface.co/ottema