Open, specialized AI models for Brazilian Portuguese and reliable AI systems.
We build open-source models that solve concrete production problems in Brazilian Portuguese. Our work focuses on open-vocabulary information extraction, structured-output recovery for agents, speech recognition, and operational AI for real-world workflows.
Based in Brazil. Open research, reproducible benchmarks, production-oriented models.
Open-vocabulary NER and evidence extraction for real PT-BR text — including noisy operational domains where standard models fail.
| Model | What it's for | Result |
|---|---|---|
| ottema/gliner2-ptbr-harem (v0.12b) | NER on journalistic/formal PT-BR. Best entity F1 among compared models on HAREM. | entity F1 = 0.4749 (macro) / 0.4501 (micro), 4x faster than BERT-CRF |
| ottema/gliner2-ptbr (v0.4) | Generalist NER for informal PT-BR (chat, customer service, support). | entity F1 = 0.9976 on synthetic benchmark |
| ottema/gliner2-ptbr-ontoevidence (v0.18) | Ontology-guided evidence extraction with hard-negative rejection. First model to break the GLiNER2 "yes-man" failure mode. | F1 = 0.32 on OE test, avg. 4.4 predictions per text |
| ottema/gliner2-ptbr-ontoevidence-data | Dataset: 2268 samples, 3 splits, multi-label spans + hard negatives. | Apache-2.0 |

💡 Also published as ottema/gliner2-ptbr-v23 (same weights, 90+ downloads).
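
The HAREM row above reports both macro and micro entity F1. As a rough illustration of how the two averages differ, here is a minimal sketch. It assumes exact-match scoring on `(start, end, label)` spans, which may not be the protocol used for the published numbers; the function name `entity_f1` and the toy spans are hypothetical.

```python
from collections import defaultdict


def entity_f1(gold, pred):
    """Return (micro_f1, macro_f1) for entity spans.

    gold / pred: one set of (start, end, label) tuples per document.
    Assumption: a prediction counts only if span and label match exactly.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold_spans, pred_spans in zip(gold, pred):
        for span in pred_spans:
            if span in gold_spans:
                tp[span[2]] += 1
            else:
                fp[span[2]] += 1
        for span in gold_spans:
            if span not in pred_spans:
                fn[span[2]] += 1

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    labels = sorted(set(tp) | set(fp) | set(fn))
    # Micro: pool counts over all labels, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels) if labels else 0.0
    return micro, macro


# Toy data: one LOC entity is mislabeled as ORG.
gold = [{(0, 5, "PER"), (10, 16, "LOC")}, {(3, 9, "ORG")}]
pred = [{(0, 5, "PER"), (10, 16, "ORG")}, {(3, 9, "ORG")}]
micro, macro = entity_f1(gold, pred)
print(round(micro, 3), round(macro, 3))  # 0.667 0.556
```

Micro F1 weights every entity equally, so frequent labels dominate; macro F1 weights every label equally, so a single rare label that is always missed pulls the score down sharply.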
Browse the full Ottema Open Models collection: huggingface.co/collections/ottema/ottema/ottema-open-models-6a3600e6cc0bdc9c01dd68c8
Small specialized models that recover structured output when the LLM fails — JSON, tool calls, schema-constrained generation.
| Model | What it does |
|---|---|
| ottema/structfix-codet5p-220m | Repairs broken JSON/tool-call output from upstream LLMs against a target schema. 220M params, fast, deterministic. |
| ottema/structfix-bench | Benchmark: 250k examples of schema-guided generation with controlled noise and constraint coverage. |
| ottema/constraint-dsl | Compact DSL for declaring typed constraints over JSON outputs. |
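
To make the problem concrete, the sketch below shows the kind of failure these models target, using a hand-written rule-based baseline. It is not the structfix model and does not reflect its input format or the constraint DSL; the function `repair_json` and the `required` key list are hypothetical stand-ins for a real schema check.

```python
import json
import re


def repair_json(raw, required):
    """Rule-based baseline (hypothetical): extract and parse a JSON object.

    Handles two common LLM failures: prose around the object and trailing
    commas. Raises ValueError if the result is still invalid or lacks a
    required top-level key.
    """
    # Keep only the region from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    text = raw[start:end + 1]
    # Drop trailing commas before "}" or "]". Caveat: this is not
    # string-aware, so it would also alter a string value containing ",}".
    text = re.sub(r",\s*([}\]])", r"\1", text)
    obj = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    missing = [key for key in required if key not in obj]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return obj


broken = 'Sure! Here is the call:\n{"tool": "search", "args": {"query": "clima em Recife",},}'
print(repair_json(broken, required=["tool", "args"]))
```

Hand-written rules like these cover only the failure patterns someone anticipated, which is the gap a learned, schema-conditioned repair model is meant to close.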
Lightweight ASR and small models for resource-constrained deployment.
| Model | What it's for |
|---|---|
| ottema/stt_pt_quartznet15x5_ctc_small | Research baseline / lightweight CPU ASR reference for PT-BR. See Nemotron full-stack for current SOTA. |
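
The model name indicates a CTC-trained acoustic model. For readers unfamiliar with CTC, the following is a minimal sketch of greedy CTC decoding: take the best token per frame, collapse consecutive repeats, then drop blanks. The toy vocabulary, the frame sequences, and the choice of the last index as the blank token are assumptions for illustration, not details taken from the model.

```python
def ctc_greedy_decode(frame_ids, vocab, blank_id):
    """Collapse per-frame argmax token ids into text.

    frame_ids: best token id for each audio frame.
    vocab: list mapping token id -> character (blank excluded).
    blank_id: id of the CTC blank symbol (assumed here: len(vocab)).
    """
    chars = []
    prev = None
    for token_id in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != prev and token_id != blank_id:
            chars.append(vocab[token_id])
        prev = token_id
    return "".join(chars)


vocab = ["o", "s", "l", "á"]
blank = len(vocab)

# A blank between the two "s" frames keeps the double letter.
print(ctc_greedy_decode([0, 0, 1, blank, 1, 1, 0], vocab, blank))  # osso
# Without the blank, the repeated "s" frames collapse into one.
print(ctc_greedy_decode([0, 1, 1, 0], vocab, blank))  # oso
```

Greedy decoding needs no language model and runs in a single pass over the frames, which suits CPU-only deployment; beam search with a language model typically trades speed for accuracy.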

- ottema/gliner2-ptbr-demo: interactive Gradio demo with model selection, label presets, and 7 example sentences spanning journalistic, informal, and operational Portuguese.
- ottema/structfix-demo: repairs broken JSON / tool-call output against typed schemas. 5 schema presets, 10 broken-output presets, 13 paired examples.

Our models build on:
All models and datasets are released under Apache-2.0 unless otherwise noted.
Organization: huggingface.co/ottema