Fastino Labs, Creator of GLiNER, Releases Two State-of-the-Art Language Models 1,000x Smaller Than Frontier Models
The two open-source language models are 1,000x smaller than large models from companies like OpenAI and Anthropic, with higher accuracy.
PALO ALTO, Calif., May 14, 2026 /PRNewswire/ — Today, Fastino Labs released two new open-source small language models, GLiGuard and GLiNER2-PII, both built primarily with an autonomous agent, Pioneer. The models contain 300 million parameters, run inference in under 100 milliseconds, and exceed the accuracy of decoder models from OpenAI, NVIDIA, Meta, and Google that are up to 90 times larger. GLiGuard runs up to 20 times faster than current state-of-the-art guardrail models, while GLiNER2-PII achieves the highest span-level F1 of any publicly available PII model across seven languages and 42 entity types.
GLiNER2-PII achieves the highest accuracy of any publicly available PII model on the SPY benchmark
The releases come as enterprise AI deployments increasingly require dedicated infrastructure for safety moderation and privacy filtering. As agents gain the ability to browse the web, execute code, and act on a user’s behalf, the cost of unsafe LLM inputs and outputs or PII leakage has grown substantially.
GLiGuard is a 300 million parameter encoder model that performs four safety moderation tasks in a single forward pass: safety classification, jailbreak detection, harm category detection, and refusal detection. Across nine established safety benchmarks, GLiGuard's accuracy matches or exceeds that of decoder-based models 23 to 90 times its size, including Meta's Llama Guard 4 (12B), Google's ShieldGemma (27B), and NVIDIA's NemoGuard (8B), while running up to 20 times faster.
“Current state-of-the-art guardrail models are doing safety moderation with 7 to 27 billion parameter decoder models. They’re using text generation to solve what is fundamentally a classification problem, which is slow, expensive, and impractical at production scale,” said Ash Lewis, CEO and Co-Founder of Fastino Labs.
GLiNER2-PII: Best-in-Class PII Detection
GLiNER2-PII is a 300 million parameter multilingual model for detecting and redacting personally identifiable information across 42 entity types and seven languages. On the SPY benchmark, GLiNER2-PII achieved the highest span-level F1 of any publicly available PII model, outperforming OpenAI’s recently released Privacy Filter, NVIDIA’s GLiNER PII, and two other leading detectors.
Unlike OpenAI’s Privacy Filter, which repurposes a 1.5 billion parameter decoder checkpoint and locks developers into a fixed schema of 8 entity types, GLiNER2-PII is label-conditioned, meaning the target schema is an input to the model rather than a property baked into its weights. This lets the same checkpoint serve any organization’s PII policy without retraining: broad masking for analytics pipelines or fine-grained redaction for compliance audits, all from one model.
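The label-conditioned pattern described above can be illustrated with a minimal sketch. This is not the GLiNER2-PII API; it is a toy stand-in (regexes in place of a learned model) that shows only the interface idea the release describes: the entity schema is passed in at call time, so one artifact serves any redaction policy.

```python
import re

# Toy stand-ins for learned entity recognition; a real model generalizes
# far beyond fixed patterns. Names and patterns here are illustrative only.
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "phone": re.compile(r"\+?\d[\d\- ]{7,}\d"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def detect_pii(text, labels):
    """Return (label, span) pairs for only the labels requested.

    The schema is an argument, not a property of the 'model', so the
    same detector serves a broad analytics policy or a narrow audit
    policy without any retraining."""
    spans = []
    for label in labels:
        pattern = DETECTORS.get(label)
        if pattern is None:
            continue  # unknown label in this toy sketch
        for match in pattern.finditer(text):
            spans.append((label, match.group()))
    return spans

text = "Contact jane@example.com or +1 415 555 0100."
print(detect_pii(text, ["email", "phone"]))  # broad masking policy
print(detect_pii(text, ["email"]))           # narrow compliance policy
```

The same call site switches policies by changing the label list alone, which is the property that lets one checkpoint serve many organizations.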
“Developers building agents today need models that are faster and more deterministic than what frontier decoder models can offer,” said George Hurn-Maloney, COO and Co-Founder of Fastino Labs. “When a guardrail or PII model gets called on every input and every output, latency compounds quickly and probabilistic behavior becomes a real liability. GLiGuard and GLiNER2-PII give developers sub-100ms inference with deterministic outputs, exactly what production agentic systems need.”
Pioneer: The Autonomous Research Agent Behind Both Models
Pioneer, Fastino Labs’ autonomous research agent, played a central role in pushing both models past the accuracy of much larger alternatives. Pioneer synthesized targeted training data, ran parallel post-training experiments, and iterated on real and synthetic dataset composition without human intervention.
For GLiGuard, Pioneer generated supplemental synthetic data targeting fine-grained distinctions between similar harm categories like toxic speech and violence, which the model initially struggled to separate. For GLiNER2-PII, Pioneer produced 4,910 high-quality annotated examples across seven languages and document formats including chat logs, support tickets, CRM notes, KYC forms, invoices, and medical records.
“What used to take our research team months of manual experimentation now takes hours,” said Lewis. “Pioneer ran dozens of training experiments autonomously, meaning the accuracy you see in GLiGuard and GLiNER2-PII came out of an agentic research process, not a traditional one.”
Pioneer was introduced in a recent Fastino Labs research paper, which demonstrated gains of up to 83.8 percentage points on standard benchmarks across cold-start fine-tuning and production failure repair. Both GLiGuard and GLiNER2-PII are flagship examples of agentic post-training in practice: research-grade models developed in days rather than months, with model quality driven by an autonomous loop rather than manual experimentation.
Why Small Models Matter for Agentic AI
Both models reflect Fastino Labs’ thesis that small, highly accurate language models will power the next wave of production AI deployments. Guardrail and privacy models are called on every user input and every model output, meaning even small latency increases compound quickly as conversations grow.
“Every Fortune 500 deploying agents today is building their own internal guardrail and PII infrastructure,” added Hurn-Maloney. “We’re open-sourcing two of the best models in the world for these tasks because the entire industry benefits when this layer becomes a commodity.”
Availability
Both GLiGuard and GLiNER2-PII are available today on Hugging Face under the Apache 2.0 license and for inference on Pioneer, Fastino Labs’ agentic inference platform. The accompanying research papers are available on arXiv.
About Fastino Labs
Fastino Labs is a research lab based in Palo Alto, California, building small language models and tooling, such as Pioneer, for running inference on and fine-tuning language models. The company is the creator of Pioneer, an agent that improves the accuracy of LLMs in production over time, and of the GLiNER open-source model family, which has been downloaded more than 30 million times and is used in production by Fortune 500 teams including NVIDIA, Meta, and Airbnb. Fastino Labs has raised $25 million through its seed round and is backed by investors including Khosla Ventures, Insight Partners, and Microsoft M12.
About Pioneer
Pioneer is an inference API from Fastino Labs that gives developers access to 30+ leading open-source and frontier models, including Anthropic’s Opus, GPT, Gemma, Nemotron, and DeepSeek. Pioneer continuously improves these models using real production traffic: it watches live requests, identifies where a model is failing, and retrains and promotes new checkpoints when they outperform the current one, with no ML engineer required and no fine-tuning code to write. Customers see an average 30% accuracy lift on agentic tasks such as classification and extraction versus base open-source models, with the first auto-improvement run typically landing in production within days. Learn more at pioneer.ai.