Tinfoil GPT-OSS Safeguard 120B
Safety reasoning model for content classification and trust & safety apps

Tinfoil GPT-OSS Safeguard 120B Description
Tinfoil GPT-OSS Safeguard 120B is a specialized safety reasoning model built on GPT-OSS, with 117 billion total parameters and 5.1 billion active parameters. It classifies text against custom safety policies supplied by users, enabling LLM input/output filtering, content labeling, and Trust & Safety workflows.

The model supports bring-your-own-policy flexibility, so organizations can define their own safety policies for content classification. It exposes its full reasoning chains for debugging and offers configurable reasoning effort levels (low, medium, high) to balance speed against accuracy. It features a 128k-token context window and is trained on the harmony response format.

The model is part of the ROOST Model Community and is released under the Apache 2.0 license. It is designed for multilingual content, with performance across major languages. Primary use cases include content moderation, policy enforcement, LLM guardrails, and Trust & Safety labeling workflows, letting organizations apply custom safety policies to filter and classify content across their applications.
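A bring-your-own-policy classification call can be sketched as a standard chat request: the custom policy goes in the system message, the content to classify goes in the user message, and the reasoning effort is set per request. This is a minimal sketch assuming an OpenAI-compatible chat completions API; the model id (`gpt-oss-safeguard-120b`), the `reasoning_effort` field, and the policy text below are illustrative assumptions — check Tinfoil's documentation for the exact identifiers.

```python
# Hypothetical policy text for illustration — a real deployment would use
# the organization's own safety policy.
SPAM_POLICY = """You are a content classifier. Label the user's message as
VIOLATING if it is unsolicited commercial spam, otherwise NON-VIOLATING.
Answer with exactly one label."""


def build_classification_request(policy: str, content: str,
                                 effort: str = "medium") -> dict:
    """Build an OpenAI-style chat payload that asks the safeguard model to
    classify `content` against a bring-your-own `policy`.

    `effort` maps to the model's configurable reasoning effort levels.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return {
        "model": "gpt-oss-safeguard-120b",  # assumed model id
        "messages": [
            {"role": "system", "content": policy},   # the custom policy
            {"role": "user", "content": content},    # text to classify
        ],
        "reasoning_effort": effort,  # assumed parameter name
    }


payload = build_classification_request(SPAM_POLICY, "Buy now!!! Limited offer!")
# `payload` would then be POSTed to the provider's /v1/chat/completions
# endpoint with an API key; the response contains the label plus the
# model's reasoning chain for debugging.
```

The same payload shape works for input filtering (classify the user's prompt before it reaches a downstream LLM) and output filtering (classify the LLM's response before it reaches the user).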
Tinfoil GPT-OSS Safeguard 120B FAQ
Common questions about Tinfoil GPT-OSS Safeguard 120B including features, pricing, alternatives, and user reviews.
Tinfoil GPT-OSS Safeguard 120B is a safety reasoning model for content classification and Trust & Safety applications, developed by Tinfoil. It is an AI Security solution designed to help security teams with content filtering, policy enforcement, and open-source deployment.