
NVIDIA has been on a remarkable journey with its content safety stack over the past two years. What started as a focused English text classifier has blossomed into a comprehensive family of specialized models, constantly expanding its reach to new modalities, languages, and inference methods.
In March 2026, we witnessed the release of Nemotron 3 Content Safety, a significant leap forward that merged multimodal and multilingual capabilities into a single 4-billion parameter model. Today, we’re thrilled to announce the next evolution: Nemotron 3.5 Content Safety. This new iteration completes the arc, unifying multimodal input, global language support, custom enterprise policy enforcement, and auditable reasoning into one powerful inference call.
What’s Brand New in Nemotron 3.5 Content Safety
Nemotron 3.5 isn’t just an update; it’s a complete reimagining of enterprise AI safety. We’ve introduced several groundbreaking features designed to make your AI deployments more secure, customizable, and transparent than ever before.
- Unified Multimodal Evaluation: While Nemotron 3 introduced image understanding, Nemotron 3.5 deepens this integration significantly. The model now processes a user prompt, an optional image, and an optional assistant response as a single, coherent context window. This holistic approach captures policy violations that only emerge from the interplay between text and image, or request and response, in a single pass.
- Global Language Coverage: Maintaining its explicit training coverage across 12 languages (English, French, Spanish, German, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Portuguese, and Italian), Nemotron 3.5 also leverages the impressive zero-shot generalization of the Gemma 3 base model, extending its understanding to approximately 140 languages. This means even deployments in regions with sparse training data can benefit from robust multilingual transfer.
- Custom Policy Enforcement: This is arguably the most significant architectural advancement in Nemotron 3.5. We understand that a healthcare platform has vastly different risk profiles than a financial services chatbot or a children’s education app. The model now accepts a custom policy specification alongside the input, reasoning over it to produce a verdict tailored to your unique requirements, rather than relying solely on a built-in taxonomy.
- Reasoning Traces (THINK Mode): Transparency is key in enterprise AI. Every safety verdict in Nemotron 3.5 can now be accompanied by an auditable reasoning trace through an optional “THINK mode.” When enabled, the model will output its step-by-step logic before delivering a final “safe” or “unsafe” label, along with any violated categories. This is invaluable for understanding and auditing decisions.
- Release of Safety Dataset: For the first time, we are also releasing our comprehensive safety dataset with Nemotron 3.5. This multimodal, multilingual dataset includes the safety reasoning traces used to train the model, generated through a sophisticated two-step process to ensure conciseness and accuracy. This addresses a major gap in the OSS safety model community, especially for multimodal artifacts.
Under the Hood: Architecture and Reasoning
At its core, Nemotron 3.5 Content Safety is built upon the powerful Google Gemma 3 4B IT (4-billion parameters), which offers a 128K context window, strong vision-language reasoning, and expansive multilingual coverage. NVIDIA fine-tunes this foundation with a LoRA adapter, instilling targeted safety classification behavior while keeping the model compact enough for real-time deployment on GPUs with 8GB+ VRAM.
The inference interface provides flexible output modes: a simple binary verdict, a binary verdict with specified categories, or the detailed THINK mode with step-by-step reasoning. This flexibility allows developers to balance latency and transparency according to their specific needs. Our safety taxonomy aligns with the Aegis 2.0 framework, incorporating 13 core categories and 10 fine-grained subcategories, ensuring consistency with industry benchmarks.
Reasoning acts as a supercharger for content safety, providing crucial context, customization, and accountability for production AI systems, particularly in highly regulated environments. It enables dynamic interpretation of custom, natural-language policies, which is essential given that no single universal safety taxonomy fits all production deployments.
Unpacking the Performance and Benchmarking
Nemotron 3.5 Content Safety has been rigorously evaluated across a wide array of multilingual, multimodal, and custom-policy safety benchmarks, including VLGuard, MM-SafetyBench, PolyGuard, and Aegis. These evaluations demonstrate its ability to apply consistent guardrails across global languages, text and image inputs, and domain-specific policies without introducing significant latency.
Nemotron 3 set a high bar with an 84% average accuracy on multimodal harmful-content tests and approximately half the latency of LlamaGuard-4-12B. Nemotron 3.5 not only maintains this compact 4B efficiency but also adds robust custom policy support and valuable reasoning traces.
Across multilingual and multimodal benchmarks, Nemotron 3.5 delivers strong harmful-content classification accuracy, averaging about 85%. This is critical because many existing safety models are often English-first, text-only, or too resource-intensive for widespread production use. Nemotron 3.5 is engineered to combine comprehensive multilingual coverage, multimodal classification, custom-policy support, and low-latency deployment into a single, efficient model.
On Multilingual Aegis, Nemotron 3.5 achieves an impressive average of 96.5% harmful-content classification accuracy across 12 languages. For RTP-LX, it averages 88.8%, resulting in a combined Aegis and RTP-LX average of 92.7%. This consistency empowers teams to apply a uniform safety posture across all their customer, employee, and partner-facing workflows, eliminating the need for English-only moderation or disparate regional safety models.
Efficiency and Accessibility
Accuracy is paramount, but for production-grade guardrails, efficiency is equally vital. Nemotron 3.5 Content Safety’s compact 4B design significantly reduces the cost and latency associated with repeated safety checks, making robust multilingual and multimodal guardrails a practical reality for real-world AI applications.
The latency profile in default mode remains unchanged from Nemotron 3. While THINK mode does add inference time proportional to the trace length, this overhead is predictable and can be managed by running THINK-mode evaluations asynchronously for auditing purposes, while the default mode handles real-time decisions. Our model also generates up to 50% fewer tokens compared to other reasoning safety models, leading to greater cost and latency efficiency.
We acknowledge the ongoing gaps in multimodal safety evaluation infrastructure, a challenge Nemotron 3.5’s development also faced. Our multimodal training data, featuring real images and culturally nuanced multilingual prompts, aims to bridge some of these gaps for model training, contributing to the broader safety research community.
Ready to enhance your enterprise AI safety? Nemotron 3.5 Content Safety, along with its comprehensive training dataset, is now available on Hugging Face under the NVIDIA Open Model License for both research and commercial use. It seamlessly supports transformers, vLLM, and SGLang, making integration straightforward.
Source: Hugging Face Blog