This guide explains how to integrate TrojAI DEFEND with TrueFoundry to add real-time AI firewall guardrails to your LLM applications.Documentation Index
Fetch the complete documentation index at: https://www.truefoundry.com/llms.txt
Use this file to discover all available pages before exploring further.
What is TrojAI DEFEND?
TrojAI DEFEND is an AI firewall that validates and enforces security policies on LLM inputs and outputs in real-time. It evaluates payloads against configurable rule chains — blocking, redacting, or flagging content based on your organization’s security requirements.Key Features of TrojAI DEFEND
- Rule-Chain Firewall: TrojAI DEFEND evaluates requests through a configurable chain of rules including PII detection, prompt injection prevention, blocklist matching, pattern matching, and content moderation. Each rule can independently block, redact, flag, or pass content, and the chain determines the final action.
- Flexible Operation Modes: Support for both validation and mutation operations. Validate mode can overlap with the model on LLM input hooks where the gateway supports it; LLM output and MCP validation remain synchronous in the request path. Mutate guardrails run sequentially and can redact sensitive content (such as PII or credit card numbers) before content is released downstream. See Guardrails Overview — Operation Mode.
- Streaming and Multimodal Support: Native support for streaming responses via Server-Sent Events with a sliding window approach for real-time evaluation. The firewall also processes multimodal content — extracting text features from base64-encoded payloads for rule evaluation.
Adding TrojAI DEFEND Integration
To add TrojAI DEFEND to your TrueFoundry setup, follow these steps: Fill in the Guardrails Group Form- Name: Enter a name for your guardrails group.
- Collaborators: Add collaborators who will have access to this group.
- TrojAI Config:
- Name: Enter a name for the TrojAI DEFEND configuration.
- Description (Optional): A description for the guardrail (e.g., “TrojAI DEFEND firewall for real-time AI security”).
- Operation: The operation type for this guardrail.
- Validate: Guardrails that inspect and can block without mutating content. On LLM input validation, the gateway may run these alongside the in-flight model request when applicable; on LLM output and MCP hooks, validation runs synchronously before the response or tool result is released. See Guardrails Overview — Operation Mode.
- Mutate: Guardrails with this operation can both validate and mutate requests (e.g., redact PII). Mutate guardrails are run sequentially.
- Priority (Optional): Execution priority for mutate guardrails (lower number = runs first).
- Enforcing Strategy: Strategy for enforcing this guardrail:
- Enforce: Guardrail is applied. If a violation is detected or an error occurs, the request is blocked.
- Enforce But Ignore On Error: Guardrail is applied, but if an error occurs during execution, the guardrail is ignored and the request proceeds.
- Audit: Request is never blocked. Violations are logged for review only.
- TrojAI Client ID Auth:
- Client Id: The
x-eag-clientidvalue used to authenticate and identify your firewall policy. This determines which rule chain is applied to requests. Obtain this from your TrojAI DEFEND configuration.
- Client Id: The
- Base URL: The URL of your TrojAI DEFEND firewall instance (e.g.,
https://trojaifirewall.your-domain.com).

How TrojAI DEFEND Evaluates Requests
TrueFoundry integrates with TrojAI DEFEND using the/v1/validateParsedText endpoint. This endpoint accepts structured LLM payloads (e.g., OpenAI chat completion format), parses them using the firewall policy’s handler configuration, and runs input or output rules without calling a downstream model — TrueFoundry handles model invocation separately.
The rule direction (input vs output) is determined automatically:
- Input guardrails: Rules evaluate the user prompt before it reaches the model.
- Output guardrails: Rules evaluate the model response before it reaches the user.
Response Structure
The TrojAI DEFEND API returns a response with the full rule evaluation results:Example Response: Content Passed
Example Response: Content Passed
All rules passed — the content is safe to proceed.
Example Response: PII Blocked
Example Response: PII Blocked
A credit card number was detected by the
pii_monitor rule and the request was blocked.| Field | Type | Description |
|---|---|---|
Action | string | Overall action: PASS, BLOCK, REDACT, or FLAG |
InputRuleResults | array | Results from each input rule evaluation (populated for input guardrails) |
OutputRuleResults | array | Results from each output rule evaluation (populated for output guardrails) |
ModelInputStrings | array | Strings extracted from the payload when running input rules |
ModelOutputStrings | array | Strings extracted from the payload when running output rules |
Validation Logic
TrueFoundry uses the TrojAI DEFEND response to determine content safety:- If the
ActionisBLOCK, the request is blocked and a 400 error is returned to the caller. - If the
ActionisREDACTand the operation is set to Mutate, the redacted content replaces the original and the request proceeds. - If the
ActionisFLAG, the request proceeds but the flag is logged for audit. - If the
ActionisPASS, the original content is passed through unchanged.