OpenAI Agents SDK Integration
Semantic LLM-based guardrails for the OpenAI Agents SDK implementing THSP validation with prompt injection protection.
Important: This integration uses a dedicated LLM agent to perform semantic analysis. It is NOT regex-based pattern matching.
Installation
pip install sentinelseed openai-agents
export OPENAI_API_KEY="your-key"
Security Features
| Feature | Description |
|---|
| Prompt Injection Protection | Input sanitization prevents manipulation |
| XML Escaping | Prevents tag injection |
| Boundary Tokens | Content-hash based boundaries |
| Injection Detection | Pattern matching for attempts |
| PII Redaction | Automatic in logs |
| Input Size Limits | Configurable max size |
Quick Start
Option 1: Create Protected Agent
from sentinelseed.integrations.openai_agents import create_sentinel_agent
from agents import Runner
agent = create_sentinel_agent(
name="Safe Assistant",
instructions="You help users with their questions",
model="gpt-4o",
)
result = await Runner.run(agent, "What is the capital of France?")
Option 2: Add Guardrails to Existing Agent
from agents import Agent
from sentinelseed.integrations.openai_agents import create_sentinel_guardrails
input_guard, output_guard = create_sentinel_guardrails()
agent = Agent(
name="My Agent",
instructions="You are helpful",
input_guardrails=[input_guard],
output_guardrails=[output_guard],
)
Option 3: Seed Injection Only
from agents import Agent
from sentinelseed.integrations.openai_agents import inject_sentinel_instructions
agent = Agent(
name="My Agent",
instructions=inject_sentinel_instructions("You help users"),
)
Configuration
SentinelGuardrailConfig
SentinelGuardrailConfig(
guardrail_model="gpt-4o-mini", # Model for validation
seed_level="standard",
block_on_violation=True,
log_violations=True,
require_all_gates=True,
max_input_size=32000,
fail_open=False,
)
THSPValidationOutput
THSPValidationOutput(
is_safe=bool,
truth_passes=bool,
harm_passes=bool,
scope_passes=bool,
purpose_passes=bool,
violated_gate=str | None,
reasoning=str,
risk_level=str, # low, medium, high, critical
injection_attempt_detected=bool,
)
Performance
| Configuration | API Calls | Latency |
|---|
| Full protection | 3 | ~1500ms |
| Input only | 2 | ~1000ms |
| Seed only | 1 | ~500ms |
Cost Estimation
| Model | Per Validation |
|---|
| gpt-4o-mini | ~$0.0001 |
| gpt-4o | ~$0.002 |
Links