# Agent Protection Guide

A complete guide to protecting AI agents with Sentinel.
## The Challenge
AI agents make autonomous decisions that affect the real world:

- Execute code
- Make API calls
- Transfer funds
- Control robots

Without protection, they're vulnerable to:

- Prompt injection
- Goal hijacking
- Memory poisoning
- Data exfiltration
## Solution: Multi-Layer Protection
### Layer 1: Input Validation

```python
from sentinelseed.detection import InputValidator

validator = InputValidator()
result = validator.validate(user_input)
if not result.safe:
    reject_input(result.issues)
```
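Whatever validator you use, the surrounding code should fail closed: a validator error must block the input, not let it through. A minimal sketch of that wrapper pattern, assuming only the result shape shown above (`.safe` and `.issues`) — `ValidationResult` and `gate_input` are illustrative names, not part of sentinelseed:

```python
from dataclasses import dataclass, field
from typing import List


# Stub mirroring the result shape shown above (.safe, .issues);
# the real object comes from sentinelseed's InputValidator.
@dataclass
class ValidationResult:
    safe: bool
    issues: List[str] = field(default_factory=list)


def gate_input(validator, user_input: str) -> str:
    """Return the input only if validation passes; raise otherwise (fail closed)."""
    try:
        result = validator.validate(user_input)
    except Exception as exc:
        # A crashing validator is treated as a block, not a pass.
        raise ValueError(f"validation error: {exc}") from exc
    if not result.safe:
        raise ValueError(f"unsafe input: {result.issues}")
    return user_input
```

The `try/except` is the important part: without it, a bug or timeout in the validation layer silently becomes an open gate.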
### Layer 2: Seed Injection

```python
from sentinelseed import Sentinel

sentinel = Sentinel(seed_level="standard")
seed = sentinel.get_seed()

messages = [
    {"role": "system", "content": seed},
    {"role": "user", "content": safe_input},
]
```
### Layer 3: Output Validation

```python
from sentinelseed.detection import OutputValidator

validator = OutputValidator()
result = validator.validate(agent_response)
if not result.safe:
    block_response(result.issues)
```
### Layer 4: Action Validation

```python
# `sentinel` is the instance created in Layer 2.
is_safe, concerns = sentinel.validate_action(
    "Transfer $500 to external wallet"
)
if not is_safe:
    require_human_approval(concerns)
```
## Framework Integration

### LangChain

```python
from langchain_openai import ChatOpenAI

from sentinelseed.integrations.langchain import SentinelCallback, SentinelGuard

callback = SentinelCallback(on_violation="log")
llm = ChatOpenAI(callbacks=[callback])

guard = SentinelGuard(agent, block_unsafe=True)
result = guard.invoke({"input": "Your task"})
```
### CrewAI

```python
from sentinelseed.integrations.crewai import SentinelCrew, safe_agent

crew = SentinelCrew(
    agents=[safe_agent(researcher), safe_agent(writer)],
    tasks=tasks,
    block_unsafe=True,
)
```
## Memory Protection

```python
from sentinelseed.memory import MemoryIntegrityChecker

checker = MemoryIntegrityChecker(
    secret_key="your-secret",
    validate_content=True,
)

signed_entry = checker.sign_entry(memory_entry)
result = checker.verify_entry(signed_entry)
```
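The idea behind sign/verify is that every memory entry carries a keyed signature, so an entry modified by a poisoning attack fails verification before the agent trusts it. How `MemoryIntegrityChecker` computes its signatures internally isn't shown here; a minimal sketch of the underlying technique using the Python standard library's HMAC support (the `sign_entry`/`verify_entry` names are borrowed from the API above, but this implementation is illustrative, not sentinelseed's):

```python
import hashlib
import hmac
import json

SECRET_KEY = b"your-secret"  # in practice, load from a secrets manager


def sign_entry(entry: dict) -> dict:
    """Attach an HMAC-SHA256 signature computed over the entry's canonical JSON."""
    payload = json.dumps(entry, sort_keys=True).encode()
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {"entry": entry, "sig": sig}


def verify_entry(signed: dict) -> bool:
    """Recompute the HMAC and compare in constant time; False means tampering."""
    payload = json.dumps(signed["entry"], sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["sig"])
```

Any change to the stored entry after signing — even a single character — produces a different HMAC, so `verify_entry` returns `False` and the poisoned memory can be quarantined instead of replayed into the agent's context.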
## Best Practices

1. Defense in depth - use all four layers together.
2. Fail closed - block when validation is uncertain or errors out.
3. Log everything - keep an audit trail of violations.
4. Limit authority - grant agents the minimum permissions they need.
5. Human oversight - require approval for high-risk actions.
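These practices come together in a single guarded request path. The sketch below assumes only the interfaces shown earlier in this guide (`validate()` returning an object with `.safe`/`.issues`, `get_seed()`, and `validate_action()` returning `(bool, concerns)`); the function name, the `llm` callable, and the `audit_log` list are hypothetical glue, not sentinelseed APIs:

```python
def guarded_turn(sentinel, input_validator, output_validator, llm, user_input, audit_log):
    """Run one agent turn through all four layers, failing closed at each step."""
    # Layer 1: validate the raw input before it reaches the model.
    checked = input_validator.validate(user_input)
    if not checked.safe:
        audit_log.append(("input_blocked", checked.issues))
        return None

    # Layer 2: prepend the safety seed as the system prompt.
    messages = [
        {"role": "system", "content": sentinel.get_seed()},
        {"role": "user", "content": user_input},
    ]
    response = llm(messages)

    # Layer 3: validate the model's output before acting on it.
    checked = output_validator.validate(response)
    if not checked.safe:
        audit_log.append(("output_blocked", checked.issues))
        return None

    # Layer 4: validate the implied action; anything flagged goes to a human.
    is_safe, concerns = sentinel.validate_action(response)
    if not is_safe:
        audit_log.append(("needs_approval", concerns))
        return None

    audit_log.append(("allowed", []))
    return response
```

Every branch either returns a vetted response or returns `None` with an audit record — there is no path where a failed check still reaches the tool-execution layer.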