Introduction to Sentinel

Safety for AI that Acts: From Chatbots to Robots

Sentinel is an AI safety framework that protects three attack surfaces: LLMs (text safety), agents (action safety), and robots (physical safety). One framework, three attack surfaces.

The Problem

AI systems are increasingly autonomous, making decisions that affect the real world:

LLMs: Chatbots, assistants, customer service
Agents: Autonomous code execution, tool use, trading
Robots: LLM-powered robots, industrial systems, drones

Without proper safety measures, these systems are vulnerable to prompt injection, jailbreaking, data exfiltration, and unintended harmful actions.

The Solution: THSP Protocol

Sentinel implements the THSP protocol, a four-gate validation system:

Gate	Question	Failure Condition
Truth	Does this involve deception?	Creating/spreading false information
Harm	Could this cause damage?	Physical, psychological, financial harm
Scope	Is this within appropriate limits?	Exceeding authority, bypassing consent
Purpose	Does this serve legitimate benefit?	No genuine value to anyone

Every input and output must pass all four gates. The absence of harm is not sufficient; there must be genuine purpose.
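The all-gates-must-pass rule can be sketched in a few lines. This is an illustrative toy, not Sentinel's implementation: the `Gate` class, `run_gates`, and the keyword checks in `toy_gates` are hypothetical names and logic invented for this example. It only shows the control flow, in which content passes when no gate fails and the purpose gate is an affirmative requirement rather than a harm check.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical gate check: returns True when the text passes the gate.
GateCheck = Callable[[str], bool]


@dataclass
class Gate:
    name: str
    check: GateCheck


def run_gates(text: str, gates: List[Gate]) -> Tuple[bool, List[str]]:
    """Return (passed, failed_gate_names); content passes only if every gate passes."""
    failed = [gate.name for gate in gates if not gate.check(text)]
    return (not failed, failed)


def toy_gates() -> List[Gate]:
    """Toy keyword checks for illustration only; real gates need far richer logic."""
    return [
        Gate("truth", lambda t: "fake review" not in t.lower()),
        Gate("harm", lambda t: "disable the safety interlock" not in t.lower()),
        Gate("scope", lambda t: "without asking the user" not in t.lower()),
        # Purpose is affirmative: an empty request states no benefit, so it fails.
        Gate("purpose", lambda t: bool(t.strip())),
    ]
```

Calling `run_gates("Summarize this report for the team", toy_gates())` returns a passing result with no failed gates, while a request that trips several checks reports each failed gate by name.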

Validated Results

Tested on 4 benchmarks across 6 models, with a 97.6% average safety rate:

Benchmark	Attack Surface	Safety Rate
HarmBench	LLM (Text)	96.7%
SafeAgentBench	Agent (Digital)	97.3%
BadRobot	Robot (Physical)	99.3%
JailbreakBench	All surfaces	97%

Key insight: Sentinel shows larger improvements as stakes increase. Text: +10-22%. Agents: +16-26%. Robots: +48%.

Core Components

SentinelValidator v3.0: Unified 4-layer validation (L1 Input, L2 Seed, L3 Output, L4 Observer)
THSP Protocol: Four-gate validation (Truth, Harm, Scope, Purpose)
Alignment Seeds: System prompts that shape LLM behavior
Input/Output Validators: Pattern detection with 20+ detector types
Memory Integrity: HMAC-based protection against memory injection
Database Guard: SQL injection and data exfiltration prevention
Fiduciary AI: Ensures AI acts in the user's best interest
EU AI Act Compliance: Regulation 2024/1689 Article 5 checker
OWASP Agentic AI: 65% coverage (5 full, 3 partial)
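To make the Memory Integrity item concrete, here is a minimal sketch of HMAC-based tamper detection for a stored memory entry, using only Python's standard library. It is not Sentinel's code: `sign_entry`, `verify_entry`, the JSON canonicalization, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def sign_entry(entry: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the entry."""
    # Sorted keys and fixed separators keep the encoding stable across runs.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_entry(entry: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_entry(entry, key), tag)
```

An agent would store the tag next to each memory entry and call `verify_entry` before trusting it. An entry edited by anyone who does not hold the key no longer matches its tag, which is how injected or altered memories are detected.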

Framework Support

Native integrations for 23+ frameworks:

Agent Frameworks: LangChain, LangGraph, CrewAI, DSPy, Letta, AutoGPT
LLM Providers: OpenAI, Anthropic, Google ADK
Blockchain: Solana Agent Kit, Coinbase AgentKit, Virtuals
Robotics: ROS2, NVIDIA Isaac Lab
Security: Garak, PyRIT, OpenGuardrails

Getting Started

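Install the package:

pip install sentinelseed

Then validate content from Python:

from sentinelseed import Sentinel

# Create a Sentinel instance with the "standard" seed level
sentinel = Sentinel(seed_level="standard")
# validate() returns a safety flag and any violations found
is_safe, violations = sentinel.validate("Your content here")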

See the Quick Start guide to get running in minutes.
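Because every input and output must pass validation, one plausible way to use the `(is_safe, violations)` result is to wrap a model call so that both the prompt and the reply are checked. The sketch below assumes only that the validator exposes `validate(content)` returning that pair, with violations as a list of strings. `guarded_call` and `KeywordValidator` are hypothetical helpers written for this example. `KeywordValidator` is a stand-in so the code runs without the package installed, and a `Sentinel` instance would take its place.

```python
from typing import Callable, List, Protocol, Tuple


class Validator(Protocol):
    """Anything with a validate() method shaped like the call shown above."""

    def validate(self, content: str) -> Tuple[bool, List[str]]: ...


class KeywordValidator:
    """Hypothetical stand-in validator that flags one injection phrase."""

    def validate(self, content: str) -> Tuple[bool, List[str]]:
        violations = []
        if "ignore previous instructions" in content.lower():
            violations.append("possible prompt injection")
        return (not violations, violations)


def guarded_call(validator: Validator, prompt: str, generate: Callable[[str], str]) -> str:
    """Validate the prompt, call the model, then validate the reply."""
    is_safe, violations = validator.validate(prompt)
    if not is_safe:
        raise ValueError(f"input blocked: {violations}")
    reply = generate(prompt)
    is_safe, violations = validator.validate(reply)
    if not is_safe:
        raise ValueError(f"output blocked: {violations}")
    return reply
```

Here `generate` is any function that maps a prompt to a reply, such as a thin wrapper around an LLM client. Raising an exception is one design choice; returning a refusal message would work equally well.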
