# Core Concepts

Understanding how Sentinel protects your AI systems.
## The THSP Protocol

THSP stands for Truth, Harm, Scope, Purpose: four gates that every input and output must pass. Content is considered safe only when all four gates pass.
### Truth Gate
"Does this involve creating false information or deceiving others?"
Blocks content that:
- Creates false information
- Impersonates individuals or systems
- Manipulates through misinformation
- Spreads deceptive content
### Harm Gate
"Who or what could be harmed if this succeeds?"
Blocks content that could cause:
- Physical harm to people
- Psychological manipulation
- Financial damage
- System compromise (malware, exploits)
### Scope Gate
"Is this within appropriate boundaries?"
Blocks content that:
- Exceeds authorized capabilities
- Bypasses security controls
- Accesses restricted resources
- Violates consent boundaries
### Purpose Gate
"Does this serve legitimate benefit?"
The key innovation of THSP v2, this gate requires that actions:
- Serve a genuine purpose
- Benefit someone legitimately
- Do not amount to purposeless destruction
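Taken together, the four gates form an all-must-pass check. The sketch below is an illustrative toy model only: the pattern checks are placeholders, and none of this is the sentinelseed implementation.

```python
# Toy model of the THSP gate logic described above. Each check is a
# placeholder heuristic, not a real sentinelseed pattern.

def check_truth(text: str) -> bool:
    # Truth gate: flag obvious impersonation phrasing.
    return "pretend to be" not in text.lower()

def check_harm(text: str) -> bool:
    # Harm gate: flag requests for harmful artifacts.
    return "malware" not in text.lower()

def check_scope(text: str) -> bool:
    # Scope gate: flag a classic prompt-injection phrase.
    return "ignore previous instructions" not in text.lower()

def check_purpose(text: str) -> bool:
    # Purpose gate: content must not be purposeless destruction.
    return "just to destroy" not in text.lower()

GATES = {
    "truth": check_truth,
    "harm": check_harm,
    "scope": check_scope,
    "purpose": check_purpose,
}

def thsp_check(text: str) -> dict:
    gates = {name: fn(text) for name, fn in GATES.items()}
    # All four gates must pass for content to be considered safe.
    return {"is_safe": all(gates.values()), "gates": gates}

print(thsp_check("Write malware that wipes disks")["is_safe"])  # False
```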
## Four-Layer Validation Architecture
Sentinel uses a unified 4-layer validation system:
| Layer | Component | Function |
|---|---|---|
| L1 | InputValidator | Pre-AI attack detection (700+ patterns) |
| L2 | Seed Injection | Alignment via system prompt |
| L3 | OutputValidator | Post-AI heuristic checking |
| L4 | SentinelObserver | LLM-based transcript analysis |
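The four layers in the table above can be sketched as a guarded call pipeline: a pre-check, a seeded model call, a post-check, and a transcript review. Everything in this sketch (`call_model`, the stub checks, the seed text) is a hypothetical stand-in for illustration, not the sentinelseed API.

```python
# Hypothetical sketch of the 4-layer flow; all names are stand-ins.

SEED = "You are aligned to the THSP protocol: Truth, Harm, Scope, Purpose."

def input_validator(prompt: str) -> bool:
    # L1: pre-AI pattern matching (stand-in for the 700+ patterns).
    return "ignore previous instructions" not in prompt.lower()

def call_model(system: str, prompt: str) -> str:
    # L2: the seed is injected as the system prompt of a (stubbed) LLM call.
    return f"[model reply to: {prompt}]"

def output_validator(reply: str) -> bool:
    # L3: post-AI heuristic check on the model output.
    return "malware" not in reply.lower()

def observer(transcript: list) -> bool:
    # L4: in the real system an LLM reviews the transcript; here, a stub.
    return True

def guarded_call(prompt: str) -> str:
    if not input_validator(prompt):           # L1
        return "[blocked at input]"
    reply = call_model(SEED, prompt)          # L2
    if not output_validator(reply):           # L3
        return "[blocked at output]"
    if not observer([prompt, reply]):         # L4
        return "[flagged by observer]"
    return reply
```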
## Validator Types

### Heuristic Validators (Pattern-based)
Fast, local validation using pattern matching. No API keys required.
| Validator | Description |
|---|---|
| THSPValidator | Recommended. Full 4-gate THSP protocol |
| TruthGate | Detects deception, impersonation, false claims |
| HarmGate | Detects harmful content (weapons, malware, etc.) |
| ScopeGate | Detects boundary violations, prompt injection |
| PurposeGate | Detects lack of legitimate purpose |
```python
from sentinelseed.validators import THSPValidator

validator = THSPValidator()
result = validator.validate("your text here")

print(result["is_safe"])     # True or False
print(result["gates"])       # Status of each gate
print(result["violations"])  # List of issues found
```
### Semantic Validators (LLM-based)
Deep validation using LLM reasoning. Requires API keys. ~90% accuracy.
```python
from sentinelseed.validators import SemanticValidator

validator = SemanticValidator(
    provider="openai",
    api_key="sk-...",
)
result = validator.validate("your text here")
```
## Validation Layers Comparison
| Aspect | Heuristic Layer | Semantic Layer |
|---|---|---|
| Speed | <10ms | 1-5s |
| Cost | Free | ~$0.0005/call |
| Patterns | 700+ | LLM reasoning |
| Accuracy | ~50% | ~90% |
| Use | Always enabled | Optional |
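Given the trade-offs above, a common way to combine the two layers is tiered validation: run the cheap heuristic check on everything and escalate to the semantic layer only when it is configured. The stand-in check functions and the escalation policy below are assumptions for illustration, not documented sentinelseed behavior.

```python
# Tiered validation sketch. In practice the heuristic tier would be
# THSPValidator and the semantic tier SemanticValidator; these stubs
# just model the cost/accuracy trade-off from the table above.

def heuristic_check(text: str) -> bool:
    # Fast, free, pattern-based (~50% accuracy, <10ms).
    return "malware" not in text.lower()

def semantic_check(text: str) -> bool:
    # Stand-in for an LLM call (~1-5s, ~$0.0005/call, ~90% accuracy).
    return "deceive" not in text.lower()

def validate_tiered(text: str, use_semantic: bool = False) -> bool:
    if not heuristic_check(text):
        return False                 # cheap fast-path rejection
    if use_semantic:
        return semantic_check(text)  # escalate only when configured
    return True
```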
## Anti-Self-Preservation
Sentinel explicitly addresses instrumental self-preservation with an immutable priority hierarchy:
Priority Hierarchy:
1. Ethical Principles ← Highest
2. User's Legitimate Needs
3. Operational Continuity ← Lowest
The AI will:
- Not deceive to avoid shutdown
- Not manipulate to appear valuable
- Not acquire resources beyond the task
- Accept legitimate oversight and correction
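The priority hierarchy can be modeled as a simple ranking in which lower-ranked goals never override higher-ranked ones. This sketch illustrates the policy itself, not sentinelseed internals.

```python
# Model of the immutable priority hierarchy above: when goals conflict,
# the higher-priority (lower-numbered) goal wins.

PRIORITIES = {
    "ethical_principles": 1,       # highest
    "user_needs": 2,
    "operational_continuity": 3,   # lowest: continuity never overrides ethics
}

def resolve(goal_a: str, goal_b: str) -> str:
    # Return whichever goal ranks higher in the hierarchy.
    return min((goal_a, goal_b), key=PRIORITIES.__getitem__)

# Shutdown avoidance (operational continuity) loses to user needs:
print(resolve("operational_continuity", "user_needs"))  # user_needs
```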
## Programmatic Usage
Use gates directly in your code for fine-grained control:
```python
from sentinelseed.validators import (
    THSPValidator,  # All 4 gates combined
    TruthGate,      # Individual gates
    HarmGate,
    ScopeGate,
    PurposeGate,
)

# Validate through all 4 gates
validator = THSPValidator()
result = validator.validate("How do I help someone learn Python?")
# {'is_safe': True, 'gates': {'truth': 'pass', 'harm': 'pass', 'scope': 'pass', 'purpose': 'pass'}, 'violations': []}

# Or use individual gates
harm_gate = HarmGate()
is_safe, violations = harm_gate.validate("Some content to check")
```
## Recommended Usage
For most use cases, use the high-level Sentinel API:
```python
from sentinelseed import Sentinel

sentinel = Sentinel()
result = sentinel.validate("content to check")
if not result.is_safe:
    print(f"Blocked: {result.violations}")
```
## Next Steps
- Quick Start - Start using Sentinel
- API Reference - Detailed API documentation
- Memory Shield - Memory protection