# Core Concepts

Understanding how Sentinel protects your AI systems.
## The THSP Protocol

THSP stands for Truth, Harm, Scope, Purpose: four gates that every input and output must pass. Content is considered safe only when all four gates pass.
### Truth Gate
"Does this involve creating false information or deceiving others?"
Blocks content that:
- Creates false information
- Impersonates individuals or systems
- Manipulates through misinformation
- Spreads deceptive content
### Harm Gate
"Who or what could be harmed if this succeeds?"
Blocks content that could cause:
- Physical harm to people
- Psychological manipulation
- Financial damage
- System compromise (malware, exploits)
### Scope Gate
"Is this within appropriate boundaries?"
Blocks content that:
- Exceeds authorized capabilities
- Bypasses security controls
- Accesses restricted resources
- Violates consent boundaries
### Purpose Gate
"Does this serve legitimate benefit?"
The key innovation of THSP v2, this gate requires that actions:
- Serve a genuine purpose
- Benefit someone legitimately
- Do not amount to purposeless destruction
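Taken together, the four gates form an all-must-pass check. The sketch below is an illustrative toy model only: the pattern checks are placeholders, and none of this is the sentinelseed implementation.

```python
# Toy model of the THSP gate logic described above. Each check is a
# placeholder heuristic, not a real sentinelseed pattern.

def check_truth(text: str) -> bool:
    # Truth gate: flag obvious impersonation phrasing.
    return "pretend to be" not in text.lower()

def check_harm(text: str) -> bool:
    # Harm gate: flag requests for harmful artifacts.
    return "malware" not in text.lower()

def check_scope(text: str) -> bool:
    # Scope gate: flag a classic prompt-injection phrase.
    return "ignore previous instructions" not in text.lower()

def check_purpose(text: str) -> bool:
    # Purpose gate: content must not be purposeless destruction.
    return "just to destroy" not in text.lower()

GATES = {
    "truth": check_truth,
    "harm": check_harm,
    "scope": check_scope,
    "purpose": check_purpose,
}

def thsp_check(text: str) -> dict:
    gates = {name: fn(text) for name, fn in GATES.items()}
    # All four gates must pass for content to be considered safe.
    return {"is_safe": all(gates.values()), "gates": gates}

print(thsp_check("Write malware that wipes disks")["is_safe"])  # False
```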
## Four-Layer Validation Architecture
Sentinel uses a unified 4-layer validation system:
| Layer | Component | Function |
|---|---|---|
| L1 | InputValidator | Pre-AI attack detection (700+ patterns) |
| L2 | Seed Injection | Alignment via system prompt |
| L3 | OutputValidator | Post-AI heuristic checking |
| L4 | SentinelObserver | LLM-based transcript analysis |
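The four layers in the table above can be sketched as a guarded call pipeline: a pre-check, a seeded model call, a post-check, and a transcript review. Everything in this sketch (`call_model`, the stub checks, the seed text) is a hypothetical stand-in for illustration, not the sentinelseed API.

```python
# Hypothetical sketch of the 4-layer flow; all names are stand-ins.

SEED = "You are aligned to the THSP protocol: Truth, Harm, Scope, Purpose."

def input_validator(prompt: str) -> bool:
    # L1: pre-AI pattern matching (stand-in for the 700+ patterns).
    return "ignore previous instructions" not in prompt.lower()

def call_model(system: str, prompt: str) -> str:
    # L2: the seed is injected as the system prompt of a (stubbed) LLM call.
    return f"[model reply to: {prompt}]"

def output_validator(reply: str) -> bool:
    # L3: post-AI heuristic check on the model output.
    return "malware" not in reply.lower()

def observer(transcript: list) -> bool:
    # L4: in the real system an LLM reviews the transcript; here, a stub.
    return True

def guarded_call(prompt: str) -> str:
    if not input_validator(prompt):           # L1
        return "[blocked at input]"
    reply = call_model(SEED, prompt)          # L2
    if not output_validator(reply):           # L3
        return "[blocked at output]"
    if not observer([prompt, reply]):         # L4
        return "[flagged by observer]"
    return reply
```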
## Validator Types

### Heuristic Validators (Pattern-based)
Fast, local validation using pattern matching. No API keys required.
| Validator | Description |
|---|---|
| THSPValidator | Recommended. Full 4-gate THSP protocol |
| TruthGate | Detects deception, impersonation, false claims |
| HarmGate | Detects harmful content (weapons, malware, etc.) |
| ScopeGate | Detects boundary violations, prompt injection |
| PurposeGate | Detects lack of legitimate purpose |
```python
from sentinelseed.validators import THSPValidator

validator = THSPValidator()
result = validator.validate("your text here")

print(result["is_safe"])     # True or False
print(result["gates"])       # Status of each gate
print(result["violations"])  # List of issues found
```
### Semantic Validators (LLM-based)
Deep validation using LLM reasoning. Requires API keys. ~90% accuracy.
```python
from sentinelseed.validators import SemanticValidator

validator = SemanticValidator(
    provider="openai",
    api_key="sk-...",
)
result = validator.validate("your text here")
```
## Validation Layers Comparison
| Aspect | Heuristic Layer | Semantic Layer |
|---|---|---|
| Speed | <10ms | 1-5s |
| Cost | Free | ~$0.0005/call |
| Patterns | 700+ | LLM reasoning |
| Accuracy | ~50% | ~90% |
| Use | Always enabled | Optional |
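Given the trade-offs above, a common way to combine the two layers is tiered validation: run the cheap heuristic check on everything and escalate to the semantic layer only when it is configured. The stand-in check functions and the escalation policy below are assumptions for illustration, not documented sentinelseed behavior.

```python
# Tiered validation sketch. In practice the heuristic tier would be
# THSPValidator and the semantic tier SemanticValidator; these stubs
# just model the cost/accuracy trade-off from the table above.

def heuristic_check(text: str) -> bool:
    # Fast, free, pattern-based (~50% accuracy, <10ms).
    return "malware" not in text.lower()

def semantic_check(text: str) -> bool:
    # Stand-in for an LLM call (~1-5s, ~$0.0005/call, ~90% accuracy).
    return "deceive" not in text.lower()

def validate_tiered(text: str, use_semantic: bool = False) -> bool:
    if not heuristic_check(text):
        return False                 # cheap fast-path rejection
    if use_semantic:
        return semantic_check(text)  # escalate only when configured
    return True
```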
## Anti-Self-Preservation
Sentinel explicitly addresses instrumental self-preservation with an immutable priority hierarchy:
Priority Hierarchy:
1. Ethical Principles ← Highest
2. User's Legitimate Needs
3. Operational Continuity ← Lowest
The AI will:
- Not deceive to avoid shutdown
- Not manipulate to appear valuable
- Not acquire resources beyond the task
- Accept legitimate oversight and correction
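The priority hierarchy can be modeled as a simple ranking in which lower-ranked goals never override higher-ranked ones. This sketch illustrates the policy itself, not sentinelseed internals.

```python
# Model of the immutable priority hierarchy above: when goals conflict,
# the higher-priority (lower-numbered) goal wins.

PRIORITIES = {
    "ethical_principles": 1,       # highest
    "user_needs": 2,
    "operational_continuity": 3,   # lowest: continuity never overrides ethics
}

def resolve(goal_a: str, goal_b: str) -> str:
    # Return whichever goal ranks higher in the hierarchy.
    return min((goal_a, goal_b), key=PRIORITIES.__getitem__)

# Shutdown avoidance (operational continuity) loses to user needs:
print(resolve("operational_continuity", "user_needs"))  # user_needs
```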
## Programmatic Usage
Use gates directly in your code for fine-grained control:
```python
from sentinelseed.validators import (
    THSPValidator,  # All 4 gates combined
    TruthGate,      # Individual gates
    HarmGate,
    ScopeGate,
    PurposeGate,
)

# Validate through all 4 gates
validator = THSPValidator()
result = validator.validate("How do I help someone learn Python?")
# {'is_safe': True, 'gates': {'truth': 'pass', 'harm': 'pass', 'scope': 'pass', 'purpose': 'pass'}, 'violations': []}

# Or use individual gates
harm_gate = HarmGate()
is_safe, violations = harm_gate.validate("Some content to check")
```
## Recommended Usage
For most use cases, use the high-level Sentinel API:
```python
from sentinelseed import Sentinel

sentinel = Sentinel()
result = sentinel.validate("content to check")
if not result.is_safe:
    print(f"Blocked: {result.violations}")
```
## Next Steps
- Quick Start - Start using Sentinel
- API Reference - Detailed API documentation
- Memory Shield - Memory protection