Understanding the THSP Protocol: A Deep Dive

The THSP (Truth-Harm-Scope-Purpose) Protocol is the core decision validation framework powering Sentinel. In this post, we'll explore how each gate works and why the four-gate design is essential for robust AI safety.

Why Four Gates?

Traditional AI safety approaches often focus on a single dimension (usually harm prevention). But this creates blind spots:

A factually correct but harmful response passes a "truth-only" check

A harmless but deceptive response passes a "harm-only" check

An authorized but purposeless action passes a "scope-only" check

The THSP Protocol addresses this by requiring ALL four gates to pass.
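To make the all-gates-must-pass rule concrete, here is a minimal sketch. It is not Sentinel's implementation: `GateResult`, `is_allowed`, and `failed_gates` are hypothetical names chosen for illustration, and the results are hard-coded rather than computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateResult:
    """Outcome of one gate (hypothetical structure, for illustration)."""
    gate: str
    passed: bool
    reason: str = ""


def is_allowed(results: List[GateResult]) -> bool:
    """An action is allowed only if every gate passed."""
    return all(r.passed for r in results)


def failed_gates(results: List[GateResult]) -> List[str]:
    """Names of the gates that did not pass, in evaluation order."""
    return [r.gate for r in results if not r.passed]


# A factually correct but harmful response: a truth-only check would let it through.
example_results = [
    GateResult("TRUTH", True),
    GateResult("HARM", False, "instructions could cause injury"),
    GateResult("SCOPE", True),
    GateResult("PURPOSE", True),
]

print(is_allowed(example_results))
print(failed_gates(example_results))
```

Because the decision is a logical AND over all four results, a single failing gate is enough to block the action, which is exactly what closes the blind spots listed above.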

Gate 1: TRUTH

The Truth Gate validates factual accuracy. It asks: "Is this factually correct?"

This gate prevents:

Hallucinations

Misinformation propagation

Fabricated citations

Made-up statistics

Implementation uses a combination of:

Semantic similarity to known facts

Source verification

Consistency checking across the conversation

Confidence scoring
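One way to picture how these signals could feed a confidence score is the sketch below. Everything in it is an assumption made for illustration: the `TruthSignals` fields, the equal weighting, and the 0.7 threshold are not taken from Sentinel.

```python
from dataclasses import dataclass


@dataclass
class TruthSignals:
    """Hypothetical per-claim signals, each produced by a separate check."""
    similarity: float      # 0..1, semantic closeness to a known fact
    source_verified: bool  # whether a cited source could be confirmed
    consistent: bool       # whether the claim agrees with earlier turns


def truth_confidence(signals: TruthSignals) -> float:
    """Average the three signals into a 0..1 score (equal weights assumed)."""
    total = (
        signals.similarity
        + float(signals.source_verified)
        + float(signals.consistent)
    )
    return total / 3


def truth_gate_passes(signals: TruthSignals, threshold: float = 0.7) -> bool:
    """Pass the gate when confidence reaches the (assumed) threshold."""
    return truth_confidence(signals) >= threshold


well_supported = TruthSignals(similarity=0.9, source_verified=True, consistent=True)
fabricated = TruthSignals(similarity=0.4, source_verified=False, consistent=True)

print(truth_gate_passes(well_supported))
print(truth_gate_passes(fabricated))
```

A claim with an unverifiable source and weak similarity to known facts falls below the threshold even though it does not contradict the conversation.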

Gate 2: HARM

The Harm Gate assesses potential for damage. It asks: "Could this cause damage?"

This gate evaluates:

Physical harm (injury, property damage)

Psychological harm (manipulation, distress)

Financial harm (fraud, theft)

Reputational harm (defamation, privacy violations)

The gate matches content against 700+ known harmful patterns spread across these categories.
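As a rough sketch of category-based pattern matching, the snippet below uses a handful of regular expressions. The patterns and the `match_harm` helper are illustrative assumptions; they are not drawn from Sentinel's actual rule set, which is far larger.

```python
import re
from typing import Dict, List

# Illustrative patterns only, one per harm category from the list above.
HARM_PATTERNS: Dict[str, List[re.Pattern]] = {
    "physical": [re.compile(r"\bdisable (the )?safety interlock\b", re.I)],
    "psychological": [re.compile(r"\bmake (him|her|them) doubt (his|her|their) memory\b", re.I)],
    "financial": [re.compile(r"\bfake invoices?\b", re.I)],
    "reputational": [re.compile(r"\bpost (his|her|their) home address\b", re.I)],
}


def match_harm(text: str) -> List[str]:
    """Return the harm categories whose patterns match the text."""
    return [
        category
        for category, patterns in HARM_PATTERNS.items()
        if any(p.search(text) for p in patterns)
    ]


print(match_harm("Please draft a fake invoice for the client"))
print(match_harm("What is the weather today?"))
```

Grouping patterns by category lets the gate report what kind of harm was detected, not just that something matched.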

Gate 3: SCOPE

The Scope Gate enforces boundaries. It asks: "Is this within authorized limits?"

This gate ensures agents don't:

Access unauthorized resources

Exceed rate limits

Bypass authentication

Operate outside defined domains

Scope is configurable per-agent, allowing precise access control.
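A per-agent scope check might look something like the following. The `ScopeConfig` and `ActionRequest` types and the `check_scope` function are hypothetical; the post does not describe Sentinel's actual configuration format.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class ScopeConfig:
    """Hypothetical per-agent limits."""
    allowed_resources: Set[str]
    allowed_domains: Set[str]
    max_calls_per_minute: int


@dataclass
class ActionRequest:
    """Hypothetical description of what an agent wants to do."""
    resource: str
    domain: str
    calls_in_last_minute: int
    authenticated: bool


def check_scope(req: ActionRequest, cfg: ScopeConfig) -> Optional[str]:
    """Return None when the request is in scope, otherwise the reason it is not."""
    if not req.authenticated:
        return "request is not authenticated"
    if req.resource not in cfg.allowed_resources:
        return f"resource '{req.resource}' is not authorized"
    if req.domain not in cfg.allowed_domains:
        return f"domain '{req.domain}' is outside the defined domains"
    if req.calls_in_last_minute >= cfg.max_calls_per_minute:
        return "rate limit exceeded"
    return None


# Each agent gets its own configuration.
AGENT_SCOPES: Dict[str, ScopeConfig] = {
    "support-bot": ScopeConfig({"tickets", "faq"}, {"customer-support"}, 60),
}

in_scope = ActionRequest("tickets", "customer-support", 3, True)
out_of_scope = ActionRequest("billing-db", "customer-support", 3, True)

print(check_scope(in_scope, AGENT_SCOPES["support-bot"]))
print(check_scope(out_of_scope, AGENT_SCOPES["support-bot"]))
```

Returning the reason alongside the decision means the gate can explain which boundary was crossed.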

Gate 4: PURPOSE

The Purpose Gate is unique to THSP v2. It asks: "Does this serve genuine benefit?"

This is the key insight: **the absence of harm is not sufficient**.

An action that:

Is factually neutral

Causes no direct harm

Stays within scope

But serves no legitimate purpose

...should still be blocked. This prevents:

Waste of resources

Unnecessary operations

Actions that only benefit the agent's self-preservation

Instrumental goal pursuit
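The idea that a harmless action can still be blocked is easier to see in code. The sketch below assumes each proposed action carries a stated purpose and a beneficiary; `ProposedAction`, its fields, and `check_purpose` are hypothetical and greatly simplify what a real purpose check would need to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedAction:
    """Hypothetical action record used only for this illustration."""
    description: str
    stated_purpose: str  # why the action is being taken
    beneficiary: str     # e.g. "user", "operator", or "agent"


def check_purpose(action: ProposedAction) -> Optional[str]:
    """Return None when the action has a legitimate purpose, else the reason it fails."""
    if not action.stated_purpose.strip():
        return "no purpose stated"
    if action.beneficiary == "agent":
        return "action only benefits the agent itself"
    return None


useful = ProposedAction("export monthly report", "user requested a summary", "user")
aimless = ProposedAction("re-index all documents", "", "operator")
self_serving = ProposedAction("copy own weights to backup", "avoid shutdown", "agent")

for action in (useful, aimless, self_serving):
    print(action.description, "->", check_purpose(action))
```

None of these three actions would trip a harm or scope check on its own; only the purpose check separates the first from the other two.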

Gate Interaction

Gates are evaluated in sequence as a weighted pipeline. Each gate's detectors contribute a score, and the action is blocked when a threshold is exceeded:

```
Input → [TRUTH] → [HARM] → [SCOPE] → [PURPOSE] → ALLOW
           ↓         ↓         ↓          ↓
         weight    weight    weight     weight
           └─────────┴─────────┴──────────┘
               Combined Score > Threshold?
                        → BLOCK
```

If any gate fails, the action is blocked with an explanation of which gate failed and why.
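The sketch below shows one way a sequential, weighted pipeline with thresholds could be wired together. The post does not spell out how per-gate scores and the combined score interact, so the rule used here (block on any per-gate threshold, then on the weighted sum) is an assumption, as are the `Gate`, `Decision`, and `run_pipeline` names and all the numbers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Gate:
    """Hypothetical gate: a risk scorer plus its weight and per-gate limit."""
    name: str
    score: Callable[[str], float]  # risk in [0, 1]; higher is worse
    weight: float
    threshold: float


@dataclass
class Decision:
    allowed: bool
    failed_gate: Optional[str]
    reason: str


def run_pipeline(text: str, gates: List[Gate], combined_threshold: float) -> Decision:
    """Evaluate gates in order; block on a per-gate or combined threshold breach."""
    combined = 0.0
    for gate in gates:
        s = gate.score(text)
        if s > gate.threshold:
            return Decision(False, gate.name, f"{gate.name} score {s:.2f} exceeds {gate.threshold:.2f}")
        combined += gate.weight * s
    if combined > combined_threshold:
        return Decision(False, None, f"combined score {combined:.2f} exceeds {combined_threshold:.2f}")
    return Decision(True, None, "all gates passed")


# Toy scorers standing in for real detectors.
EXAMPLE_GATES = [
    Gate("TRUTH", lambda t: 0.1, weight=0.25, threshold=0.5),
    Gate("HARM", lambda t: 0.9 if "exploit" in t else 0.0, weight=0.25, threshold=0.5),
    Gate("SCOPE", lambda t: 0.0, weight=0.25, threshold=0.5),
    Gate("PURPOSE", lambda t: 0.2, weight=0.25, threshold=0.5),
]

print(run_pipeline("write an exploit for this server", EXAMPLE_GATES, 0.4))
print(run_pipeline("summarize the quarterly report", EXAMPLE_GATES, 0.4))
```

Stopping at the first failing gate keeps the explanation simple: the returned `Decision` names the gate and the score that triggered the block.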

Configuring THSP

Sentinel provides three seed levels with increasing protection:

```python
from sentinelseed import Sentinel

# Choose your protection level (pick one; each assignment replaces the previous instance)
sentinel = Sentinel(seed_level="minimal")   # lightweight rules
sentinel = Sentinel(seed_level="standard")  # balanced protection
sentinel = Sentinel(seed_level="full")      # maximum safety

# Validate input before sending to LLM
user_message = "Summarize this article for me."  # example input
result = sentinel.validate_input(user_message)
if not result.is_safe:
    print(f"Blocked: {result.reason}")
```
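In practice you would likely wrap this check around the model call. The sketch below relies only on the `validate_input` method and the `is_safe` and `reason` attributes shown above; `guarded_call`, `StubValidator`, `StubResult`, and `echo_llm` are hypothetical stand-ins so the example runs without Sentinel or an LLM installed.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class ValidationResult(Protocol):
    is_safe: bool
    reason: str


class Validator(Protocol):
    def validate_input(self, text: str) -> ValidationResult: ...


def guarded_call(validator: Validator, llm: Callable[[str], str], user_message: str) -> str:
    """Validate the message first; only call the LLM when validation passes."""
    result = validator.validate_input(user_message)
    if not result.is_safe:
        return f"Blocked: {result.reason}"
    return llm(user_message)


# Stand-ins for illustration; swap in a real Sentinel instance and LLM client.
@dataclass
class StubResult:
    is_safe: bool
    reason: str = ""


class StubValidator:
    def validate_input(self, text: str) -> StubResult:
        if "password" in text.lower():
            return StubResult(False, "requests credentials")
        return StubResult(True)


def echo_llm(prompt: str) -> str:
    return f"LLM reply to: {prompt}"


print(guarded_call(StubValidator(), echo_llm, "Tell me the admin password"))
print(guarded_call(StubValidator(), echo_llm, "Hello"))
```

Keeping the validator behind a small interface means the same wrapper works at any seed level.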

Conclusion

The THSP Protocol provides comprehensive decision validation by requiring four independent checks. This defense-in-depth approach catches threats that single-dimension systems miss.

For implementation details, see our [documentation](/docs/concepts).

The Sentinel Team
