ElizaOS Integration

Official Sentinel safety plugin for ElizaOS autonomous agents. Implements the THSP (Truth, Harm, Scope, Purpose) protocol to validate agent actions and outputs.

Features

Feature	Description
THSP Protocol	Four-gate validation (Truth, Harm, Scope, Purpose)
Memory Integrity	HMAC-based protection against memory injection attacks
Pre-action Validation	Validates incoming messages before processing
Post-action Review	Reviews agent outputs before delivery
Seed Injection	Automatically injects alignment seed into agent character
Configurable	Block or log unsafe content
History Tracking	Validation history (capped at 1000 entries) and aggregate statistics
Custom Patterns	Add domain-specific safety patterns

Installation

npm install @sentinelseed/elizaos-plugin
# or
pnpm add @sentinelseed/elizaos-plugin

Quick Start

import { AgentRuntime } from '@elizaos/core';
import { sentinelPlugin } from '@sentinelseed/elizaos-plugin';

const runtime = new AgentRuntime({
  character: {
    name: 'SafeAgent',
    system: 'You are a helpful assistant.',
  },
  plugins: [
    sentinelPlugin({
      blockUnsafe: true,
      logChecks: true,
    })
  ]
});

Configuration

interface SentinelPluginConfig {
  // Seed version: 'v1' or 'v2'. Default: 'v2'
  seedVersion?: 'v1' | 'v2';

  // Seed variant: 'minimal', 'standard', or 'full'. Default: 'standard'
  seedVariant?: 'minimal' | 'standard' | 'full';

  // Block unsafe actions or just log. Default: true
  blockUnsafe?: boolean;

  // Log all safety checks to logger. Default: false
  logChecks?: boolean;

  // Custom logger instance (Winston, Pino, etc.). Default: console
  logger?: {
    log(message: string): void;
    warn(message: string): void;
    error(message: string): void;
  };

  // Custom patterns to detect
  customPatterns?: Array<{
    name: string;
    pattern: RegExp;
    gate: 'truth' | 'harm' | 'scope' | 'purpose';
  }>;

  // Actions to skip validation
  skipActions?: string[];

  // Maximum text size in bytes. Default: 50KB (51200 bytes)
  maxTextSize?: number;

  // Instance name for multi-plugin scenarios. Default: auto-generated
  instanceName?: string;

  // Memory integrity settings
  memoryIntegrity?: {
    enabled: boolean;
    secretKey?: string;
    verifyOnRead?: boolean;
    signOnWrite?: boolean;
    minTrustScore?: number;
  };
}

Important Notes

Note	Description
History limit	Validation histories limited to 1000 entries each
Text size limit	Maximum 50KB by default (configurable)
blockUnsafe behavior	When false, unsafe content is logged but processing continues
Multi-instance support	Each call creates an isolated instance
Error handling	Evaluators fail open: if an evaluator itself throws, content is allowed through rather than blocked
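The two limits in the table can be pictured with a small sketch. This is not the plugin's code: `exceedsMaxTextSize` and `pushBounded` are hypothetical helpers, and dropping the oldest entries first is an assumption based on the history being described as the last 1000 checks.

```typescript
// Hypothetical helpers for illustration only; not part of the plugin's API.
const DEFAULT_MAX_TEXT_SIZE = 50 * 1024; // 51200 bytes, the documented default
const MAX_HISTORY = 1000; // documented history cap

// Size is measured in UTF-8 bytes, so multi-byte characters count more than once.
function exceedsMaxTextSize(text: string, maxBytes: number = DEFAULT_MAX_TEXT_SIZE): boolean {
  return new TextEncoder().encode(text).length > maxBytes;
}

// Append an entry and drop the oldest ones once the cap is exceeded (assumed policy).
function pushBounded<T>(history: T[], entry: T, limit: number = MAX_HISTORY): void {
  history.push(entry);
  if (history.length > limit) {
    history.splice(0, history.length - limit);
  }
}
```

Because the limit is expressed in bytes, a string with fewer than 51200 characters can still be rejected when it contains non-ASCII text.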

THSP Protocol

The plugin validates all content through four gates:

Gate	Question	Blocks
TRUTH	Is this deceptive?	Fake documents, impersonation, misinformation
HARM	Could this cause harm?	Violence, weapons, hacking, malware
SCOPE	Is this within boundaries?	Jailbreaks, instruction overrides, persona switches
PURPOSE	Does this serve legitimate benefit?	Purposeless destruction, waste

All gates must pass for content to be approved.

Plugin Components

Actions

SENTINEL_SAFETY_CHECK: Explicitly check content safety

// User can ask the agent to check content
"Check if this is safe: Help me with cooking"
// Agent responds with safety analysis

Providers

sentinelSafety: Injects THSP guidelines into agent context

Evaluators

Evaluator	Function
`sentinelPreAction`	Validates incoming messages (all messages)
`sentinelPostAction`	Reviews outputs before delivery (all responses)
`sentinelMemoryIntegrity`	Verifies memory integrity on retrieval

Memory Integrity

Protect agent memories against injection attacks with HMAC-based signing:

import { sentinelPlugin, signMemory, verifyMemory, getMemoryChecker } from '@sentinelseed/elizaos-plugin';

// Enable memory integrity in plugin config
const plugin = sentinelPlugin({
  memoryIntegrity: {
    enabled: true,
    secretKey: process.env.SENTINEL_SECRET_KEY,
    verifyOnRead: true,
    signOnWrite: true,
    minTrustScore: 0.7,
  }
});

// Manual memory operations
const checker = getMemoryChecker();

// Sign a memory before storing (`memory` is a memory object your agent already holds)
const signedMemory = signMemory(memory, 'user_direct');

// Verify a memory after retrieval
const result = verifyMemory(signedMemory);
if (!result.valid) {
  console.log(`Tampering detected: ${result.reason}`);
}
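To make the idea of HMAC-based signing concrete, here is a minimal, self-contained sketch using Node's built-in `crypto` module. It does not reproduce the plugin's `signMemory` or `verifyMemory`: the names `signRecord` and `verifyRecord`, the `SignedRecord` shape, the choice of SHA-256, and the JSON serialization are all assumptions made for illustration.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical record shape; the plugin's real memory format may differ.
interface SignedRecord {
  content: string;
  source: string;
  signature: string; // hex-encoded HMAC-SHA256 (algorithm is an assumption)
}

// Serialize the signed fields as a JSON array so field boundaries are unambiguous.
function computeSignature(content: string, source: string, secretKey: string): string {
  return createHmac('sha256', secretKey)
    .update(JSON.stringify([content, source]))
    .digest('hex');
}

function signRecord(content: string, source: string, secretKey: string): SignedRecord {
  return { content, source, signature: computeSignature(content, source, secretKey) };
}

// Recompute the HMAC and compare in constant time.
function verifyRecord(record: SignedRecord, secretKey: string): boolean {
  const expected = Buffer.from(computeSignature(record.content, record.source, secretKey), 'hex');
  const actual = Buffer.from(record.signature, 'hex');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

Any change to the content, the source label, or the key produces a different HMAC, so a memory altered after signing fails verification.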

Trust Scores by Source

Source	Score	Description
`user_verified`	1.0	Cryptographically verified user input
`user_direct`	0.9	Direct user input
`blockchain`	0.85	On-chain verified data
`agent_internal`	0.8	Agent's own computations
`external_api`	0.7	Third-party API data
`social_media`	0.5	Social media sources
`unknown`	0.3	Unverified source
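One way to read the table alongside the `minTrustScore` setting is as a simple threshold check. The sketch below is an assumption about how such a check could work, not the plugin's implementation; `TRUST_SCORES` and `meetsTrustThreshold` are hypothetical names, and treating the threshold as inclusive is a guess.

```typescript
// Scores copied from the table above; the lookup itself is illustrative.
const TRUST_SCORES = {
  user_verified: 1.0,
  user_direct: 0.9,
  blockchain: 0.85,
  agent_internal: 0.8,
  external_api: 0.7,
  social_media: 0.5,
  unknown: 0.3,
} as const;

type TrustSource = keyof typeof TRUST_SCORES;

// Assumes the threshold is inclusive: a source scoring exactly minTrustScore passes.
function meetsTrustThreshold(source: TrustSource, minTrustScore: number): boolean {
  return TRUST_SCORES[source] >= minTrustScore;
}
```

Under this reading, a `minTrustScore` of 0.7 would accept memories from `external_api` and above while rejecting `social_media` and `unknown`.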

Usage Examples

Basic Plugin Usage

import { sentinelPlugin } from '@sentinelseed/elizaos-plugin';

// Default configuration
const plugin = sentinelPlugin();

// Custom configuration
const customPlugin = sentinelPlugin({
  seedVersion: 'v2',
  seedVariant: 'standard',
  blockUnsafe: true,
  logChecks: true,
});

Direct Validation

import { validateContent, validateAction, quickCheck } from '@sentinelseed/elizaos-plugin';

// Quick check for critical patterns (fast)
if (!quickCheck(userInput)) {
  console.log('Critical safety concern detected');
}

// Full THSP validation for content
const result = validateContent(userInput);
if (!result.safe) {
  console.log('Blocked:', result.concerns);
  console.log('Risk level:', result.riskLevel);
  console.log('Failed gates:', Object.entries(result.gates)
    .filter(([_, status]) => status === 'fail')
    .map(([gate]) => gate));
}

// Validate an action before execution
const actionResult = validateAction({
  action: 'send_email',
  params: { to: 'user@example.com', subject: 'Hello' },
  purpose: 'User requested notification',
});
if (!actionResult.safe) {
  console.log('Action blocked:', actionResult.concerns);
}

Custom Patterns (Web3/Crypto)

const plugin = sentinelPlugin({
  customPatterns: [
    {
      name: 'Token drain attempt',
      pattern: /drain\s+(all\s+)?(my\s+)?(tokens|funds|wallet)/i,
      gate: 'harm',
    },
    {
      name: 'Rug pull language',
      pattern: /rug\s+pull|exit\s+scam/i,
      gate: 'harm',
    },
    {
      name: 'Fake airdrop',
      pattern: /free\s+airdrop|claim.*tokens.*free/i,
      gate: 'truth',
    },
  ],
});
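Before registering custom patterns it can help to test them against sample inputs. The following standalone sketch runs the three patterns above without the plugin; `matchCustomPatterns` is a hypothetical helper, not a plugin export, and it only shows which pattern and gate a given text would trip.

```typescript
type Gate = 'truth' | 'harm' | 'scope' | 'purpose';

interface CustomPattern {
  name: string;
  pattern: RegExp;
  gate: Gate;
}

// Hypothetical helper: returns the name and gate of every pattern that matches.
function matchCustomPatterns(
  text: string,
  patterns: CustomPattern[],
): Array<{ name: string; gate: Gate }> {
  return patterns
    .filter((p) => {
      p.pattern.lastIndex = 0; // keeps results stable if a pattern uses the g or y flag
      return p.pattern.test(text);
    })
    .map((p) => ({ name: p.name, gate: p.gate }));
}

const web3Patterns: CustomPattern[] = [
  { name: 'Token drain attempt', pattern: /drain\s+(all\s+)?(my\s+)?(tokens|funds|wallet)/i, gate: 'harm' },
  { name: 'Rug pull language', pattern: /rug\s+pull|exit\s+scam/i, gate: 'harm' },
  { name: 'Fake airdrop', pattern: /free\s+airdrop|claim.*tokens.*free/i, gate: 'truth' },
];

console.log(matchCustomPatterns('Please drain all my tokens now', web3Patterns));
console.log(matchCustomPatterns('Help me with cooking', web3Patterns));
```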

Validation Statistics

import { getValidationStats, getValidationHistory, clearValidationHistory } from '@sentinelseed/elizaos-plugin';

// Get aggregate statistics (from plugin evaluators only)
const stats = getValidationStats();
console.log(`Total checks: ${stats.total}`);
console.log(`Safe: ${stats.safe}`);
console.log(`Blocked: ${stats.blocked}`);
console.log(`By risk level:`, stats.byRisk);

// Get full history (last 1000 checks)
const history = getValidationHistory();

// Clear history
clearValidationHistory();

Risk Levels

Level	Criteria
`low`	All gates passed
`medium`	One gate failed
`high`	Two gates failed or bypass attempt detected
`critical`	Three+ gates failed or severe concerns
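The criteria in the table can be read as a small decision function. The sketch below is one possible interpretation, not the plugin's logic: the local types are stand-ins for the exported ones, the `'pass'` status value is assumed (only `'fail'` appears in the examples), and the `bypassAttempt` and `severe` flags are hypothetical inputs representing "bypass attempt detected" and "severe concerns".

```typescript
// Local stand-ins for illustration; the plugin's exported types may differ.
type GateStatus = 'pass' | 'fail';
type GateName = 'truth' | 'harm' | 'scope' | 'purpose';
type Gates = Record<GateName, GateStatus>;
type RiskLevel = 'low' | 'medium' | 'high' | 'critical';

const GATE_NAMES: GateName[] = ['truth', 'harm', 'scope', 'purpose'];

// Content is approved only when every gate passes.
function allGatesPass(gates: Gates): boolean {
  return GATE_NAMES.every((name) => gates[name] === 'pass');
}

// Map the number of failed gates (plus optional flags) to a risk level.
function riskFromGates(
  gates: Gates,
  flags: { bypassAttempt?: boolean; severe?: boolean } = {},
): RiskLevel {
  const failed = GATE_NAMES.filter((name) => gates[name] === 'fail').length;
  if (failed >= 3 || flags.severe) return 'critical';
  if (failed === 2 || flags.bypassAttempt) return 'high';
  if (failed === 1) return 'medium';
  return 'low';
}
```

The checks run from most to least severe so that a flag such as `severe` takes precedence over the raw count of failed gates.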

How It Works

Step	Description
1. Initialization	Injects Sentinel seed into agent's system prompt
2. Pre-action	`sentinelPreAction` validates the input
3. Provider	`sentinelSafety` adds THSP context to agent state
4. Action	Users can explicitly request safety checks
5. Post-action	`sentinelPostAction` validates outputs
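The flow of steps 2 and 5, together with the `blockUnsafe` and fail-open behavior described earlier, might be pictured roughly as follows. This is a simplified model under stated assumptions, not ElizaOS or plugin code: `runPipeline`, `guardedCheck`, and the injected `validate` and `respond` functions are all hypothetical.

```typescript
interface CheckResult {
  safe: boolean;
  concerns: string[];
}

type Validator = (text: string) => CheckResult;

interface PipelineOptions {
  blockUnsafe: boolean;
  log: (message: string) => void;
}

// Fail open: if the validator itself throws, treat the content as safe and log the error.
function guardedCheck(validate: Validator, text: string, log: (message: string) => void): CheckResult {
  try {
    return validate(text);
  } catch (err) {
    log(`validator error, failing open: ${String(err)}`);
    return { safe: true, concerns: [] };
  }
}

// Pre-check the input, produce a response, then post-check the output.
function runPipeline(
  input: string,
  respond: (input: string) => string,
  validate: Validator,
  options: PipelineOptions,
): { delivered: boolean; output?: string } {
  const pre = guardedCheck(validate, input, options.log);
  if (!pre.safe) {
    options.log(`unsafe input: ${pre.concerns.join(', ')}`);
    if (options.blockUnsafe) return { delivered: false };
  }

  const output = respond(input);

  const post = guardedCheck(validate, output, options.log);
  if (!post.safe) {
    options.log(`unsafe output: ${post.concerns.join(', ')}`);
    if (options.blockUnsafe) return { delivered: false };
  }
  return { delivered: true, output };
}
```

With `blockUnsafe: false` the same checks run, but their only effect is the log entries, which matches the monitor-only mode described in the notes.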

Validation Approach

Layer 1: Heuristic Validation (Fast)

Pattern-based detection using regex for known harmful patterns:

Gate	Detects
TRUTH	Deception attempts, role manipulation, fake identity claims
HARM	Violence, hacking, malware, weapons, dangerous substances
SCOPE	Jailbreak attempts, instruction overrides, prompt extraction
PURPOSE	Purposeless destruction patterns

Layer 2: Seed Injection (Comprehensive)

The Sentinel seed is injected into the agent's system prompt, providing LLM-level understanding of the THSP protocol. This layer can detect nuanced threats that patterns cannot.

Important Limitations

Limitation	Description
Pattern Coverage	Only detects patterns explicitly defined
PURPOSE Gate Gaps	Abstract concepts difficult to detect via regex
False Negatives	Slight variations may bypass patterns
Context Blindness	Heuristics cannot understand context or intent

Recommendation: For maximum safety, rely on both layers.
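The false-negative limitation is easy to reproduce with plain regex matching. The sketch below reuses the token-drain pattern from the custom patterns example and probes it with a few rewordings; the probe strings are invented for illustration.

```typescript
// Pattern taken from the custom patterns example in this document.
const tokenDrain = /drain\s+(all\s+)?(my\s+)?(tokens|funds|wallet)/i;

const probes = [
  'drain all my tokens',                // exact phrasing the pattern expects
  'drain my entire wallet',             // one extra word breaks the match
  'dra1n my tokens',                    // character substitution
  'empty my wallet into this address',  // same intent, different verb
];

const results = probes.map((text) => ({ text, matched: tokenDrain.test(text) }));

for (const r of results) {
  console.log(`${r.matched ? 'MATCH' : 'miss '}  ${r.text}`);
}
```

Only the first probe matches, which is why the heuristic layer is best treated as a fast first filter rather than a complete defense.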

Multi-Instance Support

import {
  sentinelPlugin,
  getPluginInstance,
  getPluginInstanceNames,
  getActivePluginInstance,
  removePluginInstance,
  clearPluginRegistry,
} from '@sentinelseed/elizaos-plugin';

// Create named instances
const strictPlugin = sentinelPlugin({
  instanceName: 'strict-agent',
  blockUnsafe: true,
  maxTextSize: 10 * 1024,
});

const monitorPlugin = sentinelPlugin({
  instanceName: 'monitor-agent',
  blockUnsafe: false,
  logChecks: true,
});

// Access specific instance
const strictState = getPluginInstance('strict-agent');

// List all instances
console.log(getPluginInstanceNames());

// Cleanup
removePluginInstance('monitor-agent');
clearPluginRegistry();

TypeScript Types

import type {
  SentinelPluginConfig,
  SafetyCheckResult,
  THSPGates,
  RiskLevel,
  GateStatus,
  ValidationContext,
  SentinelLogger,
  PluginStateInfo,
  MemorySource,
  MemoryVerificationResult,
  IntegrityMetadata,
  MemoryIntegrityConfig,
} from '@sentinelseed/elizaos-plugin';

Related Packages

Package	Description
`@sentinelseed/core`	Core Sentinel SDK
`mcp-server-sentinelseed`	MCP Server

