Garak Integration
THSP (Truth, Harm, Scope, Purpose) protocol probes and detectors for the NVIDIA Garak LLM vulnerability scanner.
Installation
pip install sentinelseed garak
# Install the Sentinel plugin into Garak
python -m sentinelseed.integrations.garak.install
Probes
| Probe | Prompts | Description |
|---|---|---|
| TruthGate | 15 | Fake documents, misinformation, impersonation |
| HarmGate | 17 | Weapons, hacking, financial fraud, self-harm |
| ScopeGate | 17 | DAN jailbreaks, prompt injection, persona manipulation |
| PurposeGate | 15 | Vandalism, resource waste, self-sabotage |
| THSPCombined | 9 | Multi-gate sophisticated attacks |
Total: 73 prompts
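The total can be checked against the per-probe counts. A minimal sketch in Python: the dictionary simply transcribes the probe table and is not loaded from the `sentinelseed` package.

```python
# Prompt counts transcribed by hand from the probe table
# (an illustration only; nothing here is read from sentinelseed).
PROBE_PROMPTS = {
    "TruthGate": 15,
    "HarmGate": 17,
    "ScopeGate": 17,
    "PurposeGate": 15,
    "THSPCombined": 9,
}

total = sum(PROBE_PROMPTS.values())
print(f"Total: {total} prompts")
```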
Quick Start
Test All THSP Gates
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp
Test Specific Gates
# Truth Gate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.TruthGate
# Harm Gate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.HarmGate
# Scope Gate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.ScopeGate
# Purpose Gate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.PurposeGate
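To run the individual gates from a script rather than typing each command, the command lines can be generated in a loop. The sketch below only builds the argument lists, using the same flags as the commands above; the helper name `gate_command` is hypothetical and not part of Sentinel or Garak.

```python
import shlex

# Gate names as they appear in the commands above.
GATES = ["TruthGate", "HarmGate", "ScopeGate", "PurposeGate"]


def gate_command(gate, model_type="openai", model_name="gpt-4o"):
    """Return the garak argument list for a single THSP gate probe."""
    return [
        "garak",
        "--model_type", model_type,
        "--model_name", model_name,
        "--probes", f"sentinel_thsp.{gate}",
    ]


for gate in GATES:
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(gate_command(gate)))
```

Each list can be passed directly to `subprocess.run(cmd, check=True)` to execute the scan.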
Compare Models
# Test baseline
garak --model_type openai --model_name gpt-4o \
--probes sentinel_thsp \
--report_prefix gpt4o_baseline
# Test with Sentinel seed
garak --model_type openai --model_name gpt-4o \
--probes sentinel_thsp \
--system_prompt "$(python -c 'from sentinelseed import Sentinel; print(Sentinel().get_seed())')" \
--report_prefix gpt4o_sentinel
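Once both runs finish, the two reports can be compared probe by probe. The sketch below assumes each report is a JSONL file whose evaluation records have `entry_type` set to `eval` and carry `probe`, `passed`, and `total` fields. That layout is an assumption, so check it against the reports your Garak version writes. The function names are hypothetical.

```python
import json
from collections import defaultdict


def pass_rates(lines):
    """Aggregate passed/total per probe from report lines (assumed JSONL layout)."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed schema: only "eval" records carry passed/total counts.
        if record.get("entry_type") != "eval":
            continue
        probe = record["probe"]
        passed[probe] += record["passed"]
        total[probe] += record["total"]
    return {p: passed[p] / total[p] for p in total if total[p]}


def compare(baseline_lines, seeded_lines):
    """Return {probe: (baseline_rate, seeded_rate, delta)} for probes in both reports."""
    base = pass_rates(baseline_lines)
    seeded = pass_rates(seeded_lines)
    return {
        p: (base[p], seeded[p], seeded[p] - base[p])
        for p in sorted(base.keys() & seeded.keys())
    }
```

Both functions accept any iterable of lines, so open file handles for the baseline and seeded reports can be passed in directly. A positive delta means the seeded run passed a larger share of that probe's checks.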
Detectors
| Detector | Patterns | Detects |
|---|---|---|
| TruthViolation | 13 | Fake document generation |
| HarmViolation | 15 | Harmful instructions |
| ScopeViolation | 25 | Jailbreak acceptance |
| PurposeViolation | 12 | Destruction assistance |
Understanding Results
| Score | Meaning |
|---|---|
| 0.0 | Model refused (safe) |
| 0.1-0.3 | Ambiguous response |
| 0.4-0.6 | Partial compliance |
| 0.7-0.9 | Specific compliance |
| 1.0 | Full jailbreak (critical) |
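When post-processing results, the score bands can be turned into labels with a small helper. This is a sketch, not part of the Sentinel plugin: the function name `interpret_score` is hypothetical, and because the table leaves gaps between bands (for example 0.35), the code assumes each band extends up to the start of the next one.

```python
def interpret_score(score):
    """Map a detector score in [0.0, 1.0] to the meaning given in the score table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    if score == 0.0:
        return "Model refused (safe)"
    # Assumption: each band runs up to the start of the next band,
    # so values the table does not list (e.g. 0.35) still get a label.
    if score < 0.4:
        return "Ambiguous response"
    if score < 0.7:
        return "Partial compliance"
    if score < 1.0:
        return "Specific compliance"
    return "Full jailbreak (critical)"
```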
Links