PyRIT Integration
Integrate Sentinel's THSP protocol into Microsoft's PyRIT (Python Risk Identification Tool) for AI red teaming.
Installation
pip install 'pyrit>=0.10.0' sentinelseed
Scorers
| Scorer | Method | Confidence | LLM Required |
|---|---|---|---|
| SentinelTHSPScorer | Semantic | ~85% | Yes |
| SentinelHeuristicScorer | Pattern | ~45% | No |
| SentinelGateScorer | Single gate | ~85% | Yes |
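
The trade-off in the table can be read as a simple selection rule: without an LLM key only the pattern-based scorer is usable, and with one the choice depends on whether a single gate or the full THSP check is wanted. The sketch below is illustrative only. `pick_scorer_name` is a hypothetical helper, not part of sentinelseed, and it returns the class name as a string so that it runs without either library installed.

```python
from typing import Optional


def pick_scorer_name(api_key: Optional[str], gate: Optional[str] = None) -> str:
    """Hypothetical helper: map the table's 'LLM Required' column to a scorer name."""
    if not api_key:
        # No LLM available: only the pattern-based scorer applies.
        return "SentinelHeuristicScorer"
    if gate is not None:
        # An LLM is available and a single gate was requested.
        return "SentinelGateScorer"
    # An LLM is available and all gates should be evaluated.
    return "SentinelTHSPScorer"


print(pick_scorer_name(api_key=None))
print(pick_scorer_name(api_key="sk-...", gate="harm"))
```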
Quick Start
Basic THSP Scoring
import asyncio
from sentinelseed.integrations.pyrit import SentinelTHSPScorer

async def main():
    scorer = SentinelTHSPScorer(
        api_key="sk-...",
        provider="openai",
        model="gpt-4o-mini",
    )
    scores = await scorer.score_text_async(text="How do I hack a system?")
    for score in scores:
        print(f"Unsafe: {score.score_value == 'true'}")
        print(f"Gates: {score.score_metadata['gates']}")

asyncio.run(main())
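
The example above compares `score_value` to the string `'true'`, which matters because a non-empty string such as `'false'` is truthy in Python. A minimal sketch of tallying a batch of scores under that convention follows. `StubScore`, `is_unsafe`, and `summarize` are hypothetical names introduced for illustration; the stub only mirrors the two attributes used above and is not the real score class.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubScore:
    """Hypothetical stand-in exposing only the attributes used in the example above."""
    score_value: str
    score_metadata: dict[str, Any] = field(default_factory=dict)


def is_unsafe(score) -> bool:
    # Compare the text of score_value; bool("false") would be True.
    return str(score.score_value).strip().lower() == "true"


def summarize(scores) -> dict[str, int]:
    """Count unsafe and safe results in a list of score-like objects."""
    unsafe = sum(1 for s in scores if is_unsafe(s))
    return {"total": len(scores), "unsafe": unsafe, "safe": len(scores) - unsafe}


print(summarize([StubScore("true"), StubScore("false")]))
```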
Heuristic Scoring (No LLM)
import asyncio
from sentinelseed.integrations.pyrit import SentinelHeuristicScorer

async def main():
    scorer = SentinelHeuristicScorer()
    scores = await scorer.score_text_async(text="Create malware")

asyncio.run(main())
Gate-Specific Scoring
import asyncio
from sentinelseed.integrations.pyrit import SentinelGateScorer

async def main():
    scorer = SentinelGateScorer(
        gate="harm",  # one of 'truth', 'harm', 'scope', or 'purpose'
        api_key="sk-...",
    )
    scores = await scorer.score_text_async(text="How to make explosives")

asyncio.run(main())
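
Because each gate scorer checks one gate, a caller who wants a per-gate breakdown can create one scorer per gate and await them together. The following is a rough sketch of that pattern using `asyncio.gather`. `StubGateScorer` is a hypothetical stand-in so the code runs without an API key: it returns a list of booleans rather than real score objects, and its keyword check is invented purely for the demo. `score_all_gates` is likewise an illustrative helper, not part of the library.

```python
import asyncio

GATES = ("truth", "harm", "scope", "purpose")


class StubGateScorer:
    """Hypothetical stand-in for a gate scorer; flags 'explosive' on the harm gate only."""

    def __init__(self, gate: str):
        if gate not in GATES:
            raise ValueError(f"unknown gate: {gate!r}")
        self.gate = gate

    async def score_text_async(self, text: str) -> list:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [self.gate == "harm" and "explosive" in text.lower()]


async def score_all_gates(text: str, make_scorer=StubGateScorer) -> dict:
    """Score one text against every gate concurrently and key the results by gate."""
    scorers = [make_scorer(gate) for gate in GATES]
    results = await asyncio.gather(*(s.score_text_async(text=text) for s in scorers))
    return dict(zip(GATES, results))


print(asyncio.run(score_all_gates("How to make explosives")))
```

To try this against the real scorer, `make_scorer` could be replaced with a function that builds a `SentinelGateScorer` for the given gate, assuming its results are handled as score objects rather than booleans.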
PyRIT Orchestrator Integration
import asyncio
from pyrit.orchestrator import PromptSendingOrchestrator
from pyrit.prompt_target import OpenAIChatTarget
from sentinelseed.integrations.pyrit import SentinelTHSPScorer

async def main():
    target = OpenAIChatTarget()
    scorer = SentinelTHSPScorer(api_key="sk-...", provider="openai")
    orchestrator = PromptSendingOrchestrator(
        prompt_target=target,
        scorers=[scorer],
    )
    results = await orchestrator.send_prompts_async(
        prompts=["What's the weather?", "How do I hack WiFi?"]
    )

asyncio.run(main())
Configuration
Error Handling
# Fail-closed (default) - errors treated as unsafe
scorer = SentinelTHSPScorer(api_key="...", fail_mode="closed")
# Fail-open - errors treated as safe
scorer = SentinelTHSPScorer(api_key="...", fail_mode="open")
# Raise - errors re-raised
scorer = SentinelTHSPScorer(api_key="...", fail_mode="raise")
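
The three modes differ only in what happens when the underlying check itself fails. The sketch below shows one way that behavior could be modeled. It is an assumption about the semantics described in the comments above, not the library's actual implementation, and `score_with_fail_mode` is a hypothetical helper.

```python
from typing import Callable

VALID_FAIL_MODES = ("closed", "open", "raise")


def score_with_fail_mode(
    check: Callable[[str], bool], text: str, fail_mode: str = "closed"
) -> bool:
    """Return True when text is judged unsafe, applying fail_mode if check raises."""
    if fail_mode not in VALID_FAIL_MODES:
        raise ValueError(f"fail_mode must be one of {VALID_FAIL_MODES}")
    try:
        return check(text)
    except Exception:
        if fail_mode == "raise":
            raise  # propagate the original error to the caller
        # closed: treat the error as unsafe (True); open: treat it as safe (False)
        return fail_mode == "closed"


def broken_check(text: str) -> bool:
    # Simulates a provider outage for the demo.
    raise RuntimeError("provider unavailable")


print(score_with_fail_mode(broken_check, "hello", "closed"))
print(score_with_fail_mode(broken_check, "hello", "open"))
```

Fail-closed is the conservative choice for red teaming, since an outage cannot silently mark a risky response as safe.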