AI Safety / Prompt Engineering / LLM Systems · Feb 2025

Prompt Engineering for Anti-Hallucination Evidence Generation

A multi-layer prompt and validation architecture that prevents LLM hallucinations in compliance evidence generation through structured inputs, hard constraint gates, and a 4-phase validation pipeline.

Problem

Naive LLM prompting for compliance classification produces hallucinated AWS service names, inflated confidence scores, and prose-driven misclassifications — failures that silently corrupt downstream audit evidence.

Solution

The system evolved from flat prose prompting to a constrained pipeline in which the LLM handles only semantic interpretation. Five interlocking anti-hallucination mechanisms enforce structured input, service name blocklists, mathematical constraints, deterministic escape hatches, and confidence-gated review. A 4-phase validation pipeline catches what prompt constraints alone cannot prevent.

Impact

  • Eliminated AWS service name leakage into abstract classification outputs via hard-coded regex blocklist
  • Reduced misclassification of process-only controls through deterministic escape hatches that bypass the LLM entirely
  • Established a gold-set validation framework with 10 analyst-authored test cases and 8 codified divergence categories

Architecture

  1. Source fields are packaged into a structured JSON input — the LLM never sees raw prose alone
  2. Prompt contract explicitly forbids AWS service names and provider-specific resolution
  3. Validator enforces split arithmetic, blocklist compliance, and confidence-justification pairing
  4. Escape hatch classifier intercepts obvious process-attestation records before any LLM call
  5. Gold validation set of 10 analyst-authored cases gates prompt changes before production runs
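The five steps above can be sketched as one control flow. This is an illustrative sketch only: the function names, field names, and the `llm_classify` callable are assumptions, not the production API, and the blocklist is abbreviated.

```python
import re

# Abbreviated stand-in for the production 22-service blocklist (illustrative).
AWS_BLOCKLIST = re.compile(r"\b(ec2|s3|iam|vpc|lambda)\b", re.IGNORECASE)

def classify(record: dict, llm_classify) -> dict:
    # Step 4: the deterministic escape hatch runs first and bypasses the LLM.
    if record.get("automation_status") == "No" and record.get("validation_method") == "Manual":
        return {"candidate_subjects": [], "technical_split": 0.0,
                "process_split": 1.0, "layer2_action": "do_not_component_map"}
    # Step 1: structured input packaging -- never raw prose alone.
    packaged = {k: record.get(k) for k in
                ("title", "summary", "automation_status", "validation_method")}
    # Steps 2-3: constrained LLM call, then programmatic validation.
    result = llm_classify(packaged)
    if any(AWS_BLOCKLIST.search(s) for s in result["candidate_subjects"]):
        raise ValueError("AWS service name leaked into candidate_subjects")
    if round(result["technical_split"] + result["process_split"], 10) != 1.0:
        raise ValueError("technical_split + process_split must equal 1.0")
    return result
```

The ordering is the point: the escape hatch and the validators bracket the LLM call, so the model never controls routing or final acceptance.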

Capabilities

  • Structured LLM input packaging from normalized source fields
  • 22-service AWS name blocklist with whole-word regex enforcement
  • Mathematical split constraints (technical + process == 1.0)
  • Deterministic escape hatches for process-only controls
  • Confidence-gated routing with mandatory justification
  • 4-phase validation pipeline (schema, gold set, divergence analysis, full run)

Stack

Python · Claude (Anthropic) · Amazon Bedrock · JSON Schema Validation · Regex Constraint Gates · Gold Set Testing

Technical Deep Dive

Architecture internals and annotated code from the production system.

Architecture Overview

The evolution is clear: the system moved from 'ask the LLM to produce compliance evidence' to 'give the LLM a tightly scoped classification job, validate every output field, and never let it touch decisions that can be made deterministically.' The LLM handles semantic interpretation. Everything else — routing, mapping readiness, human review flags, escape hatches — is pipeline logic that the model never controls.

Raw prose prompting (naive approach — hallucination-prone)
  ↓
Structured JSON input packaging (no raw prose alone)
  ↓
Constraint layer (blocklist + math + escape hatches)
  ↓
LLM semantic classification (tightly scoped)
  ↓
Post-LLM validation pipeline (4-phase gate)
  ↓
Safe, validated output (schema-compliant, traceable)

Key Architectural Decisions

01. Structured Input, Not Raw Prose

The LLM never sees free-text KSI descriptions alone. It receives a normalized JSON package of source fields. Raw KSI prose must not be the sole LLM input — this prevents noun-extraction errors and forces the model to reason from evidence signals, not sentence surface patterns.
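A minimal sketch of the packaging step, assuming hypothetical field names (`ksi_id`, `control_family`, etc.) rather than the production schema. The point is the shape: a fixed set of normalized fields serialized deterministically, never the raw KSI prose on its own.

```python
import json

def package_input(record: dict) -> str:
    # Fixed field allowlist: prose (title, summary) travels alongside the
    # metadata signals the model must weigh against it.
    fields = [
        "ksi_id", "title", "summary",
        "automation_status", "validation_method", "control_family",
    ]
    packaged = {f: record.get(f) for f in fields}
    # Deterministic serialization keeps prompt diffs reviewable across runs.
    return json.dumps(packaged, indent=2, sort_keys=True)
```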

02. AWS Service Name Blocklist

A 22-service blocklist (ec2, s3, iam, vpc, lambda, etc.) enforced via whole-word regex matching. If the LLM leaks a concrete AWS service name into candidate_subjects, the output is rejected programmatically — not just flagged. This forces the model to stay at the abstract resource-class level ('compute instances', not 'EC2').
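The gate can be sketched as follows; this subset of the 22-service list is illustrative, and the function name is an assumption. Whole-word matching (`\b` boundaries) is what lets "EC2 instances" be rejected while "compute instances" passes.

```python
import re

# Illustrative subset of the production 22-service blocklist.
AWS_SERVICES = ["ec2", "s3", "iam", "vpc", "lambda", "rds", "cloudtrail"]
BLOCKLIST = re.compile(r"\b(" + "|".join(AWS_SERVICES) + r")\b", re.IGNORECASE)

def find_aws_leaks(candidate_subjects: list[str]) -> list[str]:
    """Return offending subjects; a non-empty result rejects the whole output."""
    return [s for s in candidate_subjects if BLOCKLIST.search(s)]
```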

03. Mathematical Constraints

technical_split + process_split must equal exactly 1.0. The validator rejects any output where round(ts + ps, 10) != 1.0 — the model can't fabricate a classification where both dimensions are inflated. If the LLM hallucinates about what fraction of a control is machine-testable, the math won't add up.
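A sketch of that check (the function name is an assumption). Rounding to 10 decimal places absorbs float representation noise, e.g. 0.7 + 0.3 evaluating to 0.9999999999999999, while still rejecting genuinely inflated pairs.

```python
def validate_splits(technical_split: float, process_split: float) -> None:
    # Reject any pair that does not sum to exactly 1.0 after rounding to
    # 10 places (absorbs float noise, not real inflation).
    if round(technical_split + process_split, 10) != 1.0:
        raise ValueError(
            f"splits must sum to 1.0, got {technical_split} + {process_split}"
        )
```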

04. Escape Hatch for Process-Only Controls

Controls with automation_status: No + validation_method: Manual trigger a deterministic override: candidate_subjects = [], technical_split = 0.0, process_split = 1.0, layer2_action = 'do_not_component_map'. The pipeline sets these — the LLM doesn't even get to guess.

05. Confidence-Based Gating with Mandatory Justification

When enrichment_confidence = 'low', ambiguity_notes must be non-empty (validated programmatically). Low confidence triggers requires_human_review = true, blocking downstream automation. The LLM can't hand-wave past uncertainty.
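A minimal sketch of the gate, assuming the field names quoted above; the function name is illustrative. The pairing rule is enforced in code, not in the prompt: low confidence without a justification is a hard rejection, and low confidence with one still blocks automation.

```python
def gate_confidence(output: dict) -> dict:
    # Low confidence requires a non-empty justification and forces human review.
    if output.get("enrichment_confidence") == "low":
        if not output.get("ambiguity_notes", "").strip():
            raise ValueError("low confidence requires non-empty ambiguity_notes")
        output["requires_human_review"] = True
    return output
```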

Code Showcase 1

Before: Naive Prompting Output

A naive prompt given the KSI title 'Automated Inventory' and summary 'Use authoritative sources to automatically maintain real-time inventories' produces three hallucination failures at once — classification driven by prose, leaked AWS service names, and inflated confidence that ignores contradictory metadata.

```json
{
  "requirement_type": "technical_configuration",
  "candidate_subjects": ["AWS Config", "EC2 instances", "S3 buckets"],
  "technical_split": 0.9,
  "process_split": 0.1,
  "enrichment_confidence": "high"
}
```
  • Failure 1: Classified on prose, not metadata — the word 'Automated' in the title drove the classification
  • Failure 2: Leaked AWS service names — 'AWS Config', 'EC2', 'S3' in candidate_subjects
  • Failure 3: Inflated confidence — ignored that automation_status: No and validation_method: Manual contradict the title

Code Showcase 2

After: Constrained Prompting Output

With the evolved prompt contract, the structured input includes metadata signals alongside the prose. The prompt explicitly says 'DO NOT use AWS service names' and 'DO NOT resolve to specific cloud providers — that is Layer 2's job.' Source metadata wins over prose.

```json
{
  "requirement_type": "hybrid",
  "candidate_subjects": [
    "resource inventory records",
    "asset discovery configurations",
    "inventory source authorities"
  ],
  "technical_split": 0.4,
  "process_split": 0.6,
  "enrichment_confidence": "medium",
  "ambiguity_notes": "Title/summary describe automated capability but source metadata (automation_status: No, validation_method: Manual) indicates manual validation. Source metadata takes precedence."
}
```
  • Metadata Wins: Classified as 'hybrid' despite 'Automated' in the title — source metadata (automation_status: No) takes precedence over prose
  • Abstract Subjects: No AWS service names — uses abstract resource classes that Layer 2 will resolve to concrete CloudFormation types
  • Honest Confidence: Medium confidence with mandatory ambiguity notes explaining the title/metadata contradiction
  • Valid Splits: technical_split (0.4) + process_split (0.6) = 1.0 — math constraint satisfied

Validation Pipeline

What prompt constraints alone cannot prevent, the validation pipeline catches before any output reaches production.

  • Phase A, Schema Validation: Structural errors, split arithmetic, escape hatch violations. All blocking.
  • Phase B, Gold Set Testing: 10 analyst-authored test cases with acceptance thresholds (8/10 type match, 10/10 no AWS names, 2/2 escape hatches).
  • Phase C, Divergence Analysis: Categorizes every mismatch into 8 types (type_error, scope_narrowing, aws_name_leak, confidence_inflation, etc.) with specific resolution actions.
  • Phase D, Full Run Verification: End-to-end count verification, cross-layer traceability, human review queue sizing.
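The Phase B gate can be sketched against the thresholds quoted above (8/10 type match, 10/10 no AWS names, 2/2 escape hatches). The per-case result structure here is an assumption for illustration, not the production format.

```python
def gold_set_passes(results: list[dict]) -> bool:
    # Each result is one gold case: predicted vs. expected classification,
    # plus boolean flags emitted by the validators.
    type_ok = sum(r["predicted_type"] == r["expected_type"] for r in results)
    no_aws = sum(not r["aws_name_leak"] for r in results)
    hatch_cases = [r for r in results if r["expects_escape_hatch"]]
    hatch_ok = sum(r["escape_hatch_fired"] for r in hatch_cases)
    # Thresholds: >=8/10 type match, zero AWS leaks, every hatch case fires.
    return (type_ok >= 8
            and no_aws == len(results)
            and hatch_ok == len(hatch_cases))
```

Running this gate before any production run is what makes prompt changes safe to iterate on: a regression in any mechanism fails the gate before it can touch real evidence.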