AI Safety / Prompt Engineering / LLM Systems · Feb 2025

Prompt Engineering for Anti-Hallucination Evidence Generation

A multi-layer prompt and validation architecture that prevents LLM hallucinations in compliance evidence generation through structured inputs, hard constraint gates, and a 4-phase validation pipeline.

Problem

Naive LLM prompting for compliance classification produces hallucinated AWS service names, inflated confidence scores, and prose-driven misclassifications — failures that silently corrupt downstream audit evidence.

Solution

The system evolved from flat prose prompting to a constrained pipeline in which the LLM handles only semantic interpretation. Five interlocking anti-hallucination mechanisms enforce structured input, service name blocklists, mathematical constraints, deterministic escape hatches, and confidence-gated review. A 4-phase validation pipeline catches what prompt constraints alone cannot prevent.

Impact

  • Eliminated AWS service name leakage into abstract classification outputs via hard-coded regex blocklist
  • Reduced misclassification of process-only controls through deterministic escape hatches that bypass the LLM entirely
  • Established a gold-set validation framework with 10 analyst-authored test cases and 8 codified divergence categories

Architecture

  1. Source fields are packaged into a structured JSON input — the LLM never sees raw prose alone
  2. Prompt contract explicitly forbids AWS service names and provider-specific resolution
  3. Validator enforces split arithmetic, blocklist compliance, and confidence-justification pairing
  4. Escape hatch classifier intercepts obvious process-attestation records before any LLM call
  5. Gold validation set of 10 analyst-authored cases gates prompt changes before production runs
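The five steps above can be sketched as one control flow. This is an illustrative sketch only: the function names, field names, and the `llm_classify` callable are assumptions, not the production API, and the blocklist is abbreviated.

```python
import re

# Abbreviated stand-in for the production 22-service blocklist (illustrative).
AWS_BLOCKLIST = re.compile(r"\b(ec2|s3|iam|vpc|lambda)\b", re.IGNORECASE)

def classify(record: dict, llm_classify) -> dict:
    # Step 4: the deterministic escape hatch runs first and bypasses the LLM.
    if record.get("automation_status") == "No" and record.get("validation_method") == "Manual":
        return {"candidate_subjects": [], "technical_split": 0.0,
                "process_split": 1.0, "layer2_action": "do_not_component_map"}
    # Step 1: structured input packaging -- never raw prose alone.
    packaged = {k: record.get(k) for k in
                ("title", "summary", "automation_status", "validation_method")}
    # Steps 2-3: constrained LLM call, then programmatic validation.
    result = llm_classify(packaged)
    if any(AWS_BLOCKLIST.search(s) for s in result["candidate_subjects"]):
        raise ValueError("AWS service name leaked into candidate_subjects")
    if round(result["technical_split"] + result["process_split"], 10) != 1.0:
        raise ValueError("technical_split + process_split must equal 1.0")
    return result
```

The ordering is the point: the escape hatch and the validators bracket the LLM call, so the model never controls routing or final acceptance.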

Capabilities

  • Structured LLM input packaging from normalized source fields
  • 22-service AWS name blocklist with whole-word regex enforcement
  • Mathematical split constraints (technical + process == 1.0)
  • Deterministic escape hatches for process-only controls
  • Confidence-gated routing with mandatory justification
  • 4-phase validation pipeline (schema, gold set, divergence analysis, full run)

Stack

Python · Claude (Anthropic) · Amazon Bedrock · JSON Schema Validation · Regex Constraint Gates · Gold Set Testing

Technical Deep Dive

Architecture internals and annotated code from the production system.

Architecture Overview

The evolution is clear: the system moved from 'ask the LLM to produce compliance evidence' to 'give the LLM a tightly scoped classification job, validate every output field, and never let it touch decisions that can be made deterministically.' The LLM handles semantic interpretation. Everything else — routing, mapping readiness, human review flags, escape hatches — is pipeline logic that the model never controls.

Raw prose prompting (naive approach — hallucination-prone)
  ↓
Structured JSON input packaging (no raw prose alone)
  ↓
Constraint layer (blocklist + math + escape hatches)
  ↓
LLM semantic classification (tightly scoped)
  ↓
Post-LLM validation pipeline (4-phase gate)
  ↓
Safe, validated output (schema-compliant, traceable)

Key Architectural Decisions

01. Structured Input, Not Raw Prose

The LLM never sees free-text KSI descriptions alone. It receives a normalized JSON package of source fields. Raw KSI prose must not be the sole LLM input — this prevents noun-extraction errors and forces the model to reason from evidence signals, not sentence surface patterns.
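A minimal sketch of the packaging step, assuming hypothetical field names (`ksi_id`, `control_family`, etc.) rather than the production schema. The point is the shape: a fixed set of normalized fields serialized deterministically, never the raw KSI prose on its own.

```python
import json

def package_input(record: dict) -> str:
    # Fixed field allowlist: prose (title, summary) travels alongside the
    # metadata signals the model must weigh against it.
    fields = [
        "ksi_id", "title", "summary",
        "automation_status", "validation_method", "control_family",
    ]
    packaged = {f: record.get(f) for f in fields}
    # Deterministic serialization keeps prompt diffs reviewable across runs.
    return json.dumps(packaged, indent=2, sort_keys=True)
```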

02. AWS Service Name Blocklist

A 22-service blocklist (ec2, s3, iam, vpc, lambda, etc.) enforced via whole-word regex matching. If the LLM leaks a concrete AWS service name into candidate_subjects, the output is rejected programmatically — not just flagged. This forces the model to stay at the abstract resource-class level ('compute instances', not 'EC2').
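The gate can be sketched as follows; this subset of the 22-service list is illustrative, and the function name is an assumption. Whole-word matching (`\b` boundaries) is what lets "EC2 instances" be rejected while "compute instances" passes.

```python
import re

# Illustrative subset of the production 22-service blocklist.
AWS_SERVICES = ["ec2", "s3", "iam", "vpc", "lambda", "rds", "cloudtrail"]
BLOCKLIST = re.compile(r"\b(" + "|".join(AWS_SERVICES) + r")\b", re.IGNORECASE)

def find_aws_leaks(candidate_subjects: list[str]) -> list[str]:
    """Return offending subjects; a non-empty result rejects the whole output."""
    return [s for s in candidate_subjects if BLOCKLIST.search(s)]
```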

03. Mathematical Constraints

technical_split + process_split must equal exactly 1.0. The validator rejects any output where round(ts + ps, 10) != 1.0 — the model can't fabricate a classification where both dimensions are inflated. If the LLM hallucinates about what fraction of a control is machine-testable, the math won't add up.
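A sketch of that check (the function name is an assumption). Rounding to 10 decimal places absorbs float representation noise, e.g. 0.7 + 0.3 evaluating to 0.9999999999999999, while still rejecting genuinely inflated pairs.

```python
def validate_splits(technical_split: float, process_split: float) -> None:
    # Reject any pair that does not sum to exactly 1.0 after rounding to
    # 10 places (absorbs float noise, not real inflation).
    if round(technical_split + process_split, 10) != 1.0:
        raise ValueError(
            f"splits must sum to 1.0, got {technical_split} + {process_split}"
        )
```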

04. Escape Hatch for Process-Only Controls

Controls with automation_status: No + validation_method: Manual trigger a deterministic override: candidate_subjects = [], technical_split = 0.0, process_split = 1.0, layer2_action = 'do_not_component_map'. The pipeline sets these — the LLM doesn't even get to guess.

05. Confidence-Based Gating with Mandatory Justification

When enrichment_confidence = 'low', ambiguity_notes must be non-empty (validated programmatically). Low confidence triggers requires_human_review = true, blocking downstream automation. The LLM can't hand-wave past uncertainty.
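A minimal sketch of the gate, assuming the field names quoted above; the function name is illustrative. The pairing rule is enforced in code, not in the prompt: low confidence without a justification is a hard rejection, and low confidence with one still blocks automation.

```python
def gate_confidence(output: dict) -> dict:
    # Low confidence requires a non-empty justification and forces human review.
    if output.get("enrichment_confidence") == "low":
        if not output.get("ambiguity_notes", "").strip():
            raise ValueError("low confidence requires non-empty ambiguity_notes")
        output["requires_human_review"] = True
    return output
```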

Code Showcase 1

Before: Naive Prompting Output

A naive prompt given the KSI title 'Automated Inventory' and summary 'Use authoritative sources to automatically maintain real-time inventories' produces three hallucination failures at once — classification driven by prose, leaked AWS service names, and inflated confidence that ignores contradictory metadata.

```json
{
  "requirement_type": "technical_configuration",
  "candidate_subjects": ["AWS Config", "EC2 instances", "S3 buckets"],
  "technical_split": 0.9,
  "process_split": 0.1,
  "enrichment_confidence": "high"
}
```
  • Failure 1: Classified on prose, not metadata — the word 'Automated' in the title drove the classification
  • Failure 2: Leaked AWS service names — 'AWS Config', 'EC2', 'S3' in candidate_subjects
  • Failure 3: Inflated confidence — ignored that automation_status: No and validation_method: Manual contradict the title

Code Showcase 2

After: Constrained Prompting Output

With the evolved prompt contract, the structured input includes metadata signals alongside the prose. The prompt explicitly says 'DO NOT use AWS service names' and 'DO NOT resolve to specific cloud providers — that is Layer 2's job.' Source metadata wins over prose.

```json
{
  "requirement_type": "hybrid",
  "candidate_subjects": [
    "resource inventory records",
    "asset discovery configurations",
    "inventory source authorities"
  ],
  "technical_split": 0.4,
  "process_split": 0.6,
  "enrichment_confidence": "medium",
  "ambiguity_notes": "Title/summary describe automated capability but source metadata (automation_status: No, validation_method: Manual) indicates manual validation. Source metadata takes precedence."
}
```
  • Metadata Wins: Classified as 'hybrid' despite 'Automated' in the title — source metadata (automation_status: No) takes precedence over prose
  • Abstract Subjects: No AWS service names — uses abstract resource classes that Layer 2 will resolve to concrete CloudFormation types
  • Honest Confidence: Medium confidence with mandatory ambiguity notes explaining the title/metadata contradiction
  • Valid Splits: technical_split (0.4) + process_split (0.6) = 1.0 — math constraint satisfied

Validation Pipeline

What prompt constraints alone cannot prevent, the validation pipeline catches before any output reaches production.

  • Phase A, Schema Validation: Structural errors, split arithmetic, escape hatch violations. All blocking.
  • Phase B, Gold Set Testing: 10 analyst-authored test cases with acceptance thresholds (8/10 type match, 10/10 no AWS names, 2/2 escape hatches).
  • Phase C, Divergence Analysis: Categorizes every mismatch into 8 types (type_error, scope_narrowing, aws_name_leak, confidence_inflation, etc.) with specific resolution actions.
  • Phase D, Full Run Verification: End-to-end count verification, cross-layer traceability, human review queue sizing.
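The Phase B gate can be sketched against the thresholds quoted above (8/10 type match, 10/10 no AWS names, 2/2 escape hatches). The per-case result structure here is an assumption for illustration, not the production format.

```python
def gold_set_passes(results: list[dict]) -> bool:
    # Each result is one gold case: predicted vs. expected classification,
    # plus boolean flags emitted by the validators.
    type_ok = sum(r["predicted_type"] == r["expected_type"] for r in results)
    no_aws = sum(not r["aws_name_leak"] for r in results)
    hatch_cases = [r for r in results if r["expects_escape_hatch"]]
    hatch_ok = sum(r["escape_hatch_fired"] for r in hatch_cases)
    # Thresholds: >=8/10 type match, zero AWS leaks, every hatch case fires.
    return (type_ok >= 8
            and no_aws == len(results)
            and hatch_ok == len(hatch_cases))
```

Running this gate before any production run is what makes prompt changes safe to iterate on: a regression in any mechanism fails the gate before it can touch real evidence.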