AI Driven Issue Tracking and Analytics Pipeline
An eight stage AI pipeline. It maps FedRAMP 20x controls to a client's actual tech stack, finds the gaps against live Vanta test data, writes the remediation plan, and uploads the whole Epic, Task, and Subtask hierarchy to Jira.
Problem
FedRAMP 20x introduced a new control family structure (KSI, ADS, CCM) that existing compliance tooling doesn't handle. Analyzing each control against a client's real infrastructure, cross checking Vanta test coverage, finding gaps, writing remediation tickets, uploading them to Jira. Weeks of work per control family. Then you do it again for every new client.
Solution
Built an end to end pipeline in eight sequential notebook stages. Every stage emits CSV artifacts the next stage reads. That gives you a traceable chain from raw control data all the way to the Jira tickets. GPT-5 handles the heavy reasoning: gap analysis, remediation planning, root cause grouping. GPT-4.1 Mini handles structured extraction and formatting where deep thinking doesn't help. Cheap work stays cheap, smart work stays smart.
Impact
- →Ran 3 control families (KSI with 56 controls, ADS with 20, CCM with 3) through the full pipeline end to end
- →Generated 582 Jira tickets (Epics, Tasks, Subtasks) across every family with proper hierarchy and audit ready descriptions
- →Compressed the control to ticket lifecycle from weeks of manual analysis down to a single pipeline run per family
- →Every AI call is logged with the prompt sent and the response received. Full audit trail for compliance review.
Architecture
- 01Part 1 validates Rev5 objective questions and generates Component Examples. GPT-5, 5 parallel workers.
- 02Part 2 maps each KSI by Control by Part to the client's tech stack with GPT-5, then pulls structured CSV with GPT-4.1 Mini
- 03Part 3 hits the Vanta API for live tests, exports inventory, and maps each test to NIST controls with GPT-5
- 04Part 4 joins gap analysis, Vanta mapping, and inventory data into a master compliance scorecard with verdict logic
- 05Part 5 filters the scorecard for gaps, splits into existing tool and missing tool buckets, and generates custom Vanta test definitions
- 06Part 6 groups critical failures by root cause, generates CLI fixes, and produces an 8 to 10 theme Executive Remediation Roadmap
- 07Part 7 generates a Vanta UI configuration manifest that maps KSIs to test configurations
- 08Part 8 builds the Jira CSV hierarchy, rewrites descriptions for audit, scrubs to the AWS stack, splits into subtasks, and uploads through the REST API
Capabilities
- ·Rev5 800-53 objective question validation and Component Example generation
- ·KSI to client tech stack gap analysis with per control part granularity
- ·Live Vanta test inventory pull and NIST control mapping
- ·Automated compliance verdict logic (COMPLIANT, CRITICAL GAP, STRATEGIC GAP, PARTIAL)
- ·Custom Vanta test definition generation for uncovered controls
- ·Root cause grouping with Executive Remediation Roadmap generation
- ·Vanta UI configuration manifest for test to control mapping
- ·Jira ticket hierarchy generation (Epic, Task, Subtask) with audit ready formatting
- ·Family parameterization. Same pipeline runs for KSI, ADS, or CCM with one config change.
- ·Crash safe execution with immediate CSV writes and automatic skip on rerun
Stack
Technical Deep Dive
Architecture internals and annotated code from the production system.
Architecture Overview
The pipeline is deliberately sequential. Each of the 8 parts produces CSV artifacts the next part consumes. That makes every intermediate state inspectable, and it means a failure at Part 6 doesn't force a rerun of Parts 1 through 5. The dual model strategy (GPT-5 for reasoning, GPT-4.1 Mini for extraction) lives at the prompt level. Inside a single notebook, the same row can hit GPT-5 first for analysis and GPT-4.1 Mini right after to extract structured fields from the response.
Key Architectural Decisions
Why GPT-5 for Reasoning, GPT-4.1 Mini for Extraction
GPT-5 handles anything that requires understanding compliance context. Mapping a KSI control to a client's actual infrastructure. Deciding whether a Vanta test covers a NIST requirement. Grouping failures by root cause. Writing the remediation plan. GPT-4.1 Mini handles the mechanical follow-up. Pulling structured CSV fields out of GPT-5's prose response. Reformatting descriptions for audit style. Scrubbing vendor references. The expensive model only runs where reasoning quality matters. The cheap one does the transformation.
Why 8 Parts Instead of One Monolithic Pipeline
Each part writes named CSV files. Those files are checkpoints and audit artifacts at the same time. If Part 6 needs a prompt tweak, you rerun Part 6. Parts 1 through 5 are frozen on disk. A compliance reviewer can also inspect the scorecard (Part 4) before tickets get created (Part 8). The sequential design mirrors how a human analyst would actually work. Understand the controls, find the gaps, plan the remediation, file the tickets.
One Prompt Per Control Part, Not Combined
Part 2 sends one GPT-5 prompt per KSI by Control by Part combination. I don't batch multiple parts into a single prompt. That stops cross-contamination cold. The model's read on AC-2 Part (a) can't bleed into its read on AC-2 Part (b). Each response is independently cacheable and independently rerunnable.
Family Parameterization
Every notebook has a family-config cell. FAMILY = 'KSI' or 'ADS' or 'CCM.' Source data, output paths, and AI response directories all derive from that one variable. The same pipeline code handles every control family. Switching between them is a one-line config change plus a kernel restart.
Crash Safe Execution with Skip on Rerun
Every notebook writes results immediately, either as CSV appends or as individual output files. On rerun, completed rows are detected and skipped. So a notebook that died at row 47 of 200 resumes at row 48, not at row 1. Combine that with per-call prompt and response logging and every AI call is reproducible and auditable.
Verdict Logic as Deterministic Rules, Not AI
Part 4 merges the AI produced mappings with live Vanta test data and assigns a final verdict to every control part. That verdict logic is pure Python. No LLM call. The rules are explicit, reviewable, and deterministic. AI helps upstream. Once it's time to decide whether a control is covered, an LLM has no business making that call.
Every AI Call Logged with Prompt and Response
Every prompt sent and every response received gets written to disk with a timestamp and the row identifier. That's the audit trail. If a compliance reviewer questions a mapping, I can show them the exact prompt, the exact model output, and the deterministic logic that turned that output into a verdict. No black box.
Code Showcase 1
Dual-Model Strategy. GPT-5 Analysis → GPT-4.1 Mini Extraction
Two pass execution pattern. GPT-5 does the reasoning in pass one, GPT-4.1 Mini extracts structured fields in pass two. Input to pass two is the full prose output of pass one. That's why pass two is cheap. It isn't thinking. It's parsing.
Pass 1 — GPT-5 (Reasoning)
─────────────────────────────────────────────────────────
Input: KSI control + Rev5 objective + client tech stack
Prompt: "Analyze whether the client's infrastructure
satisfies this control requirement. Explain
the match type, coverage, and any gaps."
Output: Prose analysis (saved to ai_response/output/)
Cost: ~$0.03-0.05 per control part
Why GPT-5: Requires understanding compliance semantics,
client infrastructure context, and gap
identification — not a pattern-matching task.
Pass 2 — GPT-4.1 Mini (Extraction)
─────────────────────────────────────────────────────────
Input: GPT-5's prose response from Pass 1
Prompt: "Extract these fields from the analysis:
match_type, coverage_status, gap_description,
recommended_action. Return as CSV row."
Output: Structured CSV (gaps_structured.csv)
Cost: ~$0.001-0.003 per control part
Why 4.1 Mini: The answer is already in the text — this
is field extraction, not reasoning. 10-30x
cheaper and 3-5x faster than GPT-5.| Property | Detail |
|---|---|
| Cost Optimization | GPT-4.1 Mini extraction is 10-30x cheaper per call than GPT-5. Applied to every row where reasoning is already complete |
| Speed Optimization | GPT-4.1 Mini responds 3-5x faster, reducing total pipeline runtime on extraction-heavy stages |
| Quality Boundary | GPT-5 makes every judgment call; GPT-4.1 Mini only touches text where the answer already exists in the prose |
| Audit Trail | Both prompts and responses are saved to disk. GPT-5 analysis in output/, GPT-4.1 Mini extraction in table_insert/ |
Code Showcase 2
Verdict Logic. Deterministic Classification
Verdict assignment logic from final_merge.ipynb. No AI involved. Pure Python. ai_match is whether the AI said this control is covered. vanta_match is whether a live Vanta test covers it. Both true means covered. AI says yes and Vanta says no means an implementation gap. AI says no and Vanta says yes means an analysis gap. Both false is the clean miss.
# Verdict assignment logic (simplified from final_merge.ipynb)
# No AI involved — purely deterministic rules
def assign_verdict(row):
ai_match = row['match_type'] # from Part 2 (GPT-5)
vanta_status = row['vanta_status'] # from Part 3 (Vanta API)
has_vanta = row['has_vanta_test'] # from Part 3 merge
if ai_match == 'Direct' and vanta_status == 'PASSING':
return 'COMPLIANT'
if has_vanta and vanta_status == 'FAILING':
return 'CRITICAL GAP (Operational Failure)'
if ai_match == 'No Match' and not has_vanta:
return 'STRATEGIC GAP (Missing Tool/Policy)'
if ai_match == 'Direct' and not has_vanta:
return 'LIKELY COMPLIANT (No Vanta Test)'
return 'PARTIAL / VERIFICATION REQUIRED'| Property | Detail |
|---|---|
| No AI | Verdict logic is deterministic if/else. The most consequential classification in the pipeline is fully auditable code |
| Two Inputs | AI match type (from GPT-5 gap analysis) + Vanta test status (from live API). Combines AI judgment with ground truth |
| Five Verdicts | COMPLIANT, CRITICAL GAP, STRATEGIC GAP, LIKELY COMPLIANT, PARTIAL. Each maps to a different remediation path |
| Auditor Friendly | A compliance reviewer can trace any verdict to its two input signals without understanding the AI that produced them |
Code Showcase 3
Crash-Safe Batch Execution Pattern
Family config cell. One variable, FAMILY, drives every path and every output location in the notebook. Switching from KSI to ADS to CCM is a one-line change plus a kernel restart. The pipeline code is identical across all three families.
# Pattern used in every AI-calling notebook
# 1. Load existing results (skip completed work)
if os.path.exists(output_csv):
done = pd.read_csv(output_csv)
completed_ids = set(done['requirement_id'])
else:
done = pd.DataFrame()
completed_ids = set()
# 2. Filter to remaining work
remaining = df[~df['requirement_id'].isin(completed_ids)]
print(f"{len(completed_ids)} already done, "
f"{len(remaining)} remaining")
# 3. Process with immediate persistence
def process_row(row):
prompt = build_prompt(row)
# Save prompt to disk BEFORE calling API
save_prompt(prompt, row['requirement_id'])
response = openai_client.chat(model="gpt-5", ...)
# Save response to disk IMMEDIATELY after API returns
save_output(response, row['requirement_id'])
# Append to CSV immediately (not batched)
append_to_csv(output_csv, parse_response(response))
return response
# 4. Parallel execution with 5 workers
with ThreadPoolExecutor(max_workers=5) as executor:
futures = {
executor.submit(process_row, row): row
for _, row in remaining.iterrows()
}| Property | Detail |
|---|---|
| Resume from Failure | Completed rows detected on re-run via CSV check. Interrupted at row 47 resumes from row 48 |
| Immediate Persistence | Each result written to CSV immediately after API response. No in-memory batching that could be lost |
| Full Audit Trail | Every prompt and response saved as individual files. Reproducible and auditable per row |
| 5 Parallel Workers | ThreadPoolExecutor with 5 workers on all GPT-5 calls. Balances throughput against API rate limits |
Code Showcase 4
Family Parameterization. One Pipeline, Three Control Families
Prompt and response logger. Every AI call writes to disk with the row ID, the timestamp, the exact prompt, and the exact response. If an auditor questions a decision months later, I can show them the trail.
# family-config cell (present in every notebook)
FAMILY = "KSI" # or "ADS" or "CCM"
# All paths derived from FAMILY:
SOURCE = f"FILES/REFERENCE/fedramp/20x/TRUE_SOURCE_FILES/{FAMILY}/{FAMILY}_Golden_Template.csv"
OUTPUT = f"code/testing/ai_response/{FAMILY}/"
LABELS = [f"family:{FAMILY.lower()}"] # family:ksi, family:ads, family:ccm
# Per-family output after full pipeline:
# ┌──────────┬──────────┬───────────────────────────┐
# │ Family │ Controls │ Final Tickets │
# ├──────────┼──────────┼───────────────────────────┤
# │ KSI │ 56 │ 127 (13 Epic + 114 Task) │
# │ ADS │ 20 │ 31 (3 Epic + 7 Task + │
# │ │ │ 21 Sub-task) │
# │ CCM │ 3 │ 83 (3 Epic + 20 Task + │
# │ │ │ 60 Sub-task) │
# └──────────┴──────────┴───────────────────────────┘| Property | Detail |
|---|---|
| Single Config Variable | FAMILY = 'KSI' at the top of every notebook. All paths, labels, and output dirs are derived from it |
| Zero Code Changes | Switching from KSI to ADS or CCM is a one-line edit + kernel restart. No pipeline code modifications |
| Scale Difference | KSI produces 127 tickets from 56 controls; CCM produces 83 from just 3 controls. The pipeline handles both scales |
| Label Tagging | family:ksi / family:ads / family:ccm labels auto-generated for Jira filtering and bulk operations |
Code Showcase 5
Jira Upload. Auto-Detecting Instance Configuration
Jira upload pass. Builds the Epic, Task, and Subtask structure from the deterministic merge output, and uses the Jira REST API to create everything. Error handling retries transient failures and records hard failures for manual review.
# Auto-detect subtask issue type name
# (Sunstone uses 'Subtask', SearchStax uses 'Sub-task')
issue_types = jira.get(f"/rest/api/3/issue/createmeta/...")
subtask_name = next(
t['name'] for t in issue_types
if t['name'].lower().replace('-', '') == 'subtask'
)
# Auto-detect project style
# (next-gen = team-managed, classic = company-managed)
project = jira.get(f"/rest/api/3/project/{PROJECT_KEY}")
project_style = project.get('style', 'classic')
# Auto-detect Epic Link custom field ID
# (varies per instance: customfield_10014, customfield_10600)
if project_style == 'classic':
fields = jira.get("/rest/api/3/field")
epic_link_field = next(
f['id'] for f in fields
if f['name'] == 'Epic Link'
)
# Upload phases with proper hierarchy:
# Phase 1: Create Epics (root cause themes)
# Phase 2: Create Tasks linked to parent Epics
# Phase 3: Create Sub-tasks linked to parent Tasks| Property | Detail |
|---|---|
| Subtask Name Detection | Handles 'Subtask' vs 'Sub-task' naming. A common Jira compatibility issue that causes silent upload failures |
| Project Style Detection | next-gen (team-managed) vs classic (company-managed) determines how Epic linking works |
| Epic Link Field Detection | Custom field ID for Epic Link varies per instance. Auto-detected from the field metadata API |
| Three-Phase Upload | Epics first, then Tasks with Epic linkage, then Sub-tasks with Task linkage. Order enforced for parent ID resolution |
Data Lifecycle
End-to-end flow of a single compliance check through the pipeline. Every arrow is a single NDJSON file. Every stage enforces a schema gate and count invariant before writing its output.
Data Preparation. Objective Validation + Component Examples
GPT-5 reads each Rev5 objective question and validates whether it meets the source schema. If it does, GPT-5 then generates a Component Example. Five parallel workers handle the batch. Output lands in a validated objectives CSV that downstream parts consume.
Gap Analysis. Control-to-Tech-Stack Mapping
For every KSI by Control by Part combination, GPT-5 reads the client's tech stack and maps which components satisfy the control. GPT-4.1 Mini runs right after to pull the structured fields out of GPT-5's prose response. Two passes. One for reasoning, one for extraction.
Vanta Integration. Live Test Inventory + NIST Mapping
The Vanta GraphQL client paginates the full test inventory. GPT-5 then maps each test to the NIST controls it actually covers. Output is the test coverage dataset the merge stage needs.
Merge + Scorecard. Deterministic Verdict Logic
Deterministic merge. The AI generated mappings join with the live Vanta coverage. Verdict rules run in pure Python. Every row gets a final verdict. Covered, gap, or needs human review. No model involved at the decision layer.
Custom Tests + Strategic Gaps
For confirmed gaps, GPT-5 writes the custom Vanta test definition and the strategic remediation plan. Output is a pair of artifacts per gap. A test spec Vanta can ingest, and a remediation plan engineering can execute.
Remediation Master Guide. Root Cause Grouping
GPT-5 groups failures by root cause and produces an executive roadmap. Same 30 gaps don't show up as 30 separate tickets when they're actually three root causes with ten symptoms each. The roadmap collapses them.
Vanta UI Mapping. Configuration Manifest
GPT-5 generates the Vanta UI configuration manifest. What gets created in the Vanta UI, what gets linked where, what labels and descriptions go on what tests. Output is the change spec someone can execute by hand or via the Vanta API.
Jira Ticket Pipeline. Build, Format, Upload
Part 8 runs in three subparts. GPT-5 formats every item for audit readability. The pipeline then builds the Epic, Task, and Subtask hierarchy. Jira REST API uploads it. 582 tickets in one run across all three control families.