AI / Compliance Automation / DevOps · Apr 2026

AI Driven Issue Tracking and Analytics Pipeline

An eight-stage AI pipeline. It maps FedRAMP 20x controls to a client's actual tech stack, finds the gaps against live Vanta test data, writes the remediation plan, and uploads the whole Epic, Task, and Subtask hierarchy to Jira.

Problem

FedRAMP 20x introduced a new control family structure (KSI, ADS, CCM) that existing compliance tooling doesn't handle. Analyzing each control against a client's real infrastructure, cross-checking Vanta test coverage, finding gaps, writing remediation tickets, and uploading them to Jira takes weeks of manual work per control family. Then you do it again for every new client.

Solution

Built an end-to-end pipeline in eight sequential notebook stages. Every stage emits CSV artifacts the next stage reads. That gives you a traceable chain from raw control data all the way to the Jira tickets. GPT-5 handles the heavy reasoning: gap analysis, remediation planning, root cause grouping. GPT-4.1 Mini handles structured extraction and formatting where deep thinking doesn't help. Cheap work stays cheap, smart work stays smart.

Impact

→ Ran 3 control families (KSI with 56 controls, ADS with 20, CCM with 3) through the full pipeline end to end
→ Generated 582 Jira tickets (Epics, Tasks, Subtasks) across every family with proper hierarchy and audit-ready descriptions
→ Compressed the control-to-ticket lifecycle from weeks of manual analysis down to a single pipeline run per family
→ Every AI call is logged with the prompt sent and the response received. Full audit trail for compliance review.

Architecture

01 Part 1 validates Rev5 objective questions and generates Component Examples. GPT-5, 5 parallel workers.
02 Part 2 maps each KSI by Control by Part to the client's tech stack with GPT-5, then extracts structured CSV fields with GPT-4.1 Mini
03 Part 3 hits the Vanta API for live tests, exports inventory, and maps each test to NIST controls with GPT-5
04 Part 4 joins gap analysis, Vanta mapping, and inventory data into a master compliance scorecard with verdict logic
05 Part 5 filters the scorecard for gaps, splits into existing tool and missing tool buckets, and generates custom Vanta test definitions
06 Part 6 groups critical failures by root cause, generates CLI fixes, and produces an 8 to 10 theme Executive Remediation Roadmap
07 Part 7 generates a Vanta UI configuration manifest that maps KSIs to test configurations
08 Part 8 builds the Jira CSV hierarchy, rewrites descriptions for audit, scrubs to the AWS stack, splits into subtasks, and uploads through the REST API

Capabilities

· Rev5 800-53 objective question validation and Component Example generation
· KSI to client tech stack gap analysis with per control part granularity
· Live Vanta test inventory pull and NIST control mapping
· Automated compliance verdict logic (COMPLIANT, CRITICAL GAP, STRATEGIC GAP, LIKELY COMPLIANT, PARTIAL)
· Custom Vanta test definition generation for uncovered controls
· Root cause grouping with Executive Remediation Roadmap generation
· Vanta UI configuration manifest for test to control mapping
· Jira ticket hierarchy generation (Epic, Task, Subtask) with audit-ready formatting
· Family parameterization. Same pipeline runs for KSI, ADS, or CCM with one config change.
· Crash-safe execution with immediate CSV writes and automatic skip on rerun

Stack

Python · OpenAI GPT-5 · OpenAI GPT-4.1 Mini · Vanta API (GraphQL) · Jira REST API v3 · AWS Secrets Manager · pandas · ThreadPoolExecutor · Jupyter Notebooks

Technical Deep Dive

Architecture internals and annotated code from the production system.

Architecture Overview

The pipeline is deliberately sequential. Each of the 8 parts produces CSV artifacts the next part consumes. That makes every intermediate state inspectable, and it means a failure at Part 6 doesn't force a rerun of Parts 1 through 5. The dual model strategy (GPT-5 for reasoning, GPT-4.1 Mini for extraction) lives at the prompt level. Inside a single notebook, the same row can hit GPT-5 first for analysis and GPT-4.1 Mini right after to extract structured fields from the response.

Rev5 Objective Questions + KSI Golden Template (source data)

→ [Part 1] GPT-5 validates objectives and generates Component Examples

→ [Part 2] GPT-5 maps controls to the client tech stack, then GPT-4.1 Mini extracts structured CSV

→ [Part 3] Vanta API pulls live tests, GPT-5 maps those tests to NIST controls

→ [Part 4] Deterministic merge and verdict logic produces the Master Compliance Scorecard

→ [Part 5] GPT-5 generates custom Vanta test definitions and a strategic gap remediation plan

→ [Part 6] GPT-5 groups failures by root cause, output is the Executive Remediation Roadmap

→ [Part 7] GPT-5 generates the Vanta UI configuration manifest

→ [Part 8a-c] GPT-5 formats for audit, then the Jira REST API uploads the Epic, Task, and Subtask hierarchy

Key Architectural Decisions

Why GPT-5 for Reasoning, GPT-4.1 Mini for Extraction

GPT-5 handles anything that requires understanding compliance context. Mapping a KSI control to a client's actual infrastructure. Deciding whether a Vanta test covers a NIST requirement. Grouping failures by root cause. Writing the remediation plan. GPT-4.1 Mini handles the mechanical follow-up. Pulling structured CSV fields out of GPT-5's prose response. Reformatting descriptions for audit style. Scrubbing vendor references. The expensive model only runs where reasoning quality matters. The cheap one does the transformation.

Why 8 Parts Instead of One Monolithic Pipeline

Each part writes named CSV files. Those files are checkpoints and audit artifacts at the same time. If Part 6 needs a prompt tweak, you rerun Part 6. Parts 1 through 5 are frozen on disk. A compliance reviewer can also inspect the scorecard (Part 4) before tickets get created (Part 8). The sequential design mirrors how a human analyst would actually work. Understand the controls, find the gaps, plan the remediation, file the tickets.

One Prompt Per Control Part, Not Combined

Part 2 sends one GPT-5 prompt per KSI by Control by Part combination. I don't batch multiple parts into a single prompt. That stops cross-contamination cold. The model's read on AC-2 Part (a) can't bleed into its read on AC-2 Part (b). Each response is independently cacheable and independently rerunnable.
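A minimal sketch of that fan-out, not the production notebook code. The column names (`ksi_id`, `control_id`, `part`, `requirement_text`), the template wording, and the sample rows are all hypothetical. The point is only that each KSI by Control by Part combination gets its own prompt under its own key.

```python
# Hypothetical template; the production prompt wording is not shown here.
PROMPT_TEMPLATE = (
    "KSI: {ksi_id}\n"
    "Control: {control_id} Part ({part})\n"
    "Requirement: {requirement_text}\n"
    "Client tech stack: {tech_stack}\n"
    "Analyze whether the client's infrastructure satisfies this control part."
)


def build_prompts(rows, tech_stack):
    """Return one prompt per (ksi_id, control_id, part) key, never combined."""
    prompts = {}
    for row in rows:
        key = (row["ksi_id"], row["control_id"], row["part"])
        if key in prompts:
            # A duplicate key would mean two prompts share one cache slot.
            raise ValueError(f"duplicate control part: {key}")
        prompts[key] = PROMPT_TEMPLATE.format(tech_stack=tech_stack, **row)
    return prompts


# Illustrative rows only (not real pipeline data).
sample_rows = [
    {"ksi_id": "KSI-EXAMPLE", "control_id": "AC-2", "part": "a",
     "requirement_text": "Define account types."},
    {"ksi_id": "KSI-EXAMPLE", "control_id": "AC-2", "part": "b",
     "requirement_text": "Assign account managers."},
]
prompts = build_prompts(sample_rows, tech_stack="AWS IAM, Okta")
print(len(prompts), "independent prompts")
```

Because the key is the full combination, the same key can name the saved prompt and response files, which is what makes each part independently cacheable and rerunnable.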

Family Parameterization

Every notebook has a family-config cell that sets FAMILY to 'KSI', 'ADS', or 'CCM'. Source data, output paths, and AI response directories all derive from that one variable. The same pipeline code handles every control family. Switching between them is a one-line config change plus a kernel restart.

Crash Safe Execution with Skip on Rerun

Every notebook writes results immediately, either as CSV appends or as individual output files. On rerun, completed rows are detected and skipped. So a notebook that died at row 47 of 200 resumes at row 48, not at row 1. Combine that with per-call prompt and response logging and every AI call is reproducible and auditable.

Verdict Logic as Deterministic Rules, Not AI

Part 4 merges the AI produced mappings with live Vanta test data and assigns a final verdict to every control part. That verdict logic is pure Python. No LLM call. The rules are explicit, reviewable, and deterministic. AI helps upstream. Once it's time to decide whether a control is covered, an LLM has no business making that call.

Every AI Call Logged with Prompt and Response

Every prompt sent and every response received gets written to disk with a timestamp and the row identifier. That's the audit trail. If a compliance reviewer questions a mapping, I can show them the exact prompt, the exact model output, and the deterministic logic that turned that output into a verdict. No black box.
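A minimal sketch of what such a logger could look like, assuming one JSON file per prompt and one per response. The function name `log_ai_call`, the file naming scheme, and the JSON layout are hypothetical. Only the `prompts/` and `output/` directory names come from the pipeline's artifact lists.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_ai_call(base_dir, row_id, model, prompt, response):
    """Write the prompt and the response for one AI call to separate files."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    paths = {}
    for kind, text in (("prompts", prompt), ("output", response)):
        folder = Path(base_dir) / kind
        folder.mkdir(parents=True, exist_ok=True)
        record = {
            "row_id": row_id,
            "model": model,
            "timestamp": stamp,
            "text": text,
        }
        path = folder / f"{row_id}_{stamp}.json"
        path.write_text(json.dumps(record, indent=2), encoding="utf-8")
        paths[kind] = path
    return paths


# Usage against a throwaway directory.
demo_dir = tempfile.mkdtemp()
logged = log_ai_call(demo_dir, "AC-2_a", "gpt-5",
                     "Analyze AC-2 Part (a)...", "Direct match via IAM.")
print(logged["prompts"].name, logged["output"].name)
```

Keeping the row identifier in both the file name and the record means a reviewer can go from a scorecard row to its prompt and response without any index.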

Code Showcase 1

Dual-Model Strategy. GPT-5 Analysis → GPT-4.1 Mini Extraction

Two-pass execution pattern. GPT-5 does the reasoning in pass one, GPT-4.1 Mini extracts structured fields in pass two. Input to pass two is the full prose output of pass one. That's why pass two is cheap. It isn't thinking. It's parsing.

text

Pass 1 — GPT-5 (Reasoning)
─────────────────────────────────────────────────────────
Input:   KSI control + Rev5 objective + client tech stack
Prompt:  "Analyze whether the client's infrastructure
          satisfies this control requirement. Explain
          the match type, coverage, and any gaps."
Output:  Prose analysis (saved to ai_response/output/)
Cost:    ~$0.03-0.05 per control part
Why GPT-5: Requires understanding compliance semantics,
           client infrastructure context, and gap
           identification — not a pattern-matching task.


Pass 2 — GPT-4.1 Mini (Extraction)
─────────────────────────────────────────────────────────
Input:   GPT-5's prose response from Pass 1
Prompt:  "Extract these fields from the analysis:
          match_type, coverage_status, gap_description,
          recommended_action. Return as CSV row."
Output:  Structured CSV (gaps_structured.csv)
Cost:    ~$0.001-0.003 per control part
Why 4.1 Mini: The answer is already in the text — this
              is field extraction, not reasoning. 10-30x
              cheaper and 3-5x faster than GPT-5.

Property	Detail
Cost Optimization	GPT-4.1 Mini extraction is 10-30x cheaper per call than GPT-5. Applied to every row where reasoning is already complete
Speed Optimization	GPT-4.1 Mini responds 3-5x faster, reducing total pipeline runtime on extraction-heavy stages
Quality Boundary	GPT-5 makes every judgment call; GPT-4.1 Mini only touches text where the answer already exists in the prose
Audit Trail	Both prompts and responses are saved to disk. GPT-5 analysis in output/, GPT-4.1 Mini extraction in table_insert/
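The two passes can be sketched as follows. This is an illustration under assumptions, not the notebook code: the model calls are injected as plain callables (`reason` and `extract` are hypothetical names) so the control flow runs without an API key, and the stubs return canned text. The four field names are the ones listed in the Pass 2 prompt above.

```python
import csv
import io

FIELDS = ["match_type", "coverage_status", "gap_description",
          "recommended_action"]


def parse_extraction(csv_row):
    """Parse the single CSV row returned by the extraction pass."""
    values = next(csv.reader(io.StringIO(csv_row.strip())))
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, (v.strip() for v in values)))


def run_two_pass(prompt, reason, extract):
    """Pass 1 produces prose; pass 2 only restructures that prose."""
    analysis = reason(prompt)      # reasoning model (GPT-5 in the pipeline)
    csv_row = extract(analysis)    # extraction model (GPT-4.1 Mini)
    return {"analysis": analysis, **parse_extraction(csv_row)}


# Stub callables standing in for the two model calls.
def fake_reason(prompt):
    return "The control part is fully met by AWS IAM. No gaps found."


def fake_extract(analysis):
    return 'Direct,Covered,"None, fully met",No action'


result = run_two_pass("Analyze AC-2 Part (a)", fake_reason, fake_extract)
print(result["match_type"], "|", result["gap_description"])
```

Parsing with the `csv` module rather than `str.split(",")` matters here because gap descriptions routinely contain commas.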

Code Showcase 2

Verdict Logic. Deterministic Classification

Verdict assignment logic from final_merge.ipynb. No AI involved. Pure Python. match_type is the AI's read on whether the client's stack covers the control part. has_vanta_test and vanta_status say whether a live Vanta test exists and whether it's passing. A Direct match with a passing test is COMPLIANT. A failing Vanta test is a CRITICAL GAP. No match and no test is a STRATEGIC GAP. A Direct match with no test is LIKELY COMPLIANT. Everything else falls through to PARTIAL / VERIFICATION REQUIRED.

python

# Verdict assignment logic (simplified from final_merge.ipynb)
# No AI involved — purely deterministic rules

def assign_verdict(row):
    ai_match = row['match_type']        # from Part 2 (GPT-5)
    vanta_status = row['vanta_status']   # from Part 3 (Vanta API)
    has_vanta = row['has_vanta_test']    # from Part 3 merge

    if ai_match == 'Direct' and vanta_status == 'PASSING':
        return 'COMPLIANT'

    if has_vanta and vanta_status == 'FAILING':
        return 'CRITICAL GAP (Operational Failure)'

    if ai_match == 'No Match' and not has_vanta:
        return 'STRATEGIC GAP (Missing Tool/Policy)'

    if ai_match == 'Direct' and not has_vanta:
        return 'LIKELY COMPLIANT (No Vanta Test)'

    return 'PARTIAL / VERIFICATION REQUIRED'

Property	Detail
No AI	Verdict logic is deterministic if/else. The most consequential classification in the pipeline is fully auditable code
Two Inputs	AI match type (from GPT-5 gap analysis) + Vanta test status (from live API). Combines AI judgment with ground truth
Five Verdicts	COMPLIANT, CRITICAL GAP, STRATEGIC GAP, LIKELY COMPLIANT, PARTIAL. Each maps to a different remediation path
Auditor Friendly	A compliance reviewer can trace any verdict to its two input signals without understanding the AI that produced them
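A worked example of the rules above, with the function repeated so the block runs on its own. The sample rows are made up for illustration, and the 'Partial' match type is a hypothetical value. The source only shows 'Direct' and 'No Match'.

```python
def assign_verdict(row):
    ai_match = row['match_type']
    vanta_status = row['vanta_status']
    has_vanta = row['has_vanta_test']

    if ai_match == 'Direct' and vanta_status == 'PASSING':
        return 'COMPLIANT'
    if has_vanta and vanta_status == 'FAILING':
        return 'CRITICAL GAP (Operational Failure)'
    if ai_match == 'No Match' and not has_vanta:
        return 'STRATEGIC GAP (Missing Tool/Policy)'
    if ai_match == 'Direct' and not has_vanta:
        return 'LIKELY COMPLIANT (No Vanta Test)'
    return 'PARTIAL / VERIFICATION REQUIRED'


# Illustrative rows, one per verdict path.
samples = [
    {'match_type': 'Direct', 'vanta_status': 'PASSING', 'has_vanta_test': True},
    {'match_type': 'Direct', 'vanta_status': 'FAILING', 'has_vanta_test': True},
    {'match_type': 'No Match', 'vanta_status': None, 'has_vanta_test': False},
    {'match_type': 'Direct', 'vanta_status': None, 'has_vanta_test': False},
    {'match_type': 'Partial', 'vanta_status': 'PASSING', 'has_vanta_test': True},
]
verdicts = [assign_verdict(row) for row in samples]
for row, verdict in zip(samples, verdicts):
    print(row['match_type'], row['vanta_status'], '->', verdict)
```

Rule order matters. The second sample is a Direct match, but the failing test wins and the row lands in CRITICAL GAP, so live test results override the AI's optimistic read.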

Code Showcase 3

Crash-Safe Batch Execution Pattern

Crash-safe batch pattern shared by every AI-calling notebook. Completed rows are read back from the output CSV and skipped. Each prompt is saved before the API call, and each response is saved and appended to the CSV as soon as it returns. Five parallel workers process whatever remains.

python

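# Pattern used in every AI-calling notebook
# (df, output_csv, and the helper functions are defined earlier in the notebook)
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

import pandas as pd

# 1. Load existing results (skip completed work)
if os.path.exists(output_csv):
    done = pd.read_csv(output_csv)
    completed_ids = set(done['requirement_id'])
else:
    done = pd.DataFrame()
    completed_ids = set()

# 2. Filter to remaining work
remaining = df[~df['requirement_id'].isin(completed_ids)]
print(f"{len(completed_ids)} already done, "
      f"{len(remaining)} remaining")

# 3. Process with immediate persistence
def process_row(row):
    prompt = build_prompt(row)

    # Save prompt to disk BEFORE calling API
    save_prompt(prompt, row['requirement_id'])

    # Call arguments are elided in this excerpt
    response = openai_client.chat(model="gpt-5", ...)

    # Save response to disk IMMEDIATELY after API returns
    save_output(response, row['requirement_id'])

    # Append to CSV immediately (not batched);
    # append_to_csv must serialize writes across worker threads
    append_to_csv(output_csv, parse_response(response))

    return response

# 4. Parallel execution with 5 workers
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {
        executor.submit(process_row, row): row
        for _, row in remaining.iterrows()
    }
    # Collect results so a failed row is reported instead of silently dropped.
    # Failed rows never reach the CSV, so the next run retries them.
    for future in as_completed(futures):
        row = futures[future]
        try:
            future.result()
        except Exception as exc:
            print(f"{row['requirement_id']} failed: {exc}")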

Property	Detail
Resume from Failure	Completed rows detected on re-run via CSV check. Interrupted at row 47 resumes from row 48
Immediate Persistence	Each result written to CSV immediately after API response. No in-memory batching that could be lost
Full Audit Trail	Every prompt and response saved as individual files. Reproducible and auditable per row
5 Parallel Workers	ThreadPoolExecutor with 5 workers on all GPT-5 calls. Balances throughput against API rate limits
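With five workers appending to one CSV, the append has to be serialized or rows can interleave. Here is one way the `append_to_csv` helper could be written, using only the standard library. This is a sketch under assumptions, not the notebook's actual helper. The lock, the header handling, and the `load_completed_ids` companion are all illustrative.

```python
import csv
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

_csv_lock = threading.Lock()


def append_to_csv(path, record):
    """Append one dict as a CSV row; write the header if the file is new."""
    with _csv_lock:  # one writer at a time across worker threads
        is_new = not os.path.exists(path) or os.path.getsize(path) == 0
        with open(path, "a", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(record))
            if is_new:
                writer.writeheader()
            writer.writerow(record)


def load_completed_ids(path, id_column="requirement_id"):
    """Return the IDs already persisted, or an empty set on a first run."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as fh:
        return {row[id_column] for row in csv.DictReader(fh)}


# Usage: 20 fake results written by 5 workers, then read back.
demo_csv = os.path.join(tempfile.mkdtemp(), "results.csv")


def fake_work(i):
    append_to_csv(demo_csv, {"requirement_id": f"REQ-{i}",
                             "match_type": "Direct"})


with ThreadPoolExecutor(max_workers=5) as pool:
    list(pool.map(fake_work, range(20)))

done_ids = load_completed_ids(demo_csv)
print(len(done_ids), "rows persisted")
```

Opening and closing the file inside the lock for every row is slower than keeping one handle open, but it means every completed row is on disk before the next one starts, which is the property the resume logic depends on.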

Code Showcase 4

Family Parameterization. One Pipeline, Three Control Families

Family config cell. One variable, FAMILY, drives every path and every output location in the notebook. Switching from KSI to ADS to CCM is a one-line change plus a kernel restart. The pipeline code is identical across all three families.

python

# family-config cell (present in every notebook)
FAMILY = "KSI"  # or "ADS" or "CCM"

# All paths derived from FAMILY:
SOURCE = f"FILES/REFERENCE/fedramp/20x/TRUE_SOURCE_FILES/{FAMILY}/{FAMILY}_Golden_Template.csv"
OUTPUT = f"code/testing/ai_response/{FAMILY}/"
LABELS = [f"family:{FAMILY.lower()}"]  # family:ksi, family:ads, family:ccm

# Per-family output after full pipeline:
# ┌──────────┬──────────┬───────────────────────────┐
# │ Family   │ Controls │ Final Tickets             │
# ├──────────┼──────────┼───────────────────────────┤
# │ KSI      │ 56       │ 127 (13 Epic + 114 Task)  │
# │ ADS      │ 20       │ 31  (3 Epic + 7 Task +    │
# │          │          │      21 Sub-task)          │
# │ CCM      │ 3        │ 83  (3 Epic + 20 Task +   │
# │          │          │      60 Sub-task)          │
# └──────────┴──────────┴───────────────────────────┘

Property	Detail
Single Config Variable	FAMILY = 'KSI' at the top of every notebook. All paths, labels, and output dirs are derived from it
Zero Code Changes	Switching from KSI to ADS or CCM is a one-line edit + kernel restart. No pipeline code modifications
Scale Difference	KSI produces 127 tickets from 56 controls; CCM produces 83 from just 3 controls. The pipeline handles both scales
Label Tagging	family:ksi / family:ads / family:ccm labels auto-generated for Jira filtering and bulk operations

Code Showcase 5

Jira Upload. Auto-Detecting Instance Configuration

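Jira upload pass. Before creating anything, the uploader auto-detects the instance's subtask issue type name, project style, and Epic Link field ID. It then builds the Epic, Task, and Subtask structure from the deterministic merge output and uses the Jira REST API to create everything. Error handling retries transient failures and records hard failures for manual review.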

python

# Auto-detect subtask issue type name
# (Sunstone uses 'Subtask', SearchStax uses 'Sub-task')
issue_types = jira.get(f"/rest/api/3/issue/createmeta/...")
subtask_name = next(
    t['name'] for t in issue_types
    if t['name'].lower().replace('-', '') == 'subtask'
)

# Auto-detect project style
# (next-gen = team-managed, classic = company-managed)
project = jira.get(f"/rest/api/3/project/{PROJECT_KEY}")
project_style = project.get('style', 'classic')

# Auto-detect Epic Link custom field ID
# (varies per instance: customfield_10014, customfield_10600)
epic_link_field = None  # stays None for next-gen projects
if project_style == 'classic':
    fields = jira.get("/rest/api/3/field")
    epic_link_field = next(
        (f['id'] for f in fields if f['name'] == 'Epic Link'),
        None,  # no Epic Link field on this instance
    )

# Upload phases with proper hierarchy:
# Phase 1: Create Epics (root cause themes)
# Phase 2: Create Tasks linked to parent Epics
# Phase 3: Create Sub-tasks linked to parent Tasks

Property	Detail
Subtask Name Detection	Handles 'Subtask' vs 'Sub-task' naming. A common Jira compatibility issue that causes silent upload failures
Project Style Detection	next-gen (team-managed) vs classic (company-managed) determines how Epic linking works
Epic Link Field Detection	Custom field ID for Epic Link varies per instance. Auto-detected from the field metadata API
Three-Phase Upload	Epics first, then Tasks with Epic linkage, then Sub-tasks with Task linkage. Order enforced for parent ID resolution
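The three-phase ordering can be sketched without touching Jira at all. In this illustration the issue-creation call is injected as a callable (`create_issue` is a hypothetical name), the row keys `issue_id`, `parent_id`, and `issue_type` are assumed stand-ins for the Issue ID / Parent ID columns of the import CSV, and the fake creator just hands out sequential keys.

```python
PHASES = ["Epic", "Task", "Sub-task"]


def upload_in_phases(rows, create_issue):
    """Create issues phase by phase; return local issue_id -> Jira key."""
    created = {}
    for issue_type in PHASES:
        for row in rows:
            if row["issue_type"] != issue_type:
                continue
            parent_key = None
            if row.get("parent_id"):
                # Parents belong to an earlier phase, so the key exists.
                # A KeyError here means the CSV references a missing parent.
                parent_key = created[row["parent_id"]]
            created[row["issue_id"]] = create_issue(row, parent_key)
    return created


# Fake creator standing in for the REST call.
calls = []


def fake_create_issue(row, parent_key):
    calls.append((row["issue_type"], row["summary"], parent_key))
    return f"DEMO-{len(calls)}"


# Rows deliberately listed child-first to show the ordering is enforced.
demo_rows = [
    {"issue_id": "3", "parent_id": "2", "issue_type": "Sub-task",
     "summary": "Rotate access keys"},
    {"issue_id": "2", "parent_id": "1", "issue_type": "Task",
     "summary": "Fix IAM policy"},
    {"issue_id": "1", "parent_id": "", "issue_type": "Epic",
     "summary": "Access control"},
]
keys = upload_in_phases(demo_rows, fake_create_issue)
print(keys)
```

In a real run the callable would post to the issue-creation endpoint and use the detected subtask name and Epic Link field. The sketch only shows why Epics must exist before Tasks and Tasks before Sub-tasks.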

Data Lifecycle

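End-to-end flow of a single compliance check through the pipeline. Every arrow lists the artifacts one stage writes to disk for the next: mostly CSV files, plus prompt and response logs and a few JSON outputs. Every stage enforces a schema gate and count invariant before writing its output.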

L1 Data Preparation. Objective Validation + Component Examples

▼ rev5_objective_questions.csv (updated in place), ai_response/sandbox_rev5/prompts/ + output/

L2 Gap Analysis. Control-to-Tech-Stack Mapping

▼ gaps_results.csv, gaps_structured.csv, prompts/, output/, table_insert/

L3 Vanta Integration. Live Test Inventory + NIST Mapping

▼ vanta_tests_inventory.csv, vanta_to_nist_mapping.csv, map_controls/prompts/ + output/

L4 Merge + Scorecard. Deterministic Verdict Logic

▼ final_compliance_scorecard.csv

L5 Custom Tests + Strategic Gaps

▼ PROMPT_1_CUSTOM_TESTS.csv, PROMPT_2_STRATEGIC_GAPS.csv, payload_custom_tests_LOSSLESS.csv

L6 Remediation Master Guide. Root Cause Grouping

▼ payload_remediation.csv, remediation_master_guide.csv, executive_remediation_roadmap.json

L7 Vanta UI Mapping. Configuration Manifest

▼ payload_ui_mapping.csv, vanta_ui_manifest.csv

L8 Jira Ticket Pipeline. Build, Format, Upload

▼ run_manifest.json

Stage 1

Data Preparation. Objective Validation + Component Examples

GPT-5 reads each Rev5 objective question and validates whether it meets the source schema. If it does, GPT-5 then generates a Component Example. Five parallel workers handle the batch. Output lands in a validated objectives CSV that downstream parts consume.

Input: rev5_objective_questions.csv, KSI_Golden_Template.csv

Processing: GPT-5 validates each objective question, generates Component Examples showing what compliance looks like in practice. ThreadPoolExecutor with 5 workers for parallel processing.

Output: rev5_objective_questions.csv (updated in place), ai_response/sandbox_rev5/prompts/ + output/

Model: GPT-5 (5 parallel workers)

Key File: part_1_prompt_sandbox/prompt_sandbox.ipynb

Stage 2

Gap Analysis. Control-to-Tech-Stack Mapping

For every KSI by Control by Part combination, GPT-5 reads the client's tech stack and maps which components satisfy the control. GPT-4.1 Mini runs right after to pull the structured fields out of GPT-5's prose response. Two passes. One for reasoning, one for extraction.

Input: KSI_Golden_Template.csv, rev5_objective_questions.csv, client tech stack context

Processing: Pass 1: GPT-5 maps each control part to client infrastructure (one prompt per part, not combined). Pass 2: GPT-4.1 Mini extracts structured CSV fields from GPT-5's prose responses.

Output: gaps_results.csv, gaps_structured.csv, prompts/, output/, table_insert/

Model: GPT-5 (analysis) → GPT-4.1 Mini (extraction)

Key File: part_2_fedramp_gaps_analysis/fedramp_20x_gaps_analysis.ipynb

Stage 3

Vanta Integration. Live Test Inventory + NIST Mapping

The Vanta GraphQL client paginates the full test inventory. GPT-5 then maps each test to the NIST controls it actually covers. Output is the test coverage dataset the merge stage needs.

Input: Vanta API (SearchStax instance), NIST control metadata

Processing: Vanta API pull (full test inventory) → CSV export → GPT-5 maps each test to NIST controls

Output: vanta_tests_inventory.csv, vanta_to_nist_mapping.csv, map_controls/prompts/ + output/

Model: GPT-5

Key File: part_3_ksi_vanta_merge/ksi_vanta_merge.ipynb
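The inventory pull is a cursor pagination loop. A minimal sketch of that loop follows, with the page fetch injected as a callable so it runs offline. The page shape (`nodes`, `has_next_page`, `end_cursor`) is an assumption modeled on common GraphQL cursor conventions, not Vanta's documented schema, and `fake_fetch_page` stands in for the real API call.

```python
def paginate(fetch_page, page_size=100):
    """Yield every item from a cursor-paginated source, page by page."""
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["nodes"]
        if not page["has_next_page"]:
            break
        cursor = page["end_cursor"]


# Fake source: five tests served two at a time.
FAKE_TESTS = [{"id": f"test-{i}", "status": "PASSING"} for i in range(5)]


def fake_fetch_page(cursor, page_size):
    start = int(cursor) if cursor else 0
    end = start + page_size
    return {
        "nodes": FAKE_TESTS[start:end],
        "has_next_page": end < len(FAKE_TESTS),
        "end_cursor": str(end),
    }


inventory = list(paginate(fake_fetch_page, page_size=2))
print(len(inventory), "tests pulled")
```

Writing the loop as a generator lets the caller stream rows straight into the CSV export instead of holding the whole inventory in memory first.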

Stage 4

Merge + Scorecard. Deterministic Verdict Logic

Deterministic merge. The AI generated mappings join with the live Vanta coverage. Verdict rules run in pure Python. Every row gets a final verdict. Covered, gap, or needs human review. No model involved at the decision layer.

Input: gaps_structured.csv, vanta_to_nist_mapping.csv, vanta_tests_inventory.csv

Processing: pandas merge on control identifiers → deterministic verdict assignment based on AI match type + Vanta test status

Output: final_compliance_scorecard.csv

Model: None. Deterministic logic only

Key File: part_4_final_merge/final_merge.ipynb

Stage 5

Custom Tests + Strategic Gaps

For confirmed gaps, GPT-5 writes the custom Vanta test definition and the strategic remediation plan. Output is a pair of artifacts per gap. A test spec Vanta can ingest, and a remediation plan engineering can execute.

Input: final_compliance_scorecard.csv

Processing: Filter for gaps → split by tool availability → GPT-5 generates custom Vanta test definitions (PROMPT_1) and strategic gap remediation plans (PROMPT_2)

Output: PROMPT_1_CUSTOM_TESTS.csv, PROMPT_2_STRATEGIC_GAPS.csv, payload_custom_tests_LOSSLESS.csv

Model: GPT-5

Key File: part_5_custom_tests/custom_tests.ipynb

Stage 6

Remediation Master Guide. Root Cause Grouping

GPT-5 groups failures by root cause and produces an executive roadmap. The same 30 gaps don't show up as 30 separate tickets when they're actually three root causes with ten symptoms each. The roadmap collapses them.

Input: final_compliance_scorecard.csv (critical gaps), vanta_tests_inventory.csv (failure details)

Processing: Build payload_remediation.csv → batch ~30 failures to GPT-5 for root cause grouping + CLI fixes → final master analysis for Executive Roadmap

Output: payload_remediation.csv, remediation_master_guide.csv, executive_remediation_roadmap.json

Model: GPT-5

Key File: part_6_payload_remediation/payload_remediation.ipynb
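The batching step is plain list chunking. A small sketch, assuming failures are held as a list of dicts and that a batch size of 30 is a tunable default rather than a hard rule. The `batched` helper and the sample data are illustrative.

```python
def batched(items, size=30):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Illustrative data: 65 fake failures become batches of 30, 30, and 5.
failures = [{"requirement_id": f"REQ-{i}"} for i in range(65)]
batches = batched(failures, size=30)
print([len(batch) for batch in batches])
```

Each batch becomes one grouping prompt, and the per-batch groupings then feed the final master analysis that produces the roadmap.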

Stage 7

Vanta UI Mapping. Configuration Manifest

GPT-5 generates the Vanta UI configuration manifest. What gets created in the Vanta UI, what gets linked where, what labels and descriptions go on what tests. Output is the change spec someone can execute by hand or via the Vanta API.

Input: final_compliance_scorecard.csv (rows with Vanta coverage)

Processing: Build payload_ui_mapping.csv → batch ~20 KSIs to GPT-5 for Vanta UI configuration generation

Output: payload_ui_mapping.csv, vanta_ui_manifest.csv

Model: GPT-5

Key File: part_7_vanta_ui_mapping/vanta_tests_control_ui_mapping.ipynb

Stage 8

Jira Ticket Pipeline. Build, Format, Upload

Part 8 runs in three subparts. The pipeline builds the Epic and Task hierarchy, GPT-5 rewrites every item for audit readability and splits Tasks into Subtasks, and the Jira REST API uploads the result. 582 tickets across all three control families.

Input: remediation_master_guide.csv, executive_remediation_roadmap.json

Processing: 8a: Build v2 CSV (Epics + Tasks with Issue ID / Parent ID hierarchy) → 8b: GPT-5 rewrites to audit format (v3), scrubs to AWS stack (v4), splits into subtasks (v5), generates labels → 8c: Upload to Jira REST API v3 (Phase 1: Epics → Phase 2: Tasks → Phase 3: Sub-tasks)

Output: jira_remediation_import_v2.csv through v5_subtasks.csv, jira_upload_log.csv

Model: GPT-5 (audit formatting + subtask splitting)

Key File: part_8_jira_pipeline/