AI-Driven Issue Tracking & Analytics Pipeline
An 8-part AI pipeline that maps FedRAMP 20x controls to a client's technology stack, identifies compliance gaps against live Vanta test data, generates remediation plans, and uploads a fully structured Epic → Task → Sub-task hierarchy to Jira.
Problem
FedRAMP 20x introduced a new control family structure (KSI, ADS, CCM) that doesn't map neatly to existing compliance tooling. Manually analyzing each control against a client's actual infrastructure, cross-referencing Vanta test coverage, identifying gaps, writing remediation tickets, and uploading them to Jira is a multi-week effort per control family — and it has to be repeated for every client engagement.
Solution
Built an end-to-end pipeline split into 8 sequential notebook stages. Each stage produces CSV artifacts that the next stage consumes, creating a traceable chain from raw control data to uploaded Jira tickets. GPT-5 handles all reasoning-intensive work (gap analysis, remediation planning, root cause grouping), while GPT-4.1 Mini handles structured extraction and formatting passes — optimizing for cost and speed where deep reasoning isn't needed.
Impact
- Processed 3 control families (KSI: 56 controls, ADS: 20 controls, CCM: 3 controls) through the full pipeline end-to-end
- Generated 582 Jira tickets (Epics + Tasks + Sub-tasks) across all families with proper hierarchy and audit-ready descriptions
- Reduced the control-to-ticket lifecycle from weeks of manual analysis to a single pipeline run per family
- Every AI call logged with prompt sent and response received — full audit trail for compliance review
Architecture
- Part 1 validates Rev5 objective questions and generates Component Examples using GPT-5 with 5 parallel workers
- Part 2 maps each KSI × Control × Part to the client's tech stack via GPT-5, then extracts structured CSV via GPT-4.1 Mini
- Part 3 pulls live Vanta tests via API, exports inventory, and maps each test to NIST controls using GPT-5
- Part 4 joins gap analysis + Vanta mapping + inventory data into a master compliance scorecard with verdict logic
- Part 5 filters the scorecard for gaps, splits into existing-tool and missing-tool categories, and generates custom Vanta test definitions
- Part 6 groups critical gap failures by root cause, generates CLI fixes, and produces an Executive Remediation Roadmap with 8-10 themes
- Part 7 generates a Vanta UI configuration manifest mapping KSIs to test configurations
- Part 8 builds the Jira CSV hierarchy, rewrites descriptions to audit format, scrubs to the AWS stack, splits into subtasks, and uploads via REST API
Capabilities
- Rev5 800-53 objective question validation and Component Example generation
- KSI-to-client-tech-stack gap analysis with per-control-part granularity
- Live Vanta test inventory pull and NIST control mapping
- Automated compliance verdict logic (COMPLIANT / CRITICAL GAP / STRATEGIC GAP / PARTIAL)
- Custom Vanta test definition generation for uncovered controls
- Root cause grouping with Executive Remediation Roadmap generation
- Vanta UI configuration manifest for test-to-control mapping
- Jira ticket hierarchy generation (Epic → Task → Sub-task) with audit-ready formatting
- Family parameterization — the same pipeline runs for KSI, ADS, or CCM with a single config change
- Crash-safe execution with immediate CSV writes and automatic skip-on-rerun
Stack
Technical Deep Dive
Architecture internals and annotated code from the production system.
Architecture Overview
The pipeline is deliberately sequential — each of the 8 parts produces CSV artifacts that the next part consumes. This makes every intermediate state inspectable and debuggable, and means a failure at Part 6 doesn't require re-running Parts 1-5. The dual-model strategy (GPT-5 for reasoning, GPT-4.1 Mini for extraction) is applied at the prompt level: within a single notebook, the same row may be processed by GPT-5 first for analysis, then by GPT-4.1 Mini to extract structured fields from the response.
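The stage-to-stage contract can be sketched as a tiny harness — the file names and `transform` callable here are illustrative stand-ins, not the production notebooks:

```python
import csv

def run_stage(in_csv, out_csv, transform):
    """One pipeline stage: read the upstream CSV artifact, apply this
    stage's work row by row, and persist a new CSV for the next stage.
    Because every stage round-trips through disk, each intermediate
    state is inspectable and any single stage can be re-run alone."""
    with open(in_csv, newline="") as f:
        rows = [transform(row) for row in csv.DictReader(f)]
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

The key property is that `out_csv` is a complete artifact on disk before the next stage starts, which is what makes a Part 6 re-run independent of Parts 1-5.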
Key Architectural Decisions
Why GPT-5 for Reasoning, GPT-4.1 Mini for Extraction
GPT-5 handles every task that requires understanding compliance context: mapping a KSI control to a client's specific infrastructure, deciding whether a Vanta test covers a NIST requirement, grouping failures by root cause, and writing remediation plans. GPT-4.1 Mini handles the mechanical follow-up: extracting structured CSV fields from GPT-5's prose response, reformatting ticket descriptions to audit style, and scrubbing vendor-specific references. This split means the expensive model only runs where reasoning quality directly affects output correctness, while the cheaper, faster model handles transformation tasks where the answer is already in the text.
Why 8 Separate Parts Instead of One Monolithic Pipeline
Each part produces named CSV files that serve as both checkpoints and audit artifacts. If Part 6 (remediation grouping) needs a prompt tweak, you re-run Part 6 only — Parts 1-5 outputs are stable on disk. This also means a compliance reviewer can inspect the intermediate scorecard (Part 4) before the pipeline generates tickets (Part 8). The sequential design mirrors how a human analyst would work: first understand the controls, then find gaps, then plan remediation, then create tickets.
One Prompt Per Control Part, Not Combined
Part 2 sends one GPT-5 prompt per KSI × Control × Part combination rather than batching multiple parts into a single prompt. This prevents cross-contamination — the model's analysis of AC-2 Part (a) doesn't bleed into its analysis of AC-2 Part (b). It also means each response is independently cacheable and re-runnable.
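A minimal sketch of that fan-out — the prompt wording and helper name are illustrative, the real template lives in the Part 2 notebook:

```python
def build_part_prompts(ksi_id, control_id, parts, tech_stack):
    """One prompt per KSI × Control × Part. Parts are never batched
    into a single prompt, so the analysis of Part (a) cannot bleed
    into Part (b), and each response is independently cacheable."""
    return [
        f"KSI: {ksi_id}\n"
        f"Control: {control_id}, Part ({part})\n"
        f"Client tech stack: {tech_stack}\n"
        "Analyze whether the client's infrastructure satisfies this "
        "single control part only."
        for part in parts
    ]
```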
Family Parameterization
Every notebook has a family-config cell with FAMILY = 'KSI' (or 'ADS' or 'CCM'). Source data, output paths, and AI response directories are all derived from this single variable. The same pipeline code processes all three control families — switching families is a one-line config change plus a kernel restart.
Crash-Safe Execution with Skip-on-Rerun
Every notebook writes results immediately (CSV append or individual output files). On re-run, completed rows are detected and skipped automatically. This means a notebook interrupted at row 47 of 200 resumes from row 48, not row 1. Combined with prompt/response logging, every AI call is reproducible and auditable.
Verdict Logic as Deterministic Rules, Not AI
Part 4's verdict assignment (COMPLIANT, CRITICAL GAP, STRATEGIC GAP, LIKELY COMPLIANT, PARTIAL) is pure deterministic logic applied to the merged scorecard — not an AI judgment call. If the AI says 'Direct match' and the Vanta test is passing, the verdict is COMPLIANT. If the Vanta test is actively failing, it's CRITICAL GAP. This keeps the most consequential classification in auditable, rule-based code.
Auto-Detecting Jira Instance Configuration
Part 8c auto-detects the subtask issue type name (Subtask vs. Sub-task), project style (next-gen vs. classic), and Epic Link custom field ID — all of which vary between Jira instances. This means the same upload code works against both the Sunstone test environment and SearchStax production without manual configuration.
Code Showcase 1
Dual-Model Strategy — GPT-5 Analysis → GPT-4.1 Mini Extraction
Part 2 demonstrates the core dual-model pattern. GPT-5 receives the full control context and produces a detailed prose analysis of whether the client's infrastructure covers the requirement. GPT-4.1 Mini then receives GPT-5's response and extracts structured fields into CSV columns. This pattern repeats across the pipeline wherever an analysis step produces prose that needs to become structured data.
```text
Pass 1 — GPT-5 (Reasoning)
─────────────────────────────────────────────────────────
Input:   KSI control + Rev5 objective + client tech stack
Prompt:  "Analyze whether the client's infrastructure
          satisfies this control requirement. Explain
          the match type, coverage, and any gaps."
Output:  Prose analysis (saved to ai_response/output/)
Cost:    ~$0.03-0.05 per control part
Why GPT-5: Requires understanding compliance semantics,
          client infrastructure context, and gap
          identification — not a pattern-matching task.

Pass 2 — GPT-4.1 Mini (Extraction)
─────────────────────────────────────────────────────────
Input:   GPT-5's prose response from Pass 1
Prompt:  "Extract these fields from the analysis:
          match_type, coverage_status, gap_description,
          recommended_action. Return as CSV row."
Output:  Structured CSV (gaps_structured.csv)
Cost:    ~$0.001-0.003 per control part
Why 4.1 Mini: The answer is already in the text — this
          is field extraction, not reasoning. 10-30x
          cheaper and 3-5x faster than GPT-5.
```

| Property | Detail |
|---|---|
| Cost Optimization | GPT-4.1 Mini extraction is 10-30x cheaper per call than GPT-5 — applied to every row where reasoning is already complete |
| Speed Optimization | GPT-4.1 Mini responds 3-5x faster, reducing total pipeline runtime on extraction-heavy stages |
| Quality Boundary | GPT-5 makes every judgment call; GPT-4.1 Mini only touches text where the answer already exists in the prose |
| Audit Trail | Both prompts and responses are saved to disk — GPT-5 analysis in output/, GPT-4.1 Mini extraction in table_insert/ |
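In code, the two-pass routing reduces to a small function. This sketch uses `call_model(model, prompt)` as an injected stand-in for the real OpenAI client, with abbreviated prompt wording:

```python
def dual_pass(control_context, call_model):
    """Two-pass pattern: the reasoning model writes prose, then the
    cheap model turns that prose into structured fields.
    `call_model(model, prompt)` is an injected stub so the routing
    can be exercised without API access."""
    # Pass 1 — GPT-5 does the judgment call
    analysis = call_model(
        "gpt-5",
        f"Analyze whether the client's infrastructure satisfies:\n{control_context}",
    )
    # Pass 2 — GPT-4.1 Mini only extracts what Pass 1 already said
    extracted = call_model(
        "gpt-4.1-mini",
        "Extract match_type, coverage_status, gap_description, "
        f"recommended_action as a CSV row from:\n{analysis}",
    )
    return analysis, extracted
```

Injecting the client also makes the cost boundary explicit in the call site: every `gpt-5` invocation is a reasoning step, every `gpt-4.1-mini` invocation is a transformation step.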
Code Showcase 2
Verdict Logic — Deterministic Classification
Part 4's verdict assignment is pure rule-based logic, not AI inference. The scorecard merge produces a row for every KSI × Control × Part with an AI match type (from Part 2) and a Vanta test status (from Part 3). The verdict is assigned by deterministic if/else logic that an auditor can trace without understanding AI.
```python
# Verdict assignment logic (simplified from final_merge.ipynb)
# No AI involved — purely deterministic rules
def assign_verdict(row):
    ai_match = row['match_type']        # from Part 2 (GPT-5)
    vanta_status = row['vanta_status']  # from Part 3 (Vanta API)
    has_vanta = row['has_vanta_test']   # from Part 3 merge

    if ai_match == 'Direct' and vanta_status == 'PASSING':
        return 'COMPLIANT'
    if has_vanta and vanta_status == 'FAILING':
        return 'CRITICAL GAP (Operational Failure)'
    if ai_match == 'No Match' and not has_vanta:
        return 'STRATEGIC GAP (Missing Tool/Policy)'
    if ai_match == 'Direct' and not has_vanta:
        return 'LIKELY COMPLIANT (No Vanta Test)'
    return 'PARTIAL / VERIFICATION REQUIRED'
```

| Property | Detail |
|---|---|
| No AI | Verdict logic is deterministic if/else — the most consequential classification in the pipeline is fully auditable code |
| Two Inputs | AI match type (from GPT-5 gap analysis) + Vanta test status (from live API) — combines AI judgment with ground truth |
| Five Verdicts | COMPLIANT, CRITICAL GAP, STRATEGIC GAP, LIKELY COMPLIANT, PARTIAL — each maps to a different remediation path |
| Auditor Friendly | A compliance reviewer can trace any verdict to its two input signals without understanding the AI that produced them |
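Each verdict fans out to a different downstream stage. A sketch of that routing, with the mapping itself summarized from the Parts 5-6 descriptions (the table constant is illustrative, not production code):

```python
# Illustrative verdict → next-step routing, summarizing Parts 5 and 6.
VERDICT_ROUTES = {
    "COMPLIANT": "no action",
    "CRITICAL GAP (Operational Failure)": "Part 6: remediation guide",
    "STRATEGIC GAP (Missing Tool/Policy)": "Part 5: strategic gap plan",
    "LIKELY COMPLIANT (No Vanta Test)": "Part 5: custom test definition",
}

def route(verdict):
    # Anything ambiguous falls through to human review.
    return VERDICT_ROUTES.get(verdict, "manual review")
```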
Code Showcase 3
Crash-Safe Batch Execution Pattern
Every AI-calling notebook follows the same crash-safe pattern: load existing results, skip completed rows, process remaining rows with immediate CSV persistence, and log every prompt/response pair. This pattern is consistent across all 8 parts.
```python
# Pattern used in every AI-calling notebook

# 1. Load existing results (skip completed work)
if os.path.exists(output_csv):
    done = pd.read_csv(output_csv)
    completed_ids = set(done['requirement_id'])
else:
    done = pd.DataFrame()
    completed_ids = set()

# 2. Filter to remaining work
remaining = df[~df['requirement_id'].isin(completed_ids)]
print(f"{len(completed_ids)} already done, "
      f"{len(remaining)} remaining")

# 3. Process with immediate persistence
def process_row(row):
    prompt = build_prompt(row)
    # Save prompt to disk BEFORE calling API
    save_prompt(prompt, row['requirement_id'])
    response = openai_client.chat(model="gpt-5", ...)
    # Save response to disk IMMEDIATELY after API returns
    save_output(response, row['requirement_id'])
    # Append to CSV immediately (not batched)
    append_to_csv(output_csv, parse_response(response))
    return response

# 4. Parallel execution with 5 workers
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {
        executor.submit(process_row, row): row
        for _, row in remaining.iterrows()
    }
```

| Property | Detail |
|---|---|
| Resume from Failure | Completed rows detected on re-run via CSV check — interrupted at row 47 resumes from row 48 |
| Immediate Persistence | Each result written to CSV immediately after API response — no in-memory batching that could be lost |
| Full Audit Trail | Every prompt and response saved as individual files — reproducible and auditable per row |
| 5 Parallel Workers | ThreadPoolExecutor with 5 workers on all GPT-5 calls — balances throughput against API rate limits |
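The `append_to_csv` helper the pattern relies on might look like this — a sketch, assuming the only requirement is that a crash mid-run leaves a valid, resumable CSV:

```python
import csv
import os

def append_to_csv(path, row):
    """Persist one result row immediately after the API returns.
    The header is written only when the file is first created, so an
    interrupted run still leaves a readable CSV whose rows feed the
    skip-on-rerun check at the top of the notebook."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending one row per call trades a little I/O for the guarantee that no completed API work is ever lost to an in-memory batch.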
Code Showcase 4
Family Parameterization — One Pipeline, Three Control Families
A single FAMILY variable at the top of every notebook controls source data paths, output directories, and ticket labels. The same pipeline code processes KSI (56 controls → 127 tickets), ADS (20 controls → 31 tickets), and CCM (3 controls → 83 tickets) without code changes.
```python
# family-config cell (present in every notebook)
FAMILY = "KSI"  # or "ADS" or "CCM"

# All paths derived from FAMILY:
SOURCE = f"FILES/REFERENCE/fedramp/20x/TRUE_SOURCE_FILES/{FAMILY}/{FAMILY}_Golden_Template.csv"
OUTPUT = f"code/testing/ai_response/{FAMILY}/"
LABELS = [f"family:{FAMILY.lower()}"]  # family:ksi, family:ads, family:ccm

# Per-family output after full pipeline:
# ┌──────────┬──────────┬───────────────────────────┐
# │ Family   │ Controls │ Final Tickets             │
# ├──────────┼──────────┼───────────────────────────┤
# │ KSI      │ 56       │ 127 (13 Epic + 114 Task)  │
# │ ADS      │ 20       │ 31 (3 Epic + 7 Task +     │
# │          │          │     21 Sub-task)          │
# │ CCM      │ 3        │ 83 (3 Epic + 20 Task +    │
# │          │          │     60 Sub-task)          │
# └──────────┴──────────┴───────────────────────────┘
```

| Property | Detail |
|---|---|
| Single Config Variable | FAMILY = 'KSI' at the top of every notebook — all paths, labels, and output dirs are derived from it |
| Zero Code Changes | Switching from KSI to ADS or CCM is a one-line edit + kernel restart — no pipeline code modifications |
| Scale Difference | KSI produces 127 tickets from 56 controls; CCM produces 83 from just 3 controls — the pipeline handles both scales |
| Label Tagging | family:ksi / family:ads / family:ccm labels auto-generated for Jira filtering and bulk operations |
Code Showcase 5
Jira Upload — Auto-Detecting Instance Configuration
Part 8c's upload notebook auto-detects three configuration values that vary between Jira instances: the subtask issue type name, the project management style, and the Epic Link custom field ID. This eliminates manual configuration when switching between test (Sunstone) and production (SearchStax) environments.
```python
# Auto-detect subtask issue type name
# (Sunstone uses 'Subtask', SearchStax uses 'Sub-task')
issue_types = jira.get("/rest/api/3/issue/createmeta/...")
subtask_name = next(
    t['name'] for t in issue_types
    if t['name'].lower().replace('-', '') == 'subtask'
)

# Auto-detect project style
# (next-gen = team-managed, classic = company-managed)
project = jira.get(f"/rest/api/3/project/{PROJECT_KEY}")
project_style = project.get('style', 'classic')

# Auto-detect Epic Link custom field ID
# (varies per instance: customfield_10014, customfield_10600)
if project_style == 'classic':
    fields = jira.get("/rest/api/3/field")
    epic_link_field = next(
        f['id'] for f in fields
        if f['name'] == 'Epic Link'
    )

# Upload phases with proper hierarchy:
# Phase 1: Create Epics (root cause themes)
# Phase 2: Create Tasks linked to parent Epics
# Phase 3: Create Sub-tasks linked to parent Tasks
```

| Property | Detail |
|---|---|
| Subtask Name Detection | Handles 'Subtask' vs 'Sub-task' naming — a common Jira compatibility issue that causes silent upload failures |
| Project Style Detection | next-gen (team-managed) vs classic (company-managed) determines how Epic linking works |
| Epic Link Field Detection | Custom field ID for Epic Link varies per instance — auto-detected from the field metadata API |
| Three-Phase Upload | Epics first, then Tasks with Epic linkage, then Sub-tasks with Task linkage — order enforced for parent ID resolution |
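The three-phase ordering exists because a child issue needs its parent's Jira key, which only exists after the parent is created. A minimal sketch of that dependency, with `create_issue(payload)` as an injected stub for the real REST call:

```python
def upload_hierarchy(epics, tasks, subtasks, create_issue):
    """Phase 1: create Epics so Tasks can reference their new keys.
    Phase 2: create Tasks so Sub-tasks can reference theirs.
    `create_issue(payload)` stands in for the real Jira REST call
    and returns the newly created issue key."""
    epic_keys = {e["id"]: create_issue(e) for e in epics}
    task_keys = {}
    for task in tasks:
        task["epic_key"] = epic_keys[task["parent_id"]]
        task_keys[task["id"]] = create_issue(task)
    for sub in subtasks:
        sub["parent_key"] = task_keys[sub["parent_id"]]
        create_issue(sub)
    return epic_keys, task_keys
```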
Data Lifecycle
End-to-end flow of a single compliance check through the pipeline. Every arrow is a CSV artifact written to disk, and every stage validates its input before writing its output.
Data Preparation — Objective Validation + Component Examples
Validates Rev5 800-53 objective questions against the KSI Golden Template and generates Component Examples — concrete descriptions of what 'meeting this objective' looks like for each control. GPT-5 with 5 parallel workers processes the full question set.
Gap Analysis — Control-to-Tech-Stack Mapping
The core analysis stage. GPT-5 receives each KSI × Control × Part combination alongside the client's technology stack and determines whether the client's infrastructure satisfies the control requirement. A second pass using GPT-4.1 Mini extracts structured fields from GPT-5's prose into a clean CSV. This is where the dual-model architecture matters most — GPT-5 needs to understand compliance semantics to make the mapping judgment, but extracting 'match_type: Direct' from the response is a mechanical task.
Vanta Integration — Live Test Inventory + NIST Mapping
Pulls the client's live Vanta test inventory via API, exports it as a structured CSV, then uses GPT-5 to map each Vanta test to the NIST 800-53 controls it covers. This creates the bridge between 'what Vanta is testing' and 'what FedRAMP requires.'
Merge + Scorecard — Deterministic Verdict Logic
Joins the gap analysis output (Part 2) with the Vanta mapping (Part 3) and inventory data to produce the master compliance scorecard. Verdict logic is entirely deterministic: COMPLIANT = AI Direct match + Vanta passing; CRITICAL GAP = Vanta test actively failing; STRATEGIC GAP = no tool exists + no Vanta test; LIKELY COMPLIANT = AI Direct match but no Vanta test to confirm; PARTIAL = everything else.
Custom Tests + Strategic Gaps
Filters the scorecard for controls lacking Vanta coverage. Splits into two categories: PROMPT_1 (client has the tool but no Vanta test — generate a custom test definition) and PROMPT_2 (client is missing the tool entirely — generate a strategic gap remediation plan). GPT-5 generates both.
Remediation Master Guide — Root Cause Grouping
Takes all critical gap failures (with Vanta failure details) and sends batches of ~30 failures to GPT-5 for root cause grouping and CLI fix generation. A final master analysis pass produces an Executive Remediation Roadmap with 8-10 themes and a 'Top 5 Power Fixes' — the highest-leverage remediations that resolve the most failures.
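The batching step is a straightforward chunking pass (helper name illustrative):

```python
def batch_failures(failures, size=30):
    """Chunk critical-gap failures into ~30-item batches — one GPT-5
    root-cause-grouping prompt per batch."""
    return [failures[i:i + size] for i in range(0, len(failures), size)]
```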
Vanta UI Mapping — Configuration Manifest
For controls that do have Vanta coverage, generates a configuration manifest showing exactly how each KSI maps to Vanta's UI — which tests to enable, which settings to configure, and current coverage status.
Jira Ticket Pipeline — Build, Format, Upload
Three sub-stages that transform the remediation guide into actionable Jira tickets. 8a builds the Epic + Task hierarchy CSV. 8b rewrites descriptions to audit-ready format (Definition of Done, Architectural Directive), scrubs non-AWS references, splits Tasks into 3 Sub-tasks each, and auto-generates labels. 8c uploads to Jira via REST API with auto-detection of instance configuration. The upload is crash-safe — every created ticket is logged immediately, and re-runs skip already-uploaded tickets.