RAG as Data Infrastructure, Not Feature
Redesigning RAG as foundational data infrastructure reduced per-query costs 75% and improved answer accuracy from 67% to 91% across 2.3 million monthly queries.
Portfolio
Systems built, documented, and reflected upon.
Data Systems
The SEC Filing Intelligence Pipeline processes 36,791 filings from EDGAR, extracting structured intelligence through a 5-stage Python ETL architecture. The system reduced per-filing analysis by 73% while detecting anomalies manual review missed.
Automation
Python tool that mines 3 years of ChatGPT exports (1,195 conversations, 27,689 messages) for portfolio-worthy project candidates through heuristic scoring, 5-theme classification, and privacy-aware sensitive data detection — processing the complete archive in 23 seconds with zero external dependencies.
Business Intelligence
FastAPI-powered workforce education dashboard computing 7 financial KPIs across programs and cohorts — with an API-first architecture that serves the HTML dashboard, enables future integrations, and maintains 100% test coverage via pytest and httpx.
Automation
Semi-automated job application pipeline with 5 stages and 3 human decision gates: Playwright scrapes postings across ATS platforms, then the pipeline analyzes requirements against a master profile, tailors resumes and cover letters, exports to 4 formats, and navigates application forms without auto-submitting.
Full-Stack
Drag-and-drop web app that transforms Excel files into interactive KPI dashboards entirely client-side — with auto-KPI detection, chart type auto-suggestion, and zero server-side data processing for privacy-sensitive workforce education data.
Automation
Automated converter that transforms 847 Google Keep exports into structured Obsidian markdown with YAML frontmatter, 5-category auto-classification, MD5 deduplication (18.4% duplicate rate), and bidirectional wikilink generation — packaged as a standalone .exe for zero-dependency distribution.
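As a rough illustration of the MD5 deduplication step mentioned above, here is a minimal sketch. It is not the converter's actual code: the function names `dedupe_notes` and `duplicate_rate` are hypothetical, and the whitespace normalization before hashing is an assumption about what counts as a duplicate.

```python
import hashlib


def dedupe_notes(notes):
    """Return notes with duplicate content removed, keeping first occurrences.

    Hypothetical helper: hashes whitespace-normalized text with MD5 and
    skips any note whose digest has already been seen.
    """
    seen = set()
    unique = []
    for text in notes:
        # Assumption: notes differing only in whitespace count as duplicates.
        normalized = " ".join(text.split())
        digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def duplicate_rate(notes):
    """Fraction of notes dropped by dedupe_notes (0.0 for an empty list)."""
    if not notes:
        return 0.0
    return 1 - len(dedupe_notes(notes)) / len(notes)
```

MD5 is used here only as a fast content fingerprint, not for security; a duplicate rate like the one quoted above would be the output of something like `duplicate_rate` over the full export.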
Automation
CLI-driven HTML email generator that separates content (YAML) from formatting (Jinja2 templates) with automated CSS inlining via Premailer and minification — reducing campaign email production from 25 minutes to under 2 minutes with guaranteed cross-client rendering.
Data Engineering
End-to-end intelligence suite that scans 36,791 SEC EDGAR filings to identify companies offering §127 education benefits — with XBRL validation that reduced a 58.2% false positive rate by 90.3%, geographic enrichment at 90.1% coverage, and per-company deal value computation across the public company universe.
AI Engineering
Full-stack career intelligence platform integrating O*NET occupational data (1,016 occupations, 35K+ skill descriptors), SEC EDGAR employer signals, and BLS wage statistics — with cosine-similarity skill matching, ATS resume simulation, and sub-100ms multi-dimensional queries.
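Cosine-similarity skill matching can be sketched as follows. This is an illustrative example under stated assumptions, not the platform's implementation: it assumes each profile and occupation is already encoded as an equal-length numeric skill vector, and the names `cosine_similarity` and `rank_occupations` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as having no similarity to anything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_occupations(profile, occupations):
    """Return (name, score) pairs sorted from best to worst match.

    `occupations` maps an occupation name to its skill vector
    (a hypothetical in-memory layout chosen for this sketch).
    """
    scored = [
        (name, cosine_similarity(profile, vector))
        for name, vector in occupations.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because cosine similarity depends on direction rather than magnitude, a candidate with uniformly lower skill ratings can still match an occupation whose skill mix has the same proportions.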