AI Engineering AA-009

KalmSkills — Career Intelligence Platform

Full-stack career intelligence platform integrating O*NET occupational data (1,016 occupations, 35K+ skill descriptors), SEC EDGAR employer signals, and BLS wage statistics — with cosine-similarity skill matching, ATS resume simulation, and sub-100ms multi-dimensional queries.

01 — Problem

Career Platforms That Show Listings Without Context

Every job board I encountered presented the same impoverished view: a title, a company name, a list of requirements, and a salary range. None of them could answer the questions that actually matter to someone navigating a career transition: Which of my existing skills transfer to this occupation? What’s the gap between what I know and what the market values? Which employers in my region are actively investing in workforce development, and how do I prove my candidacy against their applicant tracking system (ATS) filters?

I needed a platform that treated career navigation as an intelligence problem — one that synthesized federal occupational data, employer filing signals, and wage statistics into a unified decision surface. Not another job board. A career reasoning engine.

02 — Architecture

Three Federal APIs, One Decision Surface

The platform integrates three federal data sources through a FastAPI backend with PostgreSQL persistence and a React/Vite frontend, and layers an ATS simulation engine on top:

O*NET Occupational Intelligence

Ingests 1,016 occupations with 35,000+ skill descriptors, knowledge domains, and ability ratings from the O*NET database. Each occupation is decomposed into a weighted skill vector, enabling cosine similarity matching between a user’s self-reported skills and any target occupation. The skill-gap analysis shows exactly which competencies are missing and their relative importance to the role.
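
A minimal sketch of the matching and gap steps described above, assuming each skill vector is a sparse mapping from skill name to weight. The function names and the example weights are hypothetical and are not taken from O*NET or from the platform's code.

```python
from __future__ import annotations

import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse skill vectors keyed by skill name."""
    dot = sum(weight * b.get(skill, 0.0) for skill, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


def missing_skills(user: dict[str, float], occupation: dict[str, float]) -> list[tuple[str, float]]:
    """Skills the occupation weights but the user lacks, most important first."""
    gaps = [
        (skill, weight)
        for skill, weight in occupation.items()
        if user.get(skill, 0.0) == 0.0
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


# Illustrative, made-up weights for one target occupation and one user.
occupation = {"python": 0.9, "statistics": 0.8, "sql": 0.7, "communication": 0.5}
user = {"python": 1.0, "communication": 1.0}

match = cosine_similarity(user, occupation)
gaps = missing_skills(user, occupation)
```

Here `gaps` lists the absent skills ordered by the occupation's weight, which is one simple way to express "relative importance to the role".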

SEC EDGAR Employer Signals

Parses 10-K and 10-Q filings for workforce development language — tuition reimbursement programs, training expenditure disclosures, and education benefit references. This surfaces employers who are actively investing in talent development, a signal that’s invisible on job boards but highly relevant to career decision-making.
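
The phrase scan can be sketched as a set of regular expressions run over filing text that has already been downloaded and stripped to plain text. The pattern list below is an assumption: the themes come from the description above, but the exact terms the platform searches for are not stated.

```python
import re
from collections import Counter

# Hypothetical patterns, one per workforce-development theme.
SIGNAL_PATTERNS = {
    "tuition_reimbursement": re.compile(
        r"tuition\s+(?:reimbursement|assistance)", re.IGNORECASE
    ),
    "training_investment": re.compile(
        r"(?:employee|workforce)\s+(?:training|development)", re.IGNORECASE
    ),
    "education_benefit": re.compile(
        r"education(?:al)?\s+(?:benefit|assistance)s?", re.IGNORECASE
    ),
}


def workforce_signals(filing_text: str) -> Counter:
    """Count how often each theme appears in one filing's plain text."""
    counts: Counter = Counter()
    for name, pattern in SIGNAL_PATTERNS.items():
        hits = len(pattern.findall(filing_text))
        if hits:
            counts[name] = hits
    return counts


sample = (
    "We offer tuition reimbursement and tuition assistance programs. "
    "We invest in employee training and workforce development."
)
signals = workforce_signals(sample)
```

Counting hits per theme, rather than returning a single yes/no flag, leaves room to compare employers by how much of this language their filings contain.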

BLS Wage and Employment Statistics

Integrates Bureau of Labor Statistics data for median wages, employment counts, and growth projections by occupation and geography. This contextualizes the skill-gap analysis with economic reality: a 90% skill match to an occupation paying below your current wage is a different decision than the same match to one paying 40% more.
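
The wage comparison is simple arithmetic, sketched here with hypothetical names and made-up wage figures. Ordering the results by wage change is an illustrative choice, not a documented ranking rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TransitionOption:
    title: str
    skill_match: float   # 0.0 to 1.0, e.g. from the cosine match
    median_wage: float   # annual median wage for the target occupation


def wage_change_pct(current_wage: float, target_wage: float) -> float:
    """Percentage change in wage when moving to the target occupation."""
    if current_wage <= 0:
        raise ValueError("current_wage must be positive")
    return (target_wage - current_wage) * 100.0 / current_wage


def contextualize(current_wage: float, options: list[TransitionOption]) -> list[tuple[str, float, float]]:
    """Pair each option's skill match with its wage change, best wage change first."""
    rows = [
        (opt.title, opt.skill_match, wage_change_pct(current_wage, opt.median_wage))
        for opt in options
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Two made-up options with the same 90% match but very different pay outcomes.
options = [
    TransitionOption("Occupation A (demo)", 0.9, 54_000),
    TransitionOption("Occupation B (demo)", 0.9, 84_000),
]
ranked = contextualize(60_000, options)
```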

ATS Simulation Engine

Parses uploaded resumes and scores them against target occupation requirements using the kind of keyword-matching and section-weighting heuristics that real applicant tracking systems employ. The simulation reveals why a qualified candidate might be filtered out: missing keywords, poor section structure, or skills described in non-standard language.
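
One way to sketch keyword matching with section weighting is shown below. The section names, the weights, and the penalty for text outside any recognized section are all assumptions made for illustration; real ATS products differ and the platform's actual heuristics are not specified here.

```python
from __future__ import annotations

import re

# Hypothetical weights: a keyword counts most when it sits under a standard header.
SECTION_WEIGHTS = {"skills": 1.0, "experience": 0.7, "summary": 0.4}


def split_sections(resume_text: str) -> dict[str, str]:
    """Group lines under the most recent recognized header; the rest is 'unlabeled'."""
    sections: dict[str, list[str]] = {}
    current = "unlabeled"
    for line in resume_text.splitlines():
        header = line.strip().lower().rstrip(":")
        if header in SECTION_WEIGHTS:
            current = header
            continue
        sections.setdefault(current, []).append(line)
    return {name: " ".join(lines).lower() for name, lines in sections.items()}


def ats_score(resume_text: str, keywords: list[str], unlabeled_weight: float = 0.2) -> tuple[float, list[str]]:
    """Return a 0-1 score and the keywords that were not found anywhere."""
    sections = split_sections(resume_text)
    total = 0.0
    missing: list[str] = []
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword.lower()) + r"\b")
        best = 0.0
        for name, text in sections.items():
            if pattern.search(text):
                best = max(best, SECTION_WEIGHTS.get(name, unlabeled_weight))
        if best == 0.0:
            missing.append(keyword)
        total += best
    score = total / len(keywords) if keywords else 0.0
    return score, missing


resume = "Skills:\nPython, SQL\nExperience\nBuilt dashboards using statistical analysis"
score, missing = ats_score(resume, ["python", "sql", "data analysis"])
```

In this example the resume mentions "statistical analysis" but the target keyword is "data analysis", so the exact-match scorer reports it as missing, which is the non-standard-language failure described above.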

Key Design Decisions

Why PostgreSQL after starting with JSON file storage? The initial prototype stored O*NET data as flat JSON files. This worked for single-occupation lookups but collapsed under multi-dimensional queries like “find all occupations requiring Python + data analysis + communication skills within the top 25% wage bracket.” PostgreSQL with proper indexing reduced these queries from 12 seconds to 80ms. The migration cost 3 days but eliminated the performance ceiling entirely.
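
The shape of such a query can be sketched with a small relational schema. To keep the example self-contained it runs on Python's built-in SQLite rather than PostgreSQL, and the table names, column names, and demo rows are all hypothetical. The wage cutoff is passed in as a parameter; in PostgreSQL a top-25% cutoff could be computed with `percentile_cont(0.75) WITHIN GROUP (ORDER BY median_wage)`.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the real database, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE occupations (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            median_wage INTEGER NOT NULL
        );
        CREATE TABLE occupation_skills (
            occupation_id INTEGER NOT NULL REFERENCES occupations(id),
            skill TEXT NOT NULL,
            PRIMARY KEY (occupation_id, skill)
        );
        -- Indexes so the skill filter and wage filter avoid full scans.
        CREATE INDEX idx_skills_skill ON occupation_skills (skill, occupation_id);
        CREATE INDEX idx_occupations_wage ON occupations (median_wage);
        """
    )
    conn.executemany(
        "INSERT INTO occupations VALUES (?, ?, ?)",
        [
            (1, "Data Analyst (demo)", 80000),
            (2, "Technical Writer (demo)", 70000),
            (3, "Data Engineer (demo)", 110000),
        ],
    )
    conn.executemany(
        "INSERT INTO occupation_skills VALUES (?, ?)",
        [
            (1, "python"), (1, "data analysis"), (1, "communication"),
            (2, "communication"),
            (3, "python"), (3, "data analysis"),
        ],
    )
    return conn


def occupations_with_all_skills(conn, skills, min_wage):
    """Titles of occupations that require every listed skill and meet the wage floor."""
    if not skills:
        return []
    placeholders = ", ".join("?" for _ in skills)
    sql = f"""
        SELECT o.title
        FROM occupations AS o
        JOIN occupation_skills AS s ON s.occupation_id = o.id
        WHERE s.skill IN ({placeholders})
          AND o.median_wage >= ?
        GROUP BY o.id, o.title, o.median_wage
        HAVING COUNT(DISTINCT s.skill) = ?
        ORDER BY o.median_wage DESC
    """
    rows = conn.execute(sql, [*skills, min_wage, len(skills)]).fetchall()
    return [title for (title,) in rows]


conn = build_demo_db()
matches = occupations_with_all_skills(
    conn, ["python", "data analysis", "communication"], min_wage=75000
)
```

The `GROUP BY ... HAVING COUNT(DISTINCT ...)` pattern is what turns "any of these skills" into "all of these skills"; it assumes the `skills` list contains no duplicates.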

Why cosine similarity for skill matching instead of keyword overlap? Keyword matching treats “data analysis” and “statistical analysis” as unrelated terms. Embedding-based cosine similarity captures the semantic relationship, producing more relevant occupation matches for users who describe their skills in non-standard language.
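
The contrast can be illustrated with a toy comparison. The three-dimensional vectors below are hand-written placeholders standing in for real embeddings (no embedding model is named in this write-up), so only the qualitative behavior is meaningful, not the specific numbers.

```python
import math

# Placeholder vectors, NOT output from any real embedding model.
TOY_EMBEDDINGS = {
    "data analysis": [0.9, 0.4, 0.1],
    "statistical analysis": [0.8, 0.5, 0.1],
    "forklift operation": [0.1, 0.1, 0.9],
}


def keyword_match(a: str, b: str) -> float:
    """Exact-phrase matching: related but differently worded skills score zero."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


exact = keyword_match("data analysis", "statistical analysis")
related = cosine(TOY_EMBEDDINGS["data analysis"], TOY_EMBEDDINGS["statistical analysis"])
unrelated = cosine(TOY_EMBEDDINGS["data analysis"], TOY_EMBEDDINGS["forklift operation"])
```

With these placeholder vectors, the exact match is zero while the cosine score for the two analysis skills is far higher than for the unrelated pair, which is the behavior the design decision relies on.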

03 — Outcomes

Measured Results

1,016
Occupations Indexed

from O*NET with full skill decomposition vectors

35K+
Skill Descriptors

mapped across knowledge, abilities, and work activities

3
Federal Data APIs

O*NET, SEC EDGAR, and BLS integrated into unified queries

80ms
Multi-Dimension Query

down from 12s after migrating from JSON to PostgreSQL

04 — Reflection

The Gap Between Skills Possessed and Skills Perceived

The most surprising finding from building the ATS simulation was how badly well-qualified candidates present themselves in resume format. Users with 80%+ skill overlap to a target occupation would score below 50% on the ATS simulation because their resumes used different terminology, buried key skills in prose paragraphs, or omitted standard section headers. The platform’s real value turned out to be translation: helping users express what they already know in the language that hiring systems expect.

What I’d change: the SEC EDGAR integration currently runs as a batch job that refreshes quarterly. Converting it to an event-driven pipeline (triggered by new filing notifications) would keep the employer intelligence layer current rather than up to 3 months stale. The infrastructure for this exists in the Real-Time Analytics Pipeline project, so it’s a matter of connecting the two systems.

“Career intelligence isn’t about knowing what jobs exist. It’s about understanding the distance between where you are and where the market needs you to be — and having the data to close that gap deliberately.”
