{"api_version": "0.15.2-beta", "external_version": "0.15", "last_updated": "2026-07-13", "status": "beta", "changelog": [{"version": "0.15.2-beta", "date": "2026-07-13", "type": "fix", "breaking": false, "summary": "TITLECONVERT database (fusion) path: removed the ~0.50 match_score plateau and made scores 1.0-anchored like the fast path. Multi-word queries with no exact survey phrase and no single lead term (post\u22480, lead=0) previously capped their top result at score\u22480.50 (\u03b1*lex with \u03b1=0.5, lex=1.0), producing noisy, non-comparable intra-cluster ranking. Fusion now scores the leading-token bonus by the FRACTION of query tokens present as whole words in the O*NET title, then renormalizes the gated survivors so the leader anchors at 1.0. Confidence still reads the absolute fused score so a weak-but-best leader is not inflated.", "details": ["Surface: src/unified_data_handler.py _fuse_titleconvert_scores.", "Lead scoring: replaced the single-phrase prefix/word-boundary bonus with per-token coverage \u2014 lead_term = LEAD_BONUS * (matched query tokens / total query tokens), whole-word matched against the O*NET title (so 'project' cannot credit 'Projectionists', and multi-word taxonomy names credit member occupations, e.g. 
'mechanical & robotics engineers' -> 'Mechanical Engineers' 2/3 > 'Mechatronics' 1/3).", "Tokenization: alnum tokens, stopwords dropped, trailing-s folded (nurse==nurses) so plural typed queries match singular titles and vice versa.", "Renormalization: gate on the absolute fused score (unchanged TITLE_FUSION_MIN_SCORE intent), then divide survivors by the max fused score so the leader = 1.0, matching the fast path's 1.0-anchored scale and killing the \u03b1=0.5 ceiling.", "Validation: golden eval (tests/eval/titleconvert_golden.jsonl) +3/+3 on staging vs baseline, 0 regressions after tokenization refinement (student nurse, technology architect); 7-name plateau spot-check confirmed 1.0-anchored, monotonic intra-cluster ordering."], "migration": "None \u2014 scoring refinement on the existing database/fusion path. Response shape unchanged (match_score/lexical_calibrated/confidence). Fast typeahead (name_corpus) path is unaffected."}, {"version": "0.15.1-beta", "date": "2026-07-01", "type": "feature", "breaking": false, "summary": "TITLECONVERT gains a batch / list form: TITLECONVERT [\"a\", \"b\", \"c\"] resolves up to 200 job titles in ONE DSL call instead of one call per title. The batch runs Tier-1 FULLTEXT + Tier-2 pgvector RAG only and SKIPS Tier-3 LLM disambiguation (the per-title cost driver), so large lists stay fast. Each input title returns one result group (input order + duplicates preserved) carrying an `ambiguous` flag; callers re-run only the ambiguous items individually for full LLM disambiguation. DSL-only (exposed via the aoi_dsl_query MCP tool whose description teaches the pattern) \u2014 no new MCP tool, no REST mirror.", "details": ["DSL: TITLECONVERT [\"nurse\", \"welder\"] [WHERE COMPANY IS \"...\" | WHERE INDUSTRY IS \"...\"]. A single shared WHERE applies to every title; per-row context is not supported in the list form (convert per-row-context titles individually).", "Parser: src/dsl/ast_parser.py _parse_convert detects a leading [ ... 
] for the TITLE subject only and populates QueryAST.targets; single-title path is unchanged.", "Executor: src/dsl/ast_executor.py _exec_titleconvert routes ast.targets to UnifiedDataHandler.titleconvert_batch().", "Handler: UnifiedDataHandler.titleconvert_batch() validates a 200-title cap (-32602 if exceeded), de-duplicates case-insensitively, fans out with bounded concurrency (semaphore, 8) over the single-title core with skip_llm=True, and returns {results:[{title, recommended, matches, ambiguous}], count, unique_count, ambiguous_count, skipped_llm:true, guidance}.", "Disambiguation: src/disambiguation_service.py disambiguate() gains skip_llm; when set it bypasses the Tier-3 LLM block and marks unresolved cases ambiguous (signal 'llm_skipped'). Threaded through UnifiedDataHandler.titleconvert()/_apply_disambiguation().", "Docs: HELP CONVERT (docs/context/convert.md) gains a 'Batch / list form' section; grammar_generator and the aoi_dsl_query tool overview list the new form."], "migration": "None \u2014 additive. The single-title TITLECONVERT \"x\" form is unchanged."}, {"version": "0.15.0-beta", "date": "2026-06-27", "type": "feature", "breaking": false, "summary": "Fast typeahead mode (default ON): on the no-company/typeahead path, TITLECONVERT resolves the typed (partial) title against the survey NAME corpus (title_probability, ~54k distinct names) via a simple-config WORD-PREFIX match, instead of the 140k-row company title-blob FTS. ~100-1000x faster (DB 2-40ms vs 5-9s), junk-free, and it fixes the stopword 'it'->0-rows and 'registered nurse'->1-row cases. Multi-word phrases that don't resolve cleanly defer to the legacy pipeline. 
No grammar/schema/response-shape change; reversible.", "details": ["Why: ranking ts_rank over 140k company title-blobs for common prefixes (man:*, eng:*) cost 5-9s and let junk infiltrate (a blob merely CONTAINING the token scored like a real match) -> 'manager' surfaced Audio/Video Technicians, Lawyers, Cashiers; 'it' returned 0 rows (english-FTS stopword); 'registered nurse' collapsed to 1 row.", "Matcher: simple-config word-prefix tsquery (to_tsvector('simple',title) @@ to_tsquery('simple','manager:*')) so each token matches a word STARTING with it ANYWHERE in the name (manager->store manager, marketing->digital marketing specialist, it->it technician \u2014 no stopword drop, no stem collision); the last token also ORs its singular so a typed plural matches singular names. Ranking: mass = Sum(probability*occupation_share) per O*NET (kappa-smoothed), normalized-to-max within the distinct-occupation set, no min_score gate. Bypasses fusion + LLM disambiguation (typeahead must stay sub-100ms).", "Multi-word guard: trigram typo-fallback is single-word-only (on phrases it matched 'stock room workers'~'dock worker'); a multi-word query the corpus resolves to <=1 occupation returns [] so the caller falls through to the legacy pipeline (digital/agile project manager->Project Management Specialists, stock room workers->Stockers and Order Fillers).", "Performance: GIN index idx_tp_title_simple_fts (sql/016_title_probability_simple_fts_2026-06-27.sql) applied CONCURRENTLY to aoi_data_v6 -> DB time 2-40ms (vs 5-9s legacy).", "Validation (staging code-canary on live prod data aoi_data_v6): recall-set (1000 high-prob names) +3.0-4.4 pts recall, 0 regressions; adjudicated golden (148, independent of title_probability) parity at 98.0% with a single residual regression (hr analyst->human resources specialist, a labeling judgment) after the multi-word guard cut mechanical regressions 4->1.", "Surfaces: src/unified_data_handler.py (_titleconvert_fast, _build_typeahead_tsquery, fast 
branch, env knobs), src/hybrid_server_mcp.py (?_fast), src/dsl/ast_parser.py (USING FAST), sql/016_*, scripts/check_titleconvert_recall.py (--fast A/B lever)."], "migration": "None \u2014 behavioral default change on the no-company path, no grammar/schema/response-shape change (flat per-row schema preserved, match_source=name_corpus). Toggle per request via ?_fast=true|false (REST) / USING [NO] FAST (DSL); globally TITLE_FAST_TYPEAHEAD=0 restores the legacy blob path. Company-scoped and explicit GROUP BY ONET calls are unchanged. Requires GIN index idx_tp_title_simple_fts on title_probability (sql/016) \u2014 already applied to aoi_data_v6."}, {"version": "0.14.8-beta", "date": "2026-06-27", "type": "fix", "breaking": false, "summary": "TITLECONVERT dedups by O*NET by default on the no-company (title->occupation resolution / typeahead) path, so the result is DISTINCT occupations instead of 10 company-rows of one dominant occupation. This is the actual fix for the live 'typing manager/engineer/nurse returns one option' complaint: the production client (WYWM WordPress typeahead via lambda-proxy) calls /api/titleconvert WITHOUT group=onet, the path on which the ungrouped response returns one row per company. No grammar/schema/response-shape change; reversible.", "details": ["Root cause: ungrouped TITLECONVERT returns one row PER COMPANY. A dominant occupation (Engineers/Managers 'All Other' at fused match_score 1.0) has many companies whose rows fill the entire LIMIT=10 window -> caller sees ONE distinct occupation. 'nurse' showed only Nurse Practitioners + Nurse Midwives (Registered Nurses pushed off the window).", "Why prior validation missed it: the GROUP BY ONET option already fixed this, and all fusion/0.14.7 A/B used group=onet. 
But the live client (lambda-proxy -> GET /api/titleconvert?title=...&expand=cluster) never sends group=onet, so it always saw the collapsed list.", "Fix (src/unified_data_handler.py titleconvert): when no company context AND grouping not requested, collapse to the best-scored row per O*NET (results already sorted desc) so LIMIT counts distinct occupations. Flat per-row schema preserved (company_name=top company, match_score unchanged) -- NOT the grouped {companies:[],top_score} shape -- so existing clients only lose duplicate-occupation rows.", "Verified on prod data: engineer 1->10 distinct, manager 1->10, nurse 2->full family including Registered Nurses. Company-scoped calls and explicit GROUP BY ONET unchanged."], "migration": "None \u2014 behavioral default change on the no-company path, no grammar/schema/response-shape change. Reversible: TITLE_DEDUP_ONET_DEFAULT=0 restores legacy per-company-row output. Clients that explicitly pass a company or GROUP BY ONET are unaffected."}, {"version": "0.14.7-beta", "date": "2026-06-27", "type": "fix", "breaking": false, "summary": "Smooth the TITLECONVERT survey-prior posterior so a single low-evidence survey match can no longer manufacture a posterior of 1.0 and bury the exact/leading occupation the user typed (e.g. 'registered' was surfacing Medical Registrars over Registered Nurses). The Bayes posterior P(occupation|title) is normalized across only the occupations sharing the query's stem-key; when exactly one matches it captures the entire mass -> 1.0 no matter how tiny the evidence. Additive (Dirichlet) smoothing fixes it. No grammar/schema/response-shape change; reversible.", "details": ["Root cause (src/unified_data_handler.py _lookup_title_prior): posterior = p_i*s_i / Sum(p_j*s_j) over only the stem-key-matching occupations. A single match normalizes to 1.0 even when p*s is noise. 
On prod: 'registered' (stem 'regist') matched one survey title 'register' (p*s ~= 1e-6) coded to Medical Registrars, injecting beta*1.0 = 0.5 and burying Registered Nurses (surveyed as the 2-token 'registered nurse' -> stem {nurs,regist}, invisible to the bare 'regist' lookup). Same shape: 'roofer'->Helpers--Roofers, 'nanny'->Childcare Workers, 'neuropsychologist'/'ophthalmologist'->the All Other catch-all.", "Fix: additive smoothing denom = Sum(p*s) + kappa (new TITLE_PRIOR_KAPPA env, default 1e-4). Mass >> kappa is ~unchanged (nurse family keeps RN ~0.46 and still leads); mass << kappa collapses toward 0 (1e-6 noise -> ~0.01).", "Validation (read-only, staging code-canary on live prod data aoi_data_v6): leading-word trap sweep scripts/sweep_titleconvert_leading.py 76->26 flagged (cleared ~50 incl. registered, ophthalmologists, neuropsychologists, surgeons, arbitrators, farmers, general, pharmacy). Reported-cases recall (tests/eval/titleconvert_reported_cases.jsonl, n=183, GROUP BY ONET) non-regressing vs prod: complaint 94.7%=94.7%, history 94.8%->98.0% (+3.2), trace 90.9%=90.9%, prefix 100%=100%.", "New sweep harness scripts/sweep_titleconvert_leading.py; scripts/check_titleconvert_recall.py gains transient-retry resilience so a cold/loaded staging spot instance can't be miscounted as recall misses.", "Residual ~9-10 leading-word stragglers where a sibling with genuine survey mass still edges out the exact occupation (roofers/nannies/cabinetmakers/agents/assemblers/buyers/cleaners/counselors/teachers/directors) are tracked for a follow-up lead-priority tiebreak (separate lever)."], "migration": "None \u2014 behavioral default change, no grammar/schema/response-shape change. Reversible: TITLE_PRIOR_KAPPA=0 restores the legacy un-smoothed normalization. 
Tunable via the same env var (larger kappa = more aggressive damping of thinly-supported priors)."}, {"version": "0.14.6-beta", "date": "2026-06-27", "type": "fix", "breaking": false, "summary": "Add a LEADING-TOKEN bonus to the fusion scorer so the occupation whose title STARTS WITH the typed word reliably surfaces (and stays surfaced through typeahead). Follow-up to the 0.14.5 fusion promotion: fusion's exact-match weight defaults to 0.0 and its only exact signal was a crude substring test (query in raw_titles) that fires for prefix-into-a-longer-word, so 'project' matched the substring inside 'Motion Picture Projectionists' and outranked 'Project Management Specialists' (buried at #8). No grammar/schema/response-shape change; reversible.", "details": ["New TITLE_FUSION_LEAD_BONUS (env, default 0.5) in src/unified_data_handler.py _fuse_titleconvert_scores. Word-boundary-aware additive term against the candidate onet_title: onet_title == query -> 1.0*bonus; onet_title startswith query -> 0.9*bonus; query is a whole word in title -> 0.5*bonus. Distinguishes 'project' (matches 'Project Management Specialists', NOT the substring in 'Motion Picture Projectionists').", "Root cause: under fusion TITLE_FUSION_EXACT_BONUS defaults to 0.0 (no boost) and the substring exact check could not separate a leading word match from a longer-word prefix match; calibrated ts_rank then let the noisier longer-word match win.", "Effect on prod data: 'project' ranks Project Management Specialists #1 (was #8); 'patient' -> Patient Representatives stays #1 and is stable across typeahead prefixes (pati/patie/patien) instead of bouncing to #7-#9; multi-occupation heads (financial, marketing) start-with-match and surface together.", "Non-leading queries untouched (word-boundary): 'nurse' -> Registered Nurses still resolves via survey posterior with no spurious lead bonus."], "migration": "None \u2014 behavioral default change, no grammar/schema/response-shape change. 
Reversible: TITLE_FUSION_LEAD_BONUS=0 disables the leading-token bonus. Tunable via the same env var."}, {"version": "0.14.5-beta", "date": "2026-06-27", "type": "fix", "breaking": false, "summary": "Promote TITLECONVERT FUSION RANKING to default ON (TITLE_FUSION_RANKING default 0 -> 1). Fixes the post-0.14.3 'typing a word only returns ONE option' regression on the default ungrouped path. Legacy scoring divided every ts_rank by the single top raw score (normalize-to-max) then applied an ABSOLUTE min_score=0.5 gate, so sibling occupations under a dominant leader (e.g. Market Research Analysts beneath Marketing Managers) normalized below 0.5 and were CULLED. Fusion blends calibrated lexical + survey prior on a common scale and gates on TITLE_FUSION_MIN_SCORE (0.05), so siblings survive and the prior re-ranks as a peer term. No grammar/schema/response-shape change; fully reversible.", "details": ["Default flip (src/unified_data_handler.py): TITLE_FUSION_ENABLED now reads TITLE_FUSION_RANKING default '1' (was '0'). score = alpha*calibrate(ts_rank) + beta*survey_posterior (+ exact term); alpha=beta=0.5, percentile lexical calibration. 
When fusion is active the legacy normalize-to-max min_score=0.5 gate is skipped and gating uses TITLE_FUSION_MIN_SCORE=0.05.", "Root cause of the regression: normalize-to-max produced a relative score (raw/max_raw) that was then filtered by an absolute 0.5 threshold; a dominant occupation pushed valid different-occupation siblings below 0.5 (culled) and its many company rows filled the LIMIT window \u2014 collapsing the list to one occupation on the ungrouped path the contractor uses.", "Validation (read-only A/B vs aonav.ai, tests/eval/titleconvert_reported_cases.jsonl, n=183 = complaint terms + LLM-adjudicated real-query GOLD + production-log traces): recall (expected occupation in top-10) 89.1% -> 93.4% (+4.4 pts), variety (distinct occupations shown) 8.09 -> 8.38, ZERO regressions; clean gains on 'project' and 'teacher'.", "Fusion and the (still default-OFF) ORS catalog matcher independently reach the same 93.4% ceiling. Residual misses are scoring-invariant: 'it' is an english-FTS stopword (0 rows; needs the separate stopword-exception lever) and a few are golden-label artifacts.", "New eval harness: scripts/build_titleconvert_reported_cases.py, scripts/check_titleconvert_recall.py (--catalog / --fusion A/B modes), tests/eval/titleconvert_reported_cases.jsonl."], "migration": "None \u2014 behavioral default change, no grammar/schema/response-shape change. Reversible: set TITLE_FUSION_RANKING=0 to fall back to legacy normalize-to-max scoring. Per-request override still honored: REST ?_fusion=true|false, DSL TITLECONVERT \"x\" USING [NO] FUSION. 
Note for clients: under fusion the REST min_score=0.5 default is no longer applied as a hard cutoff (gating moves to the 0.05 fused floor), so result sets are wider and match_score values sit on the fused scale (heads ~0.9+, ranked tail ~0.4-0.55) rather than the legacy 0/1 normalize-to-max scale."}, {"version": "0.14.4-beta", "date": "2026-06-22", "type": "feature", "breaking": false, "summary": "Make the survey prior STEM-CONSISTENT, let a dominant prior arbitrate ambiguous TITLECONVERT, and add minimal semantic-cache invalidation. Together these fix nursing -> Registered Nurses (was Nursing Assistants): stemming put RN in the candidate pool (0.14.3), but the literal prior lookup found no 'nursing' row and the disambiguation semantic cache kept voting the wrong learned answer. No grammar/schema/response-shape change; additive + behavioral.", "details": ["Stem-consistent prior (src/unified_data_handler.py _lookup_title_prior): when stemming is active, key title_probability on the english stem (strip(to_tsvector('english'::regconfig, title))::text) and AGGREGATE probability + occupation_share across the whole stem family, recomputing the posterior \u2014 so 'nursing' inherits the nurse/nurses prior. Backed by functional index sql/015_title_probability_stemkey_2026-06-22.sql (IMMUTABLE-safe); empty stem-key falls back to exact lookup.", "Authoritative survey prior (src/disambiguation_service.py): new Tier-2a-prior \u2014 after de-duping candidates by O*NET, if the top occupation's survey_posterior >= SURVEY_AUTHORITY_MIN (0.5) AND >= SURVEY_AUTHORITY_DOMINANCE (1.5x) the runner-up O*NET, the prior is authoritative: it sets the tier-2 recommendation and SKIPS the historical_pattern (semantic-cache) vote. 
The per-O*NET dedup is essential \u2014 the ungrouped candidate list repeats an occupation once per company, which otherwise compared an occupation against itself.", "Semantic-cache epoch invalidation (src/pgvector_client.py, src/seed_pgvector.py): every learned/seeded title_pattern_vectors row is stamped with metadata.epoch (env RAG_CACHE_EPOCH, default '1'); reads gate on COALESCE(metadata->>'epoch','1') = epoch. Bumping RAG_CACHE_EPOCH + re-seeding logically invalidates ALL prior passively-learned patterns at once (for prior/scoring/embedding-model changes). COALESCE default keeps pre-epoch rows valid until the first bump \u2014 no cold cache, no backfill.", "Auto-evict on authoritative disagreement (src/pgvector_client.py): when a resolution is decided by the dominant survey prior (source='survey_prior'), the upsert also DELETEs stale non-'seed' rows for that title pointing elsewhere \u2014 a self-reinforced wrong pattern is removed (not just out-voted) and the corrected pick re-stored (self-heal). Deferred to ORS P2/P4: context_sig keying, memo->confirmed grades, TTL/decay, manual evict API.", "Validation (staging code-canary): nursing -> Registered Nurses via survey_prior, stable, cache self-heals to a single RN row; regression harness 0 regressions / 0 suppressions across 21 anchors; 25-intent harvest STEM OFF->ON 23/25 unchanged (matches the 0.14.3 pass \u2014 data-neutral); deterministic prod-vs-staging deltas confined to nursing (fix) and content creator (survey-prior picks 27-3099 catch-all over 27-3043 Writers)."], "migration": "None for grammar/schema/response shape. The sql/015 functional index must exist on any database serving stemmed prior lookups (already applied to aoi_data_v6 + aoi_data_staging); apply before serving stemmed traffic on any other clone/restore (regression-13 family). 
New env RAG_CACHE_EPOCH (default '1') controls semantic-cache invalidation: to invalidate all learned patterns after a prior/model/scoring change, bump it and re-run scripts/seed_pgvector.py --all (seeds carry the new epoch). No DB migration required to deploy the code."}, {"version": "0.14.3-beta", "date": "2026-06-22", "type": "feature", "breaking": false, "summary": "TITLECONVERT english (Snowball) FTS stemming is now ON by default on the legacy title_conversion corpus path (match_source=database). The corpus FTS previously used the `simple` config, making surface-form variants disjoint token pools so an exact-surface hit could SUPPRESS the right occupation entirely (nursing -> only Nursing Assistants with RN absent; writing -> English Teachers; driving -> CTE Teachers). The `english` config stems nurse/nursing/nurses -> 'nurs' so sibling forms share ONE candidate pool and the survey prior / disambiguation arbitrates within it. The catalog matcher already had its own stemmed tier; this only changes the legacy corpus default.", "details": ["Default flip (src/unified_data_handler.py): TITLE_FTS_STEM default 0 -> 1. Per-call override unchanged: DSL `TITLECONVERT \"x\" USING NO STEM`, REST `?_stem=false`; global kill-switch `TITLE_FTS_STEM=0`.", "Prereq index (sql/014_titleconvert_english_fts_index_2026-06-22.sql): functional gin(to_tsvector('english', COALESCE(top_10_titles,''))) built CONCURRENTLY on aoi_data_v6 (prod) and aoi_data_staging (clone) as aoi_admin. 
Without it stemmed queries seq-scan ~10-16s; with it the planner uses a Bitmap Index Scan (~10ms).", "Anti-regression: exact-surface bonus (TITLE_EXACT_BONUS, default 0.2 / DSL USING EXACT_BONUS= / REST ?_exact_bonus=) keeps clean exact cases (barista, welder, 'registered nurse') on top; pre-dedup fetch window widened max(limit*5,100) -> max(limit*8,200) so higher stemmed recall doesn't truncate the OFF-top survivor before dedup.", "FTS helpers (src/db_dialect.py): fulltext_match / fulltext_score_alias accept an optional config= param (defaults to the 'simple' global); english is passed only when stemming is active.", "Validation (staging code-canary on prod data): 0 anchor regressions across 21 known-correct intents; 21/21 correct at every EXACT_BONUS 0.0-0.3; full real-traffic day = 23/25 distinct queries unchanged OFF->ON (the 2 that moved were AI/ML titles with no canonical O*NET, both lateral/improved)."], "migration": "None \u2014 behavioral default change, no grammar/schema/response-shape change. The sql/014 functional GIN index must exist on any database serving stemmed queries (already applied to aoi_data_v6 + aoi_data_staging); apply it before enabling stemming on any other clone/restore (regression-13 family). Disable globally with TITLE_FTS_STEM=0 or per-call with USING NO STEM / ?_stem=false."}, {"version": "0.14.2-beta", "date": "2026-06-19", "type": "chore", "breaking": true, "summary": "Collapse the MCP retrieval surface to a SINGLE tool, aoi_dsl_query. The dedicated aoi_title_convert and aoi_company_convert MCP tools were REMOVED from tools/list and from execution. They duplicated DSL's TITLECONVERT / COMPANYCONVERT commands and gave models a second door that fragmented the agent loop: the dedicated tool returns an O*NET, then the model must switch to DSL anyway for LIST COMPANIES / wages / pathways, and the tools never composed with EXPAND CLUSTER / WHERE / GROUP BY / the new USING flags. 
DSL is the superset; resolution is now expressed inline (COMPANYCONVERT \"MSFT\"; TITLECONVERT \"nurse\" EXPAND CLUSTER; TITLECONVERT \"manager\" WHERE COMPANY IS \"McDonalds\"). Clients discover the surface dynamically via tools/list, so no compatibility shim is provided (owner direction: no legacy client support). TITLECONVERT still returns agent_guidance / llm_recommendation / recommended inline \u2014 no capability is lost, only the redundant tool.", "details": ["Tool defs (src/mcp/tool_definitions.py): ALL_TOOLS = [DSL_QUERY_TOOL]; COMPANY_CONVERT_TOOL and TITLE_CONVERT_TOOL deleted.", "Registry (src/mcp/tools_registry_unified.py): call_tool now serves only aoi_dsl_query (else -> -32601 pointing at DSL TITLECONVERT/COMPANYCONVERT); _execute_title_convert_tool / _execute_company_convert_tool deleted; capabilities tools.count 2->1 and feature list updated.", "CLI/stdio (src/mcp_stdio_server.py): tools/list returns only aoi_dsl_query (from get_tool_definitions); the dead aoi_title_convert/aoi_company_convert dispatch branches removed. 
The CLI was already DSL-backed (every call -> /api/dsl), so DSL retrieval over stdio is unchanged.", "Bedrock bridge (src/hybrid_server_mcp.py): the injected tool-use prompt now steers to TITLECONVERT \"<title>\" via aoi_dsl_query (was 'call aoi_title_convert first'); _canonical_aoi_tool_name known_tools and the bridge tool allowlist reduced to ('aoi_dsl_query',).", "NL context (src/mcp/ollama_nl_tool.py): _get_dynamic_context no longer keys the company-conversion guidance off the removed tool \u2014 it always emits the DSL TITLECONVERT/COMPANYCONVERT guidance.", "Docs: API_REFERENCE.md (tool-surface table + worked example now DSL), MCP_CONTEXT.md (two Tool: sections replaced with a single aoi_dsl_query note), docs/context/convert.md (MCP-tool-alternative blocks replaced with DSL guidance), ORS proposal updated."], "migration": "MCP clients that called aoi_title_convert or aoi_company_convert directly MUST switch to aoi_dsl_query with a DSL command: TITLECONVERT \"<title>\" [EXPAND CLUSTER] [WHERE COMPANY IS \"<name>\"] and COMPANYCONVERT \"<name>\" [WHERE INDUSTRY IS \"<sector>\"] [LIMIT N]. tools/list reflects the single tool; agents that discover tools dynamically need no change. No DB or REST/DSL grammar change."}, {"version": "0.14.1-beta", "date": "2026-06-19", "type": "feature", "breaking": false, "summary": "DSL parity for the experimental TITLECONVERT matcher toggles. DSL is the superset of REST, but the catalog/fusion/fts/bypass knobs had been exposed only as REST query params (?_catalog, ?_fusion, ?_fts_and, ?_bypass) and were unreachable from DSL/Chat. Added a `USING <flags>` clause to TITLECONVERT so the same toggles are expressible in DSL: bare flag = ON (USING CATALOG), `NO <flag>` = OFF (USING NO CATALOG), explicit form (USING CATALOG=false), comma-separated for multiple, composes with EXPAND CLUSTER / GROUP BY / LIMIT. 
Absent => server env default (unchanged behavior).", "details": ["Grammar (src/dsl/ast_parser.py): _extract_modifiers gains _extract_using; WHERE / EXPAND / ORDER BY parsing now terminate at USING. Flag map: CATALOG->_catalog, FUSION->_fusion, BYPASS->_bypass, FTS_AND->_fts_and (bool), FUSION_ALPHA/BETA/EXACT (float), FUSION_LEX_MODE (str). Unknown flags are ignored (forgiving, like unknown EXPAND tokens).", "AST (src/dsl/ast_nodes.py): Modifiers gains a `using` dict field.", "Execution (src/dsl/ast_executor.py): _exec_titleconvert merges modifiers.using into the titleconvert() kwargs \u2014 1:1 with the REST params, so REST and DSL now share one tri-state surface.", "Docs (docs/mcp/MCP_CONTEXT.md): testing guide documents the DSL USING surface alongside the REST params."], "migration": "None \u2014 additive, experimental toggles default OFF (env ORS_CATALOG etc.). No existing query, response shape, or default behavior changes."}, {"version": "0.14.0-beta", "date": "2026-06-19", "type": "feature", "breaking": false, "summary": "New DSL verb SUBMIT with subject SAMPLE \u2014 an open client-information submission channel for overall learning. SUBMIT SAMPLE [ID \"<guid>\"] SOURCE \"<source>\" VALUE \"<value>\" records an arbitrary client-supplied payload (e.g. unmatched titles, free-text feedback, model training signal) into the new learning_samples table. Unlike the admin write commands (UPDATE/REMOVE/ADD), SUBMIT is intentionally OPEN to any authenticated dsl:execute caller \u2014 it is NOT gated to the admin role, since the goal is to let clients contribute samples. ID is optional: when omitted the server generates a v4 UUID and returns it; submissions are idempotent on the GUID (duplicate id -> recorded:false, no error). 
VALUE is parsed last so the payload may itself contain the words ID/SOURCE/VALUE.", "details": ["Grammar (src/dsl/ast_parser.py): new SUBMIT_COMMANDS={SUBMIT} / SUBMIT_SUBJECTS={SAMPLE}; _parse_submit -> _parse_submit_sample produces QueryAST(command=SUBMIT, subject=SAMPLE, args={id,source,value}). VALUE is split off first (last keyword) so arbitrary content is preserved; ID (optional) and SOURCE (required) are quoted keyword values in the head. New generic QueryAST.args field (src/dsl/ast_nodes.py).", "Execution (src/dsl/ast_executor.py): dispatch (SUBMIT, SAMPLE) -> _exec_submit_sample -> UnifiedDataHandler.add_sample(). NOT added to WRITE_COMMANDS, so no admin-role gate.", "Data layer (src/unified_data_handler.py): add_sample() validates SOURCE (<=255) and non-empty VALUE, validates/normalizes or generates the GUID, pre-checks for an existing id, then INSERT ... ON CONFLICT (id) DO NOTHING into learning_samples. Returns the canonical envelope with data.id (the GUID used), data.duplicate, data.recorded.", "Schema (sql/013_learning_samples_2026-06-19.sql): new PostgreSQL table learning_samples(id uuid PK, source varchar(255), value text, submitted_by varchar(255), created_at timestamptz default now()) + indexes on source and created_at.", "Provenance: handlers.py now sets ast._username for every command (not just writes) and hybrid_server_mcp.handle_dsl_query passes the authenticated user into the DSL handler, so submitted_by captures the JWT subject on the HTTP path.", "Surfaces updated: MCP _LEAN_DSL_DESCRIPTION KEY PATTERNS, web-test-ui-simplified.html reference cheat-sheets, docs/api-reference/API_REFERENCE.md."], "migration": "Apply sql/013_learning_samples_2026-06-19.sql (as the table owner) before deploying the code. Additive and backward-compatible: a new verb + new table only; no existing request/response shape, grammar, or schema changed. 
NOTE: SUBMIT is open to any dsl:execute caller by design \u2014 review whether your role/access model wants to restrict the SOURCE values clients may submit."}, {"version": "0.13.3-beta", "date": "2026-06-09", "type": "docs", "breaking": false, "summary": "Guidance-only update to the cold-call MCP context + structured HELP (no API/grammar/schema change). Two corrections proven against the live API: (1) steer callers from the legacy FULLTEXT-only CLUSTERCONVERT to TITLECONVERT \"x\" EXPAND CLUSTER for title->cluster resolution (the no-company TITLECONVERT path already returns the O*NET-set + matching clusters via the full fuzzy->vector->inference pipeline; CLUSTERCONVERT has no vector/inference fallback and is now marked legacy/being-phased-out); and (2) harden the 'ground first' agent behavior so models stop answering careers questions from memory. A cold-call A/B simulation (tools/coldcall_sim.py) against local Ollama showed the strengthened tool description steers capable models to call TITLECONVERT EXPAND CLUSTER as the first move.", "details": ["MCP cold-call surfaces (src/mcp/tool_definitions.py): _LEAN_DSL_DESCRIPTION gains a 'USE THIS FIRST \u2014 DO NOT ANSWER FROM MEMORY' directive (broad/strategy/coach prompts must be decomposed into tool calls, not answered from general knowledge) + a worked decomposition example; the title->companies workflow and KEY PATTERNS now use TITLECONVERT EXPAND CLUSTER instead of CLUSTERCONVERT; aoi_title_convert description documents EXPAND CLUSTER (title->O*NET->cluster) and the no-company candidate set, and adds a 'ground a role before advising' nudge.", "Structured HELP (docs/context/): convert.md repositions TITLECONVERT ... EXPAND CLUSTER as the primary title->cluster path and marks CLUSTERCONVERT legacy (FULLTEXT-only, no vector/inference); index.md, hiring.md, pathways.md, ai-impact-experimental.md updated to match. 
agent_behavior.md (served as HELP GUIDELINES and the MCP initialize 'instructions') upgrades 'Try before explaining' into a hard 'Ground first' rule with triggers, a 'broad questions need MORE tools' decomposition recipe, and an explicit anti-patterns block.", "API error suggestions (src/unified_data_handler.py): the three -32602 adjacency/baseline suggestions that pointed at CLUSTERCONVERT now point at TITLECONVERT ... EXPAND CLUSTER (string-only; no SQL/logic change).", "Deploy allowlist: added docs/context/ai-impact-experimental.md to scripts/deploy-with-checksums.sh DEPLOY_FILES (regression-15 guard \u2014 it was edited but absent from the promote allowlist).", "Planning: docs/planning/2026-06-08_CLUSTERCONVERT_DEPRECATION_AND_TITLECONVERT_PARAMS.md gains a 2026-06-09 addendum recording the re-grounded finding (convergence is a presentation change, not a new engine)."], "migration": "None. CLUSTERCONVERT remains fully functional (non-breaking); it is only de-advertised in guidance. No request/response shape, grammar, or schema change."}, {"version": "0.13.2-beta", "date": "2026-06-08", "type": "chore", "breaking": false, "summary": "Cosmetic re-skin of the Open WebUI chat (aonav.ai:3443) to the 'Where You Work Matters' (WYWM) brand. The instance name (WEBUI_NAME), model-preset display names, and the assistant system-prompt identity change from 'AOI Career Intelligence' to 'Where You Work Matters'; the chat favicon/splash become the WYWM wordmark and a small CSS overlay applies the WYWM palette (black canvas, #7aa3cc accent). NO API change and NO container rebuild \u2014 the stock open-webui image is unchanged; branding is layered on at the :3443 nginx proxy (favicon/splash aliases + a sub_filter that injects /wywm-assets/wywm-openwebui.css). 
Internal identifiers are intentionally NOT renamed: container names (open-webui-split, aoi-open-webui), volumes, WEBUI_SECRET_KEY value (aoi-openwebui-prod-secret), Docker DNS (aoi-mcp-split-pg), model IDs (aoi-career-*), the aoi-bedrock prefix_id, RDS DB names, IAM roles, and the SSH key all stay as-is to avoid breaking routing/sessions/deploys.", "details": ["Compose: WEBUI_NAME 'AOI Career Intelligence' -> 'Where You Work Matters' in docker-compose.split-api.yml (live split path) and the legacy docker-compose.unified.yml (docker-compose.yml is a symlink to it); local-dev default in openwebui/docker-compose.yml + openwebui/.env.example.", "Model presets: openwebui/setup-models.sh + setup-staging-preset.sh display names 'AOI Career Assistant*' -> 'WYWM Career Assistant*', system-prompt identity -> 'Where You Work Matters', preset tag aoi -> wywm. Functional model IDs (aoi-career-assistant, aoi-career-local/claude/premium/staging), base_model_id (aoi-bedrock-qwen), and MCP server info.id (aoi, aoi-staging) are unchanged. Presets live in the OWUI SQLite volume, so the scripts are re-run against the live instance after deploy.", "Visual: openwebui/branding/ bundle (wywm_logo.png + wywm-openwebui.css + README). nginx :3443 server block (openwebui/nginx/openwebui-proxy.conf) serves /wywm-assets/, aliases /favicon.ico + /static/{favicon*,splash*,apple-touch-icon,web-app-manifest-*}.png to the WYWM logo, and injects the CSS before </head> (Accept-Encoding cleared so sub_filter applies to the uncompressed HTML).", "Discovered (pre-existing, NOT changed here): the production OPENAI_API_CONFIGS uses named keys ('AOI Bedrock'/'Gemini'/...), but Open WebUI only matches that map by string index ('0','1',...) or URL \u2014 so the prefix_id values were already inert (the presets reference un-prefixed base model IDs, consistent with this). 
The 'AOI Bedrock' key was renamed to 'WYWM Bedrock' for cosmetic consistency only; converting the keys to indices was deliberately avoided because it would start applying prefix_id and break the preset base_model_id references.", "Open WebUI's license restricts removing its own product marks for large deployments without an enterprise license; this skin only renames the instance + swaps favicon/splash/colors and does not strip OWUI marks."], "migration": "None. No API/grammar/schema change. Operators: copy openwebui/branding/{wywm_logo.png,wywm-openwebui.css} to /var/www/wywm-assets/ on each host, apply the updated :3443 nginx block (nginx -t && reload), and re-run openwebui/setup-models.sh against the live instance so preset names refresh. Hard-refresh the browser (favicons cache)."}, {"version": "0.13.1-beta", "date": "2026-06-08", "type": "feature", "breaking": false, "summary": "TITLECONVERT gains an opt-in EXPAND CLUSTER option that attaches the occupation cluster (cluster_id + cluster_name) to each match, resolving title -> O*NET -> cluster in a single call. The cluster_name is read from occupation_info \u2014 the SAME source CLUSTERCONVERT uses \u2014 so the two conversion commands always return identical cluster names (occupation_cluster.cluster_name drifts from the canonical occupation_info.cluster_name for ~130/861 clusters, e.g. cluster 187 'Software Developers and Computer Programmers' vs canonical 'Software Engineers'). Additive and opt-in: cluster_id / cluster_name are omitted unless requested. DSL `TITLECONVERT \"x\" EXPAND CLUSTER`; REST `?expand=cluster`; MCP `aoi_title_convert` arg `expand_cluster:true`.", "details": ["DSL: EXPAND CLUSTER modifier on TITLECONVERT (RESOLVE CLUSTER DETAILS accepted as a synonym). No grammar change \u2014 reuses the existing EXPAND token path.", "REST: GET /api/titleconvert?expand=cluster (comma-composable; cluster|clusters|cluster_details accepted). 
query_info echoes the requested expand tokens.", "MCP: aoi_title_convert gains boolean expand_cluster.", "Data: cluster_id comes from occupation_cluster (the only onet->cluster bridge); cluster_name from occupation_info via LEFT JOIN, COALESCE fallback to occupation_cluster.cluster_name for the ~21 clusters with no occupation_info row. onet->cluster is 1:1 in the data (verified: 1016 mappings, 1016 distinct onets, 0 multi-cluster), so a single cluster is attached per match. Null-safe when no mapping exists.", "Isolated lookup (UnifiedDataHandler._resolve_clusters_for_onets) so a future occupation_cluster/occupation_info table merge becomes a single-table read with no join \u2014 one-spot change."], "migration": "None \u2014 additive opt-in. Cluster fields appear only when EXPAND CLUSTER / ?expand=cluster / expand_cluster is requested."}, {"version": "0.13.0-beta", "date": "2026-06-07", "type": "schema", "breaking": true, "summary": "6-2-26 BGI 'Companies Hiring by Occupation' update. `company_occupation_summary` retires the occupation-level `postings_count_qtile` (1-5 quintile) and gains five columns: `entry_posting_count` (INT), `entry_posting_tier` (high|above-average|average|low|none), `total_posting_count` (INT), `total_posting_tier`, and `early_career_access` (0/1, response-only). The old `POSTINGS_QTILE` filter and the cluster `postings_qtile` sort now return a structured -32602 deprecation error (NO value shim \u2014 the 1-5 quintile is not translated to a tier). New filters `ENTRY_POSTING_TIER` / `TOTAL_POSTING_TIER` (DSL) and `?entry_posting_tier` / `?total_posting_tier` (REST); new sorts `entry_postings` / `total_postings` (cluster\u2192companies) and `ORDER BY ENTRY_POSTING_COUNT|TOTAL_POSTING_COUNT` (LIST COMPANIES FOR OCCUPATION). The WYWM calculated sort now blends the archetype badge tier with the textual hiring tier and the hiring column is archetype-dependent: `early_career`\u2192`entry_posting_tier`, `growth`/`stability`\u2192`total_posting_tier`. 
The geographic `HIRING_POSTINGS_QTILE` (hiring_flag, 1-3) is a different column and is unchanged.", "details": ["Schema: ALTER company_occupation_summary ADD entry_posting_count, entry_posting_tier, total_posting_count, total_posting_tier, early_career_access; indexes idx_cos_entry_tier, idx_cos_total_tier (sql/009_cos_posting_tiers_2026-06-02.sql). DROP postings_count_qtile deferred to sql/010_* after the code swap is verified.", "Filters: POSTINGS_QTILE / ?postings_qtile -> -32602 deprecation error on LIST OCCUPATIONS FOR COMPANY, LIST COMPANIES FOR OCCUPATION, LIST COMPANIES FOR CLUSTER, and LIST/COUNT OCCUPATION_BADGES. Replacements: ENTRY_POSTING_TIER, TOTAL_POSTING_TIER (values high|above-average|average|low|none).", "Sorts: cluster sort modes are now alpha|entry_postings|total_postings|badge|industry|wywm (postings_qtile removed -> deprecation error). LIST COMPANIES FOR OCCUPATION supports ORDER BY ENTRY_POSTING_COUNT|TOTAL_POSTING_COUNT.", "WYWM: hiring tier values high>above-average>average>low>none map 1:1 onto the old 5>1 quintile order; early_career ranks on entry_posting_tier, growth/stability on total_posting_tier (WYWM_HIRING_COLUMNS). wywm_rank numbering (1..10, 99=unranked) unchanged.", "Responses: all company_occupation_summary-derived surfaces return the five new fields in place of postings_count_qtile, including the company_hiring block on LIST CLUSTERS EXPAND COMPANY_HIRING.", "Import: scripts/import_6-2-26_update.py performs an in-place UPSERT keyed on co_occ_uid; top10_occupation is PRESERVED from existing DB values (the CSV does not carry it). 
UID safety gate: co_occ_uid is computed from the DB company_uid; --apply aborts on CSV/DB company_uid mismatch unless --accept-uid-discrepancies is passed.", "early_career_access is response-only (no WHERE/ORDER BY)."], "migration": "No client data migration needed, but clients MUST stop sending POSTINGS_QTILE / ?postings_qtile (now an error) and read entry_posting_tier / total_posting_tier (filter) and entry_posting_count / total_posting_count (sort). Operators: run sql/009 + scripts/import_6-2-26_update.py before deploying this code; run sql/010 (DROP postings_count_qtile) only after the code swap is live and verified."}, {"version": "0.13.0-beta", "date": "2026-06-07", "type": "feature", "breaking": false, "summary": "Universal EXPAND TIMESTAMPS option exposes row created_at / updated_at (ISO-8601) in the payload. REST `?expand=timestamps`, DSL `EXPAND TIMESTAMPS`. Initial scope: LIST COMPANIES and LIST OCCUPATIONS FOR COMPANY (GET /api/companies/{name}/occupations?expand=timestamps). Additive and opt-in (timestamps omitted unless requested); reuses the existing expand convention so no grammar change.", "details": ["REST: ?expand=timestamps (comma-composable with other expand tokens).", "DSL: EXPAND TIMESTAMPS modifier.", "Timestamps are ISO-8601 strings; null-safe."], "migration": "None \u2014 additive opt-in."}, {"version": "0.12.6-beta", "date": "2026-06-06", "type": "fix", "breaking": false, "summary": "Repair + re-platform the TITLECONVERT Tier-2 vector-RAG disambiguation path, which had been silently dead (every ambiguous title fell through to the slow Tier-3 LLM after a ~3s embedding timeout \u2014 the user-visible 'queries take a long time / often don't come back'). 
Root causes: (1) embeddings came from the self-hosted GPU Ollama endpoint (`qwen3-embedding:8b`, 4096-dim) on the G5 spot instance which is normally STOPPED; (2) the `pgvector` columns were `vector(4096)`, above pgvector's 2000-dim HNSW limit, so the HNSW `CREATE INDEX` had always failed silently and the tables were both unindexed AND empty (0 rows); (3) `pgvector_client` RAG SQL used `:vec::vector`/`:meta::jsonb` casts, but SQLAlchemy `text()`'s bind-param parser skips `:name` when immediately followed by `::` (negative lookahead `(?!:)`), so `:vec` reached asyncpg literally \u2192 `syntax error at or near \":\"` on every query/upsert. Now: embeddings are provider-driven (default Bedrock Amazon Titan Text Embeddings V2, 1024-dim, normalized \u2014 no GPU; Cohere embed v3 is a 1024-dim drop-in future swap), the vector columns are `vector(1024)` with HNSW cosine indexes, and the SQL uses `CAST(:p AS \u2026)`. Ambiguous TITLECONVERT now runs Tier-2 RAG cleanly and resolves in ~0.5\u20132s; passive-learning upserts work again.", "details": ["NEW \u2014 `src/embedding_provider.py`: single provider-aware embedding helper (`embed_text` async / `embed_text_sync`). `EMBEDDING_PROVIDER=bedrock` (default) \u2192 Titan v2 (`amazon.titan-embed-text-v2:0`, body `{inputText, dimensions:1024, normalize:true}`); `cohere.*` \u2192 Cohere v3 (`{texts, input_type, embedding_types:[float]}`, 1024-dim); `EMBEDDING_PROVIDER=ollama` \u2192 legacy GPU HTTP path (kept for local dev). Region/model/dim via `BEDROCK_EMBEDDING_REGION|MODEL|DIM`.", "FIX \u2014 `src/pgvector_client.py`: `get_embedding()` now delegates to `embedding_provider` (with `input_type` search_query for queries / search_document for upserts); ALL RAG query + upsert SQL switched from `:vec::vector` / `:meta::jsonb` to `CAST(:vec AS vector)` / `CAST(:meta AS jsonb)` so SQLAlchemy binds the params (fixes `syntax error at or near \":\"`). `health_check` now reports the active embedding provider instead of hard-coding Ollama. 
Removed the now-unused `httpx` import.", "FIX \u2014 `scripts/seed_pgvector.py`: `Embedder` is provider-aware (shares `embedding_provider.embed_text_sync` when `EMBEDDING_PROVIDER=bedrock`, else the existing Ollama HTTP path); `--health` and startup logging report the provider/model.", "MIGRATION \u2014 `sql/010_embeddings_dim_1024.sql` (renumbered from 009 on 2026-06-08 to avoid collision with `009_cos_posting_tiers_2026-06-02.sql`): TRUNCATE (empty anyway) \u2192 DROP any embedding indexes \u2192 `ALTER COLUMN embedding TYPE vector(1024)` on `title_pattern_vectors`, `industry_affinity_vectors`, `onet_cluster_vectors` \u2192 recreate HNSW `vector_cosine_ops` indexes (now valid at 1024 dims). Idempotent; non-destructive (tables were empty). Applied to shared RDS `aoi_data_v6` as `aoi_admin` (table owner).", "SCHEMA \u2014 `sql/schema_postgres.sql`: canonical embedding dimension set to `vector(1024)` for all three vector tables, with a comment explaining the HNSW 2000-dim limit.", "CONFIG \u2014 `docker-compose.postgres-test.yml` (prod/staging): `EMBEDDING_PROVIDER=bedrock` + `BEDROCK_EMBEDDING_MODEL/REGION/DIM` defaults. `docker-compose.unified.yml` (local dev): defaults to `ollama` (Macs run `qwen3-embedding:8b`, no AWS creds). `.env.deploy.example`: documented the new vars."], "migration": "No API/data migration for clients \u2014 request/response shapes and DSL grammar are unchanged. Operationally: the shared RDS vector tables were re-typed 4096\u21921024 and re-seeded via Titan (run `EMBEDDING_PROVIDER=bedrock python3 scripts/seed_pgvector.py --all` from a host/container with the Bedrock IAM role after any future RDS restore or bulk title_conversion/companies change). The legacy GPU Ollama embedding path is no longer required for the RAG hot path. 
To revert, set `EMBEDDING_PROVIDER=ollama` and re-type columns back to vector(4096) (note: 4096 cannot be HNSW-indexed) \u2014 only meaningful with a live GPU.", "validated_on": "Migration mechanics rehearsed on local pgvector 0.8.2 (matches prod 0.8.0): 4096 HNSW fails with 'column cannot have more than 2000 dimensions for hnsw index'; ALTER\u21921024 + HNSW + cosine query succeed. Staging+prod (shared RDS aoi_data_v6, pgvector 0.8.0): migration applied (3 tables \u2192 vector(1024), 3 HNSW indexes created), Titan verified via EC2 IAM role `aoi-ec2-bedrock` (dim=1024), seeded 1016 title-pattern + 32 industry-affinity vectors (0 errors, 122s). Direct Titan+HNSW search: registered nurse\u219229-1141.00 (10ms), software engineer\u219215-1252.00 (3ms), truck driver\u219253-3033.00 (4ms). End-to-end TITLECONVERT on both nodes: no syntax/embedding errors in logs, disambiguation runs Tier-2 RAG, ambiguous titles resolve in ~0.5\u20132s (vs prior 3\u20136s), passive-learning upserts succeed (table grew 1016\u21921024). Both /health = 200.", "_pending_validation": false}, {"version": "0.12.5-beta", "date": "2026-06-05", "type": "feature", "breaking": false, "summary": "New calculated sort mode `WYWM` on the cluster\u2192companies path (REST `GET /api/clusters/{id}/companies?sort=wywm`, DSL `LIST COMPANIES FOR CLUSTER \"<id>\" ORDER BY WYWM`). It ranks each (company \u00d7 occupation-cluster) row by a fixed 10-bucket blend of a per-pair archetype badge tier (Platinum > Gold) and the occupation hiring quintile `cos.postings_count_qtile` (5=High \u2026 1=No hiring). Both ranking signals live on `company_occupation_summary`, so there is NO join for the rank itself (the path still LEFT JOINs `companies` for `primary_industry`, unchanged from 0.12.4-beta). Priority order (rank 1=best): Platinum+5, Platinum+4, Gold+5, Gold+4, Platinum+3, Platinum+2, Gold+3, Gold+2, Platinum+1, Gold+1. 
Rows whose chosen badge column is NA/null OR whose quintile is missing get rank 99 and sort last, then alphabetically by company_name. Which archetype supplies the Platinum/Gold tier is parameterized: DSL token suffix (`WYWM_EARLY_CAREER` | `WYWM_GROWTH` | `WYWM_STABILITY`) or REST `?archetype=early_career|growth|stability` (also accepted as `?sort=wywm_growth`); bare `WYWM` / `?sort=wywm` defaults to `early_career`; unknown archetypes silently coerce to the default. Each result row carries `wywm_rank` (1..10, or 99=unranked) and the response envelope echoes `wywm_archetype`. The order is deliberately non-linear (no single weighted sum reproduces it: Platinum@4 outranks Gold@5, yet Gold@4 outranks Platinum@3), so it is implemented as an explicit ordered priority list rather than a formula.", "details": ["UDH \u2014 `src/unified_data_handler.py`: new `WYWM_PRIORITY` class constant is the single source of truth for the ranking \u2014 list position == rank number, so re-tuning is a one-line reorder/insert with nothing to renumber (Option A). New `WYWM_ARCHETYPE_COLUMNS` map + `WYWM_DEFAULT_ARCHETYPE='early_career'` + `WYWM_RANK_UNRANKED=99`. New `_build_wywm_sort_sql(archetype, alias)` generates the `CASE` over `<alias>.badge_<archetype>` + `<alias>.postings_count_qtile` from `WYWM_PRIORITY`. `list_companies_for_cluster()` adds `wywm` to `CLUSTER_COMPANIES_SORT_MODES`, adds the four `wywm*` sort aliases, resolves the archetype (token suffix \u2192 `archetype` option \u2192 default), aliases the rank as `wywm_rank` in the `SELECT DISTINCT` (required for PG DISTINCT + ORDER BY), adds the `ORDER BY wywm_rank ASC, cos.company_name ASC` branch, surfaces `wywm_rank` per row and `wywm_archetype` in the envelope.", "REST \u2014 `src/hybrid_server_mcp.py`: `/api/clusters/{id}/companies` now reads `?archetype=` into `dsl_options` (alongside the existing `?sort=`/`?order_by=` passthrough) and the inline `# Sort:` comment lists `wywm`. 
The UDH whitelist still gates invalid sorts \u2192 HTTP 400 with `valid_sort_modes` (now including `wywm`).", "DSL \u2014 `src/dsl/ast_executor.py` + parser: no change. `WYWM` and `WYWM_GROWTH` etc. parse as ordinary single-token ORDER BY fields and flow through `dsl_options['order_by']`; the UDH `sort_aliases` map + whitelist do the rest.", "SCHEMA REGISTRY \u2014 `src/admin/table_schema_registry.py`: removed the unused `badge_overall` writable column from the `company_occupation_summary` entry. The per-pair table carries archetype badges only; company-wide/rollup badges live on `companies`. The column was never populated or surfaced (only this registry reference existed), so there is no data or response-shape impact.", "MCP \u2014 `src/mcp/tools_registry_unified.py` + `src/mcp/tool_definitions.py`: cluster ORDER BY grammar line + example set + inline help extended with WYWM (and the `_<archetype>` variants).", "DOCS \u2014 `docs/api-reference/API_REFERENCE.md` (sort param row + new `archetype` param + WYWM priority table + `sort=wywm` response example + DSL grammar row + `valid_sort_modes` error block), `docs/context/clusters.md` (examples, REST equivalent, response fields, ORDER BY whitelist).", "UI \u2014 `web-test-ui-simplified.html`, `web-ui-examples-dsl.json`, `web-ui-examples-rest.json`: added `ORDER BY WYWM` / `?sort=wywm` (+ archetype) example queries.", "TESTS \u2014 `tests/test_suite/test_clusters.py`: new cases asserting `?sort=wywm` / `ORDER BY WYWM` returns rows in non-decreasing `wywm_rank` order with the documented bucket mapping, that count/total are unchanged vs the unsorted baseline (no fan-out / no drop), that `archetype`/token-suffix selection changes the ranking column, that unranked rows (NA/null badge or missing quintile) land at rank 99 / last, and that the envelope echoes `wywm_archetype`."], "migration": "Backward-compatible additive change. 
New sort mode `wywm` (REST `?sort=wywm` [+ `?archetype=\u2026`], DSL `ORDER BY WYWM[_<archetype>]`) on `GET /api/clusters/{id}/companies` / `LIST COMPANIES FOR CLUSTER`. New optional per-row field `wywm_rank` and envelope field `wywm_archetype` appear ONLY when `sort=wywm`. No existing sort/filter behavior changed, no fields removed. The `badge_overall` schema-registry removal has no client impact (column was never populated or returned). Clients that strictly validate response schemas should allow the new optional fields under the WYWM sort.", "validated_on": "Local PostgreSQL stack (aoi-mcp-split-pg \u2192 postgres-split, 1752 companies) on 2026-06-05 \u2014 rebuilt container reports 0.12.5-beta; smoke 25/25, functional 145/146 (1 pre-existing skip), data 26/26 (incl. 3 new WYWM cases: rank-matches-priority, count-invariant, invalid-sort-lists-wywm). Live spot-check on cluster 187 (total 1084): `?sort=wywm` returns Platinum+qtile5 rows at rank 1, `wywm_archetype=early_career` echoed; DSL `ORDER BY WYWM_GROWTH` resolves `archetype=growth` and ranks on `badge_growth`; unranked rows (NA/null badge or missing quintile) land at rank 99 and sort last. Staging (staging.aonav.ai, T3 \u2192 RDS PostgreSQL) verified 2026-06-05 via manual code-only deploy (scp + container rebuild): health reports 0.12.5-beta; smoke 25/25, functional 145/146 (1 pre-existing skip), data 26/26; live `?sort=wywm` returns rank-1 Platinum@5 and honors `?archetype=stability`. Production promote deferred pending sign-off.", "_pending_validation": false}, {"version": "0.12.4-beta", "date": "2026-06-03", "type": "feature", "breaking": false, "summary": "Surface the company's `primary_industry` on the two cluster-scoped company surfaces, and add an `INDUSTRY` server-side sort mode to the multi-company cluster path. (1) REST `GET /api/companies/{name}/clusters` (DSL `LIST COMPANY CLUSTERS`) now returns `primary_industry` on each row. 
Because this is a single-company surface, the industry is a company-wide constant \u2014 it is resolved ONCE from the company context (the existing `_resolve_company_context` SELECT against `companies` gains the column) rather than per-row joined, and is also echoed in the response `command` block. (2) REST `GET /api/clusters/{id}/companies` (DSL `LIST COMPANIES FOR CLUSTER`) now returns `primary_industry` on each row via a `LEFT JOIN companies c ON c.company_name = cos.company_name`, and accepts a new sort mode `industry` / DSL `ORDER BY INDUSTRY` (sorts `primary_industry ASC NULLS LAST`, `company_name` tie-break). The join is deliberately LEFT (not INNER) and keyed on `company_name` (the PK of `companies`, 1:1): INNER would silently drop `company_occupation_summary` rows whose `company_name` is absent from `companies` (a documented orphan hazard) and corrupt the `total` count, while joining on the non-unique `company_uid` would fan out rows and break `SELECT DISTINCT`. With the 1:1 `company_name` join, DISTINCT semantics and the wrapped count subquery are unchanged.", "details": ["UDH \u2014 `src/unified_data_handler.py`: `_resolve_company_context()` SELECT adds `c.primary_industry` and returns it in the context dict (None on the `company_occupation_summary` fallback path, which has no industry column). `list_company_clusters()` attaches the resolved `primary_industry` to every row and to the `command` envelope. 
`list_companies_for_cluster()` adds `LEFT JOIN companies c ON c.company_name = cos.company_name`, selects `c.primary_industry`, surfaces it in each result entry, adds `industry` to `CLUSTER_COMPANIES_SORT_MODES` + the `sort_aliases` map (`industry`, `primary_industry`), and adds an `ORDER BY c.primary_industry ASC NULLS LAST, cos.company_name ASC` branch.", "REST \u2014 `src/hybrid_server_mcp.py`: no code change needed for `/api/clusters/{id}/companies` (it forwards `?sort=` straight through to the UDH and returns the UDH envelope verbatim, so the new field + sort mode are picked up automatically); only the inline `# Sort:` comment was updated to list `industry`. `/api/companies/{name}/clusters` returns the new field automatically via the UDH `data` passthrough.", "DSL \u2014 `src/dsl/ast_executor.py`: no grammar change. `ORDER BY INDUSTRY` already parses as a generic ORDER BY field; the new sort mode is gated by the UDH whitelist (`sort_aliases` + `CLUSTER_COMPANIES_SORT_MODES`). The existing `INDUSTRY` WHERE filter on this path (subquery against `companies.primary_industry`) is unchanged.", "MCP \u2014 `src/mcp/tools_registry_unified.py` + `src/mcp/tool_definitions.py`: ORDER BY grammar line and example set extended with `INDUSTRY`.", "DOCS \u2014 `docs/api-reference/API_REFERENCE.md`, `docs/api-reference/REST_API_REFERENCE_EXTERNAL.md`, `api-reference.txt`, `docs/context/clusters.md`: response examples gain `primary_industry`; `sort` param + ORDER BY whitelist + invalid-sort `valid_sort_modes` updated to include `industry`; single-company vs per-row provenance of `primary_industry` documented.", "UI \u2014 `web-test-ui-simplified.html`, `web-ui-examples-dsl.json`, `web-ui-examples-rest.json`: added `ORDER BY INDUSTRY` / `?sort=industry` example queries.", "TESTS \u2014 `tests/test_suite/test_clusters.py`: new cases asserting `primary_industry` is present on both surfaces, that `ORDER BY INDUSTRY` / `?sort=industry` returns rows ordered by industry with NULLs 
last, that the row count/total is unchanged by the LEFT JOIN vs the unsorted baseline (proves no fan-out / no drop), and that an invalid sort still 400s with `industry` now in `valid_sort_modes`."], "migration": "Backward-compatible additive change. New response field `primary_industry` (may be `null`) on `GET /api/companies/{name}/clusters`, `GET /api/clusters/{id}/companies`, and their DSL equivalents. New sort mode `industry` (REST `?sort=industry`, DSL `ORDER BY INDUSTRY`) on the cluster\u2192companies path. No fields removed, no existing sort/filter behavior changed. Clients that strictly validate response schemas should allow the new optional field.", "validated_on": "Local PostgreSQL stack (aoi-mcp-split-pg \u2192 RDS aoi_data_v6, 1752 companies) on 2026-06-03 \u2014 smoke 25/25, functional 141/142 (1 pre-existing skip), data 23/23. Verified live: REST `GET /api/companies/Microsoft/clusters?cluster_id=187` and DSL `LIST COMPANIES FOR CLUSTER \"187\" ORDER BY BADGE WITH SCORE LIMIT 15` both return `primary_industry`; DSL `total` remains 1084 (unchanged by the LEFT JOIN \u2014 no fan-out, no drop); `ORDER BY INDUSTRY` sorts A\u2192Z with company_name tie-break; invalid sort 400s with `industry` in `valid_sort_modes`. Production verification deferred to the next staging/promote cycle."}, {"version": "0.12.3-beta", "date": "2026-06-01", "type": "fix", "breaking": false, "summary": "Three legacy `FOR ...` UDH methods (`list_companies_for_cluster`, `list_companies_for_occupation`, `list_company_occupations`) now honor the `Ranked` badge sentinel that was previously only recognized by the top-level `list_companies` path. 
The bug had two visible symptoms depending on the API surface: DSL `LIST COMPANIES FOR CLUSTER \"187\" WHERE BADGE_GROWTH IS \"Ranked\"` silently returned the unfiltered baseline (1084 rows instead of 725) because the AST executor rewrites `BADGE_X IS \"Ranked\"` to `filters[\"badge_X_ranked\"] = True`, but only `list_companies` recognized that suffix-keyed sentinel; REST `?badge_growth=Ranked` against the same paths was worse \u2014 it passed the literal string `Ranked` straight through to a `WHERE cos.badge_growth = 'Ranked'` clause, which matches 0 rows (the column values are `Platinum`/`Gold`/`NA`/null, never the literal sentinel). Same root cause: the three legacy methods only checked `filters.get(\"badge_X\")` with a literal equality, missing both the DSL suffix shape and the REST string-sentinel shape. New `UnifiedDataHandler._per_pair_badge_filters` static helper accepts both shapes and is the single source of truth for badge filtering across all three methods; mirrors the `_badge_value_clause` helper that already powered the 0.12.0-beta badge-domain subjects (which is why those new endpoints were never affected \u2014 they were built with the helper from day one).", "details": ["UDH \u2014 `src/unified_data_handler.py`: new `_per_pair_badge_filters(filters, params, column_prefix)` static method handles all three badge fields (early_career / growth / stability) and both input shapes (`filters[\"badge_X_ranked\"]=True` from DSL, `filters[\"badge_X\"]=\"Ranked\"` from REST, plus literal `filters[\"badge_X\"]=\"Platinum\"`). Replaces ~20 lines of duplicated per-field `if filters.get(\"badge_X\"):` blocks in each of `list_companies_for_cluster`, `list_companies_for_occupation`, and `list_company_occupations` (the third method passes a dynamic `column_prefix` based on whether the onet_code JOIN is active). 
No grammar, schema, or new endpoints; pure UDH-layer fix.", "TESTS \u2014 `tests/test_suite/test_badges.py`: 5 new `ranked_sentinel:*` cases in the `data` mode. (1) `dsl_for_cluster_filters_subset_of_baseline` \u2014 asserts DSL Ranked filter on FOR CLUSTER returns strictly fewer rows than unfiltered baseline. (2) `rest_for_cluster_matches_dsl` \u2014 asserts REST and DSL Ranked filters return the same total (proves both shapes now route through the same helper). (3) `dsl_for_occupation_filters_subset` \u2014 same for `LIST COMPANIES FOR OCCUPATION`. (4) `dsl_for_company_filters_subset` \u2014 same for `LIST OCCUPATIONS FOR COMPANY`. (5) `cross_path_parity_with_occupation_badges` \u2014 tightest possible check: `LIST COMPANIES FOR CLUSTER X WHERE BADGE_GROWTH IS \"Ranked\"` must equal `LIST OCCUPATION_BADGES WHERE CLUSTER_ID IS X AND BADGE_GROWTH IS \"Ranked\"` (both surface the same underlying (company \u00d7 occupation) rows for that cluster, so the legacy and new code paths must converge). Also tweaked `_total_or_count` helper to handle three distinct response envelopes (DSL double-wrap, REST-badge mirror, REST-legacy flat).", "Live impact pre-fix on production: `LIST COMPANIES FOR CLUSTER \"187\" WHERE BADGE_GROWTH IS \"Ranked\"` returned 1084 rows (unfiltered baseline) \u2014 correct answer is 725. `GET /api/clusters/187/companies?badge_growth=Ranked` returned 0 rows (literal-string match). Both now return 725, matching the 0.12.0-beta-shipped `LIST OCCUPATION_BADGES WHERE CLUSTER_ID IS \"187\" AND BADGE_GROWTH IS \"Ranked\"` which always returned 725 (built with the helper from day one).", "Provenance: discovered 2026-06-01 during a post-0.12.1-beta-promote API audit. The audit followed up on a question about why \"Ranked\" doesn't show up for cluster queries; ground-truthing across all DSL FOR-paths and the REST mirrors confirmed three silently-broken paths."], "migration": "Backward-compatible additive fix. 
Clients that were inadvertently relying on the broken behavior (DSL returning the unfiltered baseline when Ranked was specified, or REST returning empty arrays) will now see correctly filtered subsets. Neither prior behavior was a documented contract \u2014 both were always defects. The literal tier filters (`Platinum`, `Gold`) on these same paths were unaffected (they always worked) and remain unchanged. No client SDK or response-shape changes.", "validated_on": "Local PostgreSQL stack (mac-tech, RDS dump 2026-05-27) \u2014 229/230 tests pass (the 1 skip is pre-existing). The 5 new `ranked_sentinel:*` cases pass against local. Production verification deferred to the next staging/promote cycle, which will be the bundle of 0.12.2-beta (`education_level` retirement) + 0.12.3-beta (this fix) + the `fix(deploy)` allowlist patch from 2026-06-01."}, {"version": "0.12.2-beta", "date": "2026-05-31", "type": "breaking", "breaking": true, "summary": "Clean retirement of the derived `education_level` response field from `/api/occupations/{onet}/education` and `/api/occupations/cluster:{id}/education` (also reachable via DSL `GET EDUCATION FOR OCCUPATION`). No deprecation window \u2014 the field is removed in this release. `education_level` was a 3-bucket threshold of `ba_plus_share` (still in the response) and carried an inconsistent vocabulary across the two endpoints (`get_occupation_education` returned descriptive strings like `Bachelor's degree or higher typically required`; `get_cluster_education` returned shorthand like `High (Bachelor's+)` / `Moderate` / `Lower`). Both were inconsistent with `job_level` (BGI's authoritative credential classification) and added no information beyond what `ba_plus_share` + `job_level` already provide together. 
Clients that previously read `education_level` should read `ba_plus_share` (raw 0.0\u20131.0 share of workforce with a BA) and/or `job_level` (`Sub-Bachelor's Degree`, `Bachelor's Degree`, `Advanced Degree`, with optional `+ Experience` suffix).", "details": ["UDH \u2014 `src/unified_data_handler.py`: `get_occupation_education(onet_code)` drops the derivation block and removes `education_level` from the returned `data` dict; `get_cluster_education(cluster_id)` does the same per-row and updates its docstring example. `ba_plus_share` and `job_level` are unchanged on both.", "API REFERENCE \u2014 `docs/api-reference/API_REFERENCE.md`: cluster Education response example updated; explicit removal callout added pointing clients to `ba_plus_share` + `job_level`.", "TESTS \u2014 `tests/test_education_level_retirement.py` (new): asserts `education_level` is absent from both `get_occupation_education` and `get_cluster_education` responses, asserts `ba_plus_share` and `job_level` remain intact, and asserts both endpoints handle a NULL `ba_plus_share` cleanly.", "Provenance: originally drafted as 0.11.8-beta against 0.11.7; re-versioned to 0.12.2-beta and folded onto the 0.12.x line because the 0.12.0/0.12.1 badge-domain work shipped to staging first. The two changes are independent (no shared files beyond version/changelog metadata)."], "migration": "Clients calling `/api/occupations/{onet}/education` or `/api/occupations/cluster:{id}/education` (REST), or `GET EDUCATION FOR OCCUPATION \"<onet>\"` (DSL), must drop any read of `education_level`. To recover the prior heuristic: 3-bucket from `ba_plus_share` (\u22650.7 \u2192 high, \u22650.4 \u2192 moderate, else lower). For an authoritative credential classification, prefer `job_level`. No data migration required.", "validated_on": "Offline unit tests in `tests/test_education_level_retirement.py` (DummyDB pattern) plus local + staging suite re-run after the fold. 
Live verification against production pending promotion."}, {"version": "0.12.1-beta", "date": "2026-05-28", "type": "feature", "breaking": false, "summary": "REST mirror of the 0.12.0-beta badge-domain DSL subjects. Four new endpoints expose the same analytic surface to clients that prefer query-string filters over the DSL grammar: `GET /api/companies/badges` \u2194 `LIST COMPANY_BADGES`, `GET /api/companies/badges/counts` \u2194 `COUNT COMPANY_BADGES [GROUP BY ...]`, `GET /api/occupations/badges` \u2194 `LIST OCCUPATION_BADGES`, and `GET /api/occupations/badges/counts` \u2194 `COUNT OCCUPATION_BADGES [GROUP BY ...]`. The counts endpoints honor `group_by=archetype` and return the 3\u00d73 archetype \u00d7 tier matrix in a single call (parity-tested against the DSL surface in `tests/test_suite/test_badges.py`). Multi-value query params (e.g. `?badge_growth=Platinum,Gold`) trigger OR-match, and the literal value `Ranked` on any badge field matches any non-null / non-NA tier (same `_badge_value_clause` UDH helper as DSL). Unknown `group_by` dimensions return HTTP 400 with the structured `-32602` error body (`valid_group_by`, `suggestion`, `example`) preserved under `data.error` so MCP clients can self-correct.", "details": ["REST \u2014 `src/hybrid_server_mcp.py`: four new handlers (`list_company_badges_rest`, `count_company_badges_rest`, `list_occupation_badges_rest`, `count_occupation_badges_rest`) + two static helpers (`_collect_badge_filters` for query-string parsing with comma-split OR semantics, `_badges_response` for the success/error envelope). Total ~250 lines, zero new business logic \u2014 every handler delegates to the existing UDH method shipped in 0.12.0-beta. 
Route registration is interleaved with the existing `/api/companies/...` and `/api/occupations/...` blocks; both literal-path routes (`.../badges` and `.../badges/counts`) are registered BEFORE the parameterized `/{company_name}` and `/{onet_code}` routes so aiohttp's first-match-wins matcher routes them correctly (regression-guarded by `corner_rest_badges_path_priority`).", "AUTH \u2014 `src/auth/middleware_strict.py`: no changes needed. The existing wildcard rules `GET:/api/companies/*` (companies:read) and `GET:/api/occupations/*` (occupations:read, wages:read) already cover the four new paths.", "TESTS \u2014 `tests/test_suite/test_badges.py`: 14 new cases (4 smoke, 6 functional, 3 data, 3 corner). Smoke verifies all four endpoints return their expected row/matrix shape. Functional covers single-value filters, multi-value comma-split, company-scoped filtering, scalar count, GROUP BY industry, GROUP BY company. Data cross-checks the REST 3\u00d73 matrix against the DSL 3\u00d73 matrix bit-for-bit (proves the REST handler is a thin pass-through), and verifies `total` field parity for filtered LISTs. Corner covers (a) the route-priority regression where `/api/companies/badges` could be swallowed by `/api/companies/{company_name=badges}`, (b) unknown `group_by` returning HTTP 400 with `-32602` body preserved under `data.error`, (c) non-integer `cluster_id` returning HTTP 400 with a clear message before reaching UDH.", "DOCS \u2014 `docs/api-reference/API_REFERENCE.md`: 2 new rows in REST Endpoint Summary (one for each `/api/{companies,occupations}/badges[/counts]` pair); 4 new endpoint sections (~250 lines) under Company Endpoints and Occupation Endpoints, each with filter table, grouping table, 4-5 example queries, success-response JSON for both row-list and 3\u00d73-matrix shapes, and error-response JSON for the -32602 case. 
The pre-existing single-company endpoint `GET /api/companies/{name}/badges` got a one-line note pointing callers to the new cross-company `/api/companies/badges` form.", "WEB UI \u2014 `web-ui-examples-rest.json`: new `BADGE ANALYTICS (0.12.1-beta)` category with 8 representative cURL examples covering the four endpoints plus the multi-value and archetype-matrix patterns."], "migration": "Purely additive \u2014 no existing endpoints, query params, or response shapes changed. The single-company endpoint `GET /api/companies/{name}/badges` is unchanged; the new `/api/companies/badges` (plural, no `{name}` segment) is a sibling, not a replacement. Route registration order is the only sharp edge: any future addition under `/api/companies/*` or `/api/occupations/*` with a literal segment must be registered BEFORE the corresponding `{company_name}` / `{onet_code}` parameterized route, or aiohttp will swallow it (test `corner_rest_badges_path_priority` will fail loudly if this regresses).", "validated_on": "Local PostgreSQL (mac-tech workstation, RDS dump 2026-05-27): all 14 new REST cases pass + all 22 existing DSL badge cases still pass. Full test suite: smoke 25/25, functional 135/135 (+1 skipped), data 16/16, corner 33/33. Manual cURL spot-checks confirmed (a) `/api/companies/badges/counts?group_by=archetype` returns the same 1,752-total / 352-platinum-growth matrix as `COUNT COMPANY_BADGES GROUP BY ARCHETYPE`, (b) `/api/occupations/badges/counts?group_by=archetype` returns the same 54,770-total matrix as the DSL form, (c) bad `group_by` returns HTTP 400 with the full structured error body."}, {"version": "0.12.0-beta", "date": "2026-05-22", "type": "feature", "breaking": false, "summary": "Two new badge-domain DSL subjects expose the WYWM archetype badges as first-class LIST + COUNT targets: `COMPANY_BADGES` (per-company rollup over `companies`) and `OCCUPATION_BADGES` (per (company \u00d7 occupation-cluster) pair over `company_occupation_summary`). 
Closes the documented loop-pattern gap where per-pair badges were only reachable by iterating `LIST OCCUPATIONS FOR COMPANY` over all ~1,750 assessed companies. `COUNT` honors `GROUP BY`: the new `GROUP BY ARCHETYPE` dimension returns the 3\u00d73 archetype \u00d7 tier matrix (early_career / growth / stability \u00d7 platinum / gold / ranked) in a single response; `GROUP BY INDUSTRY | OVERALL_BADGE | COMPANY_STATE | COMPANY | CLUSTER_ID` return per-group `[{group_value, count}, ...]` rows. Confusable-field hints fire when callers misuse `WHERE TOTALS`, `WHERE SUMMARY`, `WHERE ARCHETYPE`, or `WHERE OVERALL_BADGE` (on the pair subject) \u2014 each error points at the right modifier or sibling subject.", "details": ["UDH \u2014 `src/unified_data_handler.py`: four new methods (`list_company_badges`, `count_company_badges`, `list_occupation_badges`, `count_occupation_badges`) plus helpers (`_archetype_matrix_select` for the 3\u00d73 SUM(CASE WHEN \u2026) fragment, `_archetype_matrix_from_row` for the nested response reshaping, `_badge_value_clause` for shared 'Ranked' / IN-list handling, module-level `_bad_group_by_error` for the structured GROUP BY error). On `OCCUPATION_BADGES`, a filter on `INDUSTRY` triggers a lazy LEFT JOIN to `companies`, and a filter on `ONET_CODE` triggers a lazy LEFT JOIN to `occupation_cluster` (since `company_occupation_summary` is cluster-level and carries neither column directly). Same column semantics as `LIST OCCUPATIONS FOR COMPANY` rows \u2014 the `data` cross-checks in `tests/test_suite/test_badges.py` confirm matrix totals match single-COUNT-per-archetype calls.", "DSL \u2014 `src/dsl/ast_executor.py`: `COMPANY_BADGES_FIELD_MAP` and `OCCUPATION_BADGES_FIELD_MAP` registered. Four new (LIST/COUNT \u00d7 COMPANY_BADGES/OCCUPATION_BADGES) entries added to `FIELD_MAPS`, `SUBJECT_LABELS`, dispatch table, and `CONFUSABLE_FIELD_HINTS`. New `_resolve_group_by` helper. 
Four new executor methods (`_exec_list_company_badges`, `_exec_count_company_badges`, `_exec_list_occupation_badges`, `_exec_count_occupation_badges`) \u2014 COUNT executors thread `ast.modifiers.group_by` (uppercased) through to the UDH.", "DSL PARSER \u2014 `src/dsl/ast_parser.py`: `_extract_where` now treats `GROUP BY` as a WHERE-clause boundary keyword (alongside `ORDER BY`, `LIMIT`, etc.) so `COUNT \u2026 WHERE \u2026 GROUP BY \u2026` correctly splits the WHERE from the modifier. Pre-0.12.0 the GROUP BY token was swallowed into the WHERE text and became a silently-dropped filter for any new use; the existing `TITLECONVERT \u2026 GROUP BY ONET` path still works (its parser owns the modifier extraction at the CONVERT level, not via the generic WHERE boundary).", "MCP DESCRIPTION \u2014 `src/mcp/tool_definitions.py`: `_LEAN_DSL_DESCRIPTION` KEY PATTERNS gains `LIST OCCUPATION_BADGES WHERE BADGE_GROWTH IS \"Platinum\" LIMIT 100`, `COUNT OCCUPATION_BADGES GROUP BY ARCHETYPE`, and `COUNT COMPANY_BADGES WHERE BADGE_GROWTH IS \"Platinum\" GROUP BY INDUSTRY` so every MCP client (Bedrock bridge, Claude Desktop, Cursor, Open WebUI) surfaces the new grammar on next `tools/list`. New `badges` category in `_DSL_EXAMPLES_BY_CATEGORY` with 11 representative queries.", "DOCS \u2014 `docs/api-reference/API_REFERENCE.md` DSL Command Quick Reference table gains 5 new rows; new full section `### Badge Subjects (0.12.0-beta)` with subject-by-subject filter tables, group-by tables, example queries, response-shape JSON for `GROUP BY ARCHETYPE` vs other dimensions, and the confusable-field-hint table. 
`docs/context/occupations.md` 'Common pitfall' section rewritten: the old 3-step loop-over-1,750-companies workflow is replaced with a single `LIST OCCUPATION_BADGES WHERE \u2026` call; the loop pattern is moved to a 'Legacy workflow (pre-0.12.0-beta)' coda for older clients.", "WEB UI \u2014 `web-ui-examples-dsl.json` (consumed by `web-test-ui-simplified.html`): new `BADGE SUBJECTS (0.12.0-beta)` category with 12 examples covering LIST + COUNT for both subjects, archetype matrix, and per-dimension grouping.", "TESTS \u2014 `tests/test_suite/test_badges.py` (new, 22 cases): smoke (LIST + COUNT-archetype for both subjects), functional (rows + filters + scalar/grouped counts), data (3 consistency tests cross-checking the archetype matrix against single-COUNT calls and LIST `total` against COUNT scalar), corner (TOTALS / OVERALL_BADGE / unknown-field / unknown-GROUP-BY hint coverage). Registered in `tests/test_suite/runner.py`."], "migration": "No client migration required. Pre-existing endpoints, queries, and response shapes are unchanged. The four new subject paths are purely additive. One small parser-level change: `WHERE \u2026 GROUP BY X` on subjects that previously didn't recognize `GROUP BY` (everything except `TITLECONVERT`) used to silently absorb the modifier into the WHERE text and drop it; that pattern was always non-functional in practice and the new path now returns a structured `-32602` if `X` isn't a valid GROUP BY dimension. Any agent code that was emitting `LIST COMPANIES WHERE \u2026 GROUP BY \u2026` and reading the LIMIT-capped unfiltered result will now get a constructive error instead.", "validated_on": "Local PostgreSQL \u2014 parser unit tests + executor stub-UDH end-to-end coverage of LIST/COUNT/GROUP BY/error paths. Pre-deploy: `python -m tests.test_suite.runner --mode smoke|functional|data` covering all 22 new badge cases plus the regression suite. Staging then production via `./scripts/deploy-staged.sh` + `--promote`. 
Post-deploy verification: (a) `curl https://aonav.ai/mcp/v1` returns the updated `_LEAN_DSL_DESCRIPTION`; (b) `LIST OCCUPATION_BADGES WHERE BADGE_GROWTH IS \"Platinum\" LIMIT 10` returns rows directly (no loop); (c) `COUNT OCCUPATION_BADGES GROUP BY ARCHETYPE` returns the 3\u00d73 matrix and `total_rows` matches `COUNT OCCUPATION_BADGES`."}, {"version": "0.11.7-beta", "date": "2026-05-07", "type": "feature", "breaking": false, "summary": "Expose the Gen-3 AI-impact narrative + trend surface on REST and DSL. The narrative fields (`low_expertise_narrative`, `high_expertise_narrative`) and trend fields (`entry_barrier_trend`, `expertise_premium_trend`) from the `ai_expertise_impact` table \u2014 added in 0.11.4-beta but until now reachable only by direct SQL \u2014 are now first-class on the public API. New dedicated cluster endpoint `GET /api/clusters/{cluster_id}/ai-impact`. New `expand=ai_impact` token on the two top-level list endpoints. New REST trend filters and sort. New DSL grammar `EXPAND AI_IMPACT`, `WHERE ENTRY_BARRIER_TREND IS \"...\"`, `WHERE EXPERTISE_PREMIUM_TREND IS \"...\"`, and `ORDER BY ENTRY_BARRIER_TREND|EXPERTISE_PREMIUM_TREND|AI_FLAG`. Coverage block on every `ai_impact` response distinguishes classified clusters (177 of ~882 at methodology 2026-04-21) from unclassified ones \u2014 unclassified clusters return `success=true` with null narratives and `coverage.classified=false`, not 404. NULLS LAST is unconditionally applied to all new sort orders so unclassified rows don't crowd the top of any sorted page.", "details": ["UDH FOUNDATION \u2014 `src/unified_data_handler.py`: `list_clusters` and `list_occupations` now unconditionally `LEFT JOIN ai_expertise_impact` (so trend filters and sorts work even without `expand`); narrative columns are pulled only when `expand=ai_impact` is set. 
New helpers: `build_ai_impact_block(row, expanded)` extracts the ai_impact sub-block with its coverage metadata; `resolve_order_by(...)` validates against `LIST_CLUSTERS_SORTABLE` / `LIST_OCCUPATIONS_SORTABLE` whitelists and emits `NULLS LAST` + a stability tiebreaker. `get_occupation_ai_impact` and `get_cluster_ai_impact` rewritten to always include narratives + trends + coverage. `count_occupations` gains the same LEFT JOIN + trend WHERE clauses (was previously ignoring trend filters entirely \u2014 bug fix). New constants `AI_IMPACT_VALID_TRENDS`, `AI_IMPACT_VALID_FLAGS`, `AI_IMPACT_METHODOLOGY_VERSION_FALLBACK`.", "REST \u2014 `src/hybrid_server_mcp.py`: `list_clusters_rest` and `list_occupations` now extract `expand`, `entry_barrier_trend`, `expertise_premium_trend`, `sort`/`order_by`, and `order`/`dir` from query params and forward them through the `options` dict. `get_ai_impact` (the legacy `/api/occupations/{id}/ai-impact` form) now forwards `summary` + `ai_impact` blocks instead of whitelisting them out (the prior whitelist was silently dropping the new fields). New handler `get_cluster_ai_impact_rest` registered at `GET /api/clusters/{cluster_id}/ai-impact` \u2014 path-natural mirror of the cluster: form, returns 400 on non-integer cluster_id and 404 on missing cluster.", "DSL \u2014 `src/dsl/ast_executor.py`: `OCCUPATION_FIELD_MAP` and `CLUSTER_FIELD_MAP` gain `ENTRY_BARRIER_TREND` and `EXPERTISE_PREMIUM_TREND` so `WHERE` clauses parse. `_exec_list_clusters` and `_exec_list_occupations` route `EXPAND` and `ORDER BY` modifiers from the AST through `modifiers_to_dsl_options(...)` into `options` for UDH. 
`LIST OCCUPATIONS ORDER BY COMPANY_COUNT` keeps its existing dedicated path.", "API REFERENCE DOCS \u2014 `api-reference.txt`, `ops/API_REFERENCE.md`, `docs/api-reference/API_REFERENCE_V6_UNIFIED.md`, `docs/api-reference/REST_API_REFERENCE_EXTERNAL.md`: new `What's New (0.11.7-beta)` blocks summarising the surface; `/api/occupations` and `/api/clusters` parameter tables gain `entry_barrier_trend`, `expertise_premium_trend`, `expand`, `sort`, `order` rows; `/api/occupations/{id}/ai-impact` response section rewritten to show narratives + trends + coverage; new `/api/clusters/{id}/ai-impact` endpoint section with full response shape and 400/404 errors. The external doc carries an explicit `UNSTABLE / SUBJECT TO CHANGE` warning per the 3-tier external rule (this surface is pre-1.0, limited distribution).", "WEB UI \u2014 `web-test-ui-simplified.html`: in-page API reference modal gets new REST and DSL example sets covering the narrative + trend surface, plus explanatory notes on trend filter values, ai_flag values, sortable fields, NULLS LAST behaviour, alias note for order_by= / dir=, and coverage block semantics for unclassified clusters.", "CONTEXT DOCS \u2014 `docs/context/ai-impact.md`: new `Filter by trend`, `Inline narratives + trends on list responses`, and `Sort by ai_flag or trend` sub-sections under Syntax. `docs/mcp/MCP_CONTEXT.md` (legacy/fallback): new `Narrative + trend retrieval` paragraph in the AI Impact section.", "MODEL-FACING \u2014 `src/mcp/tool_definitions.py:_LEAN_DSL_DESCRIPTION`: KEY PATTERNS gains `LIST CLUSTERS WHERE ENTRY_BARRIER_TREND IS \"Rising\" EXPAND AI_IMPACT LIMIT 10` so every MCP client surfaces the new grammar on next tools/list. 
`src/nl_query_builder.py` large-model template: new `AI IMPACT NARRATIVE + TREND SURFACE (0.11.7-beta)` block listing the new REST/DSL forms plus translation rules for narrative-seeking queries (\"narrative for X\", \"advice for entry-level\") and trend-direction queries (\"rising entry barriers\", \"expertise becoming more valuable\").", "TESTS \u2014 `tests/test_ai_impact_narratives.py` (new, 27 cases): unit tests for `build_ai_impact_block` (classified/unclassified, expanded/compact, aei_ai_flag preference, fallback), `resolve_order_by` (default, unknown-field fallback, trend sort with NULLS LAST + stability tiebreaker, lowercase dir, invalid dir, default field no double-order), the sortable-map contracts, and asyncio integration coverage of `list_clusters` / `list_occupations` / `get_occupation_ai_impact` / `get_cluster_ai_impact` (block attachment, expand semantics, trend-filter SQL, LEFT JOIN presence, unclassified coverage block)."], "migration": "No client migration required for existing callers. New surface is purely additive; old endpoints and queries return identical responses. Two callouts: (1) the legacy `/api/occupations/{onet_code}/ai-impact` and `/api/occupations/cluster:{id}/ai-impact` responses now include `low_expertise_narrative`, `high_expertise_narrative`, `entry_barrier_trend`, `expertise_premium_trend`, and `coverage` fields under `ai_impact` \u2014 clients that strictly schema-validated the prior shape may need to allow additional fields; (2) `COUNT OCCUPATIONS WHERE ENTRY_BARRIER_TREND IS \"...\"` now actually filters (was previously returning the unfiltered total \u2014 verify any downstream code expecting the buggy behavior).", "validated_on": "Local PostgreSQL (`postgres-split` + `aoi-mcp-split-pg` on `localhost:8091`). 27/27 unit tests in `tests/test_ai_impact_narratives.py`. 
34/34 cases in the live parity matrix covering: REST list-with-expand on both endpoints, REST trend filters (entry + premium, both directions), REST sort on ai_flag and both trends, REST combined filter+sort+expand, REST regression on existing `ai_flag` filtering, REST dedicated `/api/clusters/{id}/ai-impact` (classified, unclassified, 400 on bad input, 404 on missing), REST `/api/occupations/{onet}/ai-impact` and the `cluster:` alias, DSL EXPAND AI_IMPACT (both LIST forms), DSL trend WHERE filters (both fields, both LIST forms), DSL ORDER BY trend, DSL combined filter+sort, DSL ai_flag + trend sort, DSL COUNT with trend filter (regression vs unfiltered), DSL bare COUNT (regression), DSL GET AI_IMPACT FOR OCCUPATION."}, {"version": "0.11.6-beta", "date": "2026-05-07", "type": "docs", "breaking": false, "summary": "Methodology + provenance refresh against the canonical WYWM source (`whereyouworkmatters.org`). Fixes stale claims that were hitting every model on every MCP `tools/list` call. Sponsorship corrected (Burning Glass Institute / Schultz Family Foundation, methodology in partnership with Harvard's Managing the Future of Work Project \u2014 the Wall Street Journal attribution was wrong). Counts updated to ~1,750 companies and 1,016 unique O*NET occupations across ~55,000 (company \u00d7 occupation) pairs assessed. Observation window softened to predominantly 2019\u20132024. Two-tier badge thresholds documented explicitly (per (company \u00d7 occupation) uses 80th/60th percentiles; per archetype-per-company and overall-per-company use the top fifth / next fifth, re-derived each refresh). Overall-score 0.75 down-weighting on Early Career and Growth archetypes called out. AI-flag provenance noted as a separate BGI research project that is a sequel to the WYWM core methodology. Cluster definition added (BGI internal cohort grouping that clusters O*NET occupations by behavior in the labor data, distinct from the rigid O*NET taxonomy). 
Wage table provenance softened to provisional and explicitly distinguished from the company-level wage signals that feed Stability badges. Open questions for BGI consolidated in a new `docs/planning/ASK_BGI.md`. No code-path or DSL/REST behavior changes; this is a content delta surfaced through `tools/list` descriptions and HELP topics.", "details": ["TOOL DESCRIPTION \u2014 `src/mcp/tool_definitions.py:_LEAN_DSL_DESCRIPTION`: company/occupation counts, sponsorship line, and observation window updated. Every MCP client (Bedrock bridge, Claude Desktop, Cursor, Open WebUI) sees the new values on the next `tools/list` after deploy.", "NL PROMPT \u2014 `src/nl_query_builder.py`: large-model template `CONTEXT` block updated with the same counts and a coaching nudge that great-job opportunities live at the (company \u00d7 occupation) tier rather than the company-wide rollup. Project lineage clarified (AOI 2026 serving the WYWM dataset).", "CONTEXT DOCS \u2014 `docs/context/methodology.md` (canonical), `docs/context/index.md`, `docs/context/companies.md`, `docs/context/wages.md`, `docs/context/clusters.md`, `docs/context/ai-impact.md`, `docs/mcp/MCP_CONTEXT.md`: stale claims fixed end-to-end. `MCP_CONTEXT.md` header now flags the file as legacy/fallback and points to `methodology.md` as canonical. `clusters.md` gains a `What a cluster is (and isn't)` section. `wages.md` flags the per-occupation wage table as provisional and distinct from the company-level wage signals behind Stability.", "PROVENANCE LEDGER \u2014 `docs/context/data_provenance.md` (new): authoritative source map (links to WYWM public pages), summary of corrections applied, list of facts verified against the source, and a `Resolved internally / Still open` split that redirects unresolved items to `ASK_BGI.md`.", "BGI OUTREACH QUEUE \u2014 `docs/planning/ASK_BGI.md` (new): four primary questions (wage data sources split by WYWM Stability signal vs. 
per-occupation wage table; 12M-worker corpus composition; AI-flag overlay citation + refresh cadence; Z-score peer set rules) plus a bucket for the 2026 Methodology Supplement PDF. Not blocking; outreach can happen on its own cadence.", "TRACE MANUAL \u2014 `docs/planning/AGENT_TRACE_MANUAL.md`: \u00a713 marked as 'high-confidence corrections landed' with pointer to `data_provenance.md` for residual open items. Convergence loop (\u00a78) gains a benchmark snapshot step driven by `scripts/benchmark-run-matrix.sh` for post-deploy measurement across the three model presets.", "BENCHMARK MATRIX \u2014 `scripts/benchmark-run-matrix.sh` (new): wrapper around `scripts/benchmark-chat-context.py` that runs the existing 25-case 6-tier suite once per preset (`aoi-career-assistant` / Gemini, `aoi-career-local` / Bedrock-Qwen, `aoi-career-claude` / Claude), saves a timestamped JSON report per preset under `usage_reports/`, and prints a side-by-side summary. Used to capture the post-deploy 'after' snapshot for the \u00a78 convergence loop."], "migration": "No client migration required. Clients that cached or reasoned over the prior counts (1,752 companies; explicit BLS OEWS provenance on wages; WSJ sponsorship) will see corrected values on the next `tools/list` or HELP call. No DSL/REST shape changes, no new endpoints, no schema changes.", "validated_on": "Local lints clean on `tool_definitions.py` and `nl_query_builder.py` (string-only edits). Staging then production via `./scripts/deploy-staged.sh` + `--promote`. 
Post-deploy verification: (a) `curl https://aonav.ai/mcp/v1` returns the updated `_LEAN_DSL_DESCRIPTION`; (b) `HELP METHODOLOGY` shows the corrected sponsorship and two-tier badge framing; (c) `HELP CLUSTERS` shows the new 'What a cluster is' section; (d) `scripts/benchmark-run-matrix.sh --target prod` saves baseline reports under `usage_reports/`."}, {"version": "0.11.5-beta", "date": "2026-05-07", "type": "fix", "breaking": false, "summary": "Close the REST/DSL parity gap on `ai_flag` filtering. `GET /api/occupations?ai_flag=...` and `GET /api/clusters?ai_flag=...` were silently dropping the filter (the route handlers never extracted the param), and `LIST CLUSTERS WHERE AI_FLAG IS \"...\"` was being rejected as an unknown WHERE field. Both REST endpoints now honor `ai_flag` end-to-end and the DSL form is registered in `CLUSTER_FIELD_MAP`. The 0.11.4-beta vocabulary refresh missed a Gen-2 leftover in `api-reference.txt` and is corrected here.", "details": ["REST \u2014 `src/hybrid_server_mcp.py`: `list_occupations` (route `GET /api/occupations`) and `list_clusters_rest` (route `GET /api/clusters`) now extract `ai_flag` from query params and forward it to the unified handler. `unified_data_handler.list_occupations` and `unified_data_handler.list_clusters` already accepted `ai_flag` in their filters dict; only the REST pass-through was missing.", "DSL \u2014 `src/dsl/ast_executor.py`: `CLUSTER_FIELD_MAP` gains `\"AI_FLAG\": \"ai_flag\"`. `LIST CLUSTERS WHERE AI_FLAG IS \"...\"` now resolves to the `oi.ai_flag = :ai_flag` clause in `list_clusters`. 
Adjacency forms (`LIST SKILL_ADJACENT_CLUSTERS`, `LIST DESTINATION_CLUSTERS`) and occupation forms (`LIST OCCUPATIONS`, `COUNT OCCUPATIONS`) were already wired and are unchanged.", "DOCS \u2014 `docs/api-reference/API_REFERENCE_V6_UNIFIED.md`, `ops/API_REFERENCE.md`, `docs/api-reference/REST_API_REFERENCE_EXTERNAL.md`, `api-reference.txt`: `/api/occupations` parameter table gains `ai_flag` row + Gen-3 value list + REST examples + DSL equivalent line. `/api/clusters` `ai_flag` description upgraded from cross-reference to inline value list. DSL command reference table adds `LIST OCCUPATIONS WHERE AI_FLAG IS` and `COUNT OCCUPATIONS WHERE AI_FLAG IS` rows.", "FIX \u2014 `api-reference.txt` line 746: replaced retired Gen-2 example `?ai_flag=Entry%20Level%20Jobs%20Are%20Eroding` with `?ai_flag=Raising%20the%20Bar`. Missed in the 0.11.4-beta vocabulary sweep.", "WEB UI \u2014 `web-ui-examples-rest.json`: adds `OCCUPATIONS / By AI Flag (Raising the Bar | Lower Potential)` and `CLUSTERS / By AI Flag (Raising the Bar | Winners Pull Away)` plus combined-filter examples. `web-ui-examples-dsl.json`: adds the matching `LIST OCCUPATIONS WHERE AI_FLAG IS`, `COUNT OCCUPATIONS WHERE AI_FLAG IS`, and `LIST CLUSTERS WHERE AI_FLAG IS` example sets."], "migration": "No client migration required. Clients that were calling `GET /api/occupations?ai_flag=...` and `GET /api/clusters?ai_flag=...` and silently getting unfiltered results will now get filtered results \u2014 verify your downstream code expects this. Agentic clients that were hitting `Unknown WHERE field for LIST CLUSTERS: AI_FLAG` and falling back to alternate queries can now use the natural form.", "validated_on": "Local PostgreSQL (4 forms covered: REST occupations, REST clusters, DSL LIST CLUSTERS WHERE AI_FLAG, DSL LIST OCCUPATIONS WHERE AI_FLAG). 
Staging then production after deploy."}, {"version": "0.11.4-beta", "date": "2026-05-06", "type": "schema", "breaking": false, "summary": "Introduce `ai_expertise_impact` table as the source of truth for cluster-level AI impact classification and audience-facing narratives, keyed on `cluster_id`. Replace the legacy 4-label `ai_flag` vocabulary in `occupation_info` with the 2026-04-21 methodology vocabulary, sourced from the new table. The Gen-2 vocabulary (Winners Pull Away / Entry Level Jobs Are Eroding / Shrinking Fields / GenAI Opens New Doors) is retired entirely. The Gen-3 vocabulary (Raising the Bar / Shrinking Fields / Winners Pull Away / Lower Potential) places each cluster on a 2x2 of barrier-to-entry \u00d7 value-of-expertise trends, and adds per-experience-level narratives for entry-level (low-expertise) and experienced (high-expertise) practitioners.", "details": ["SCHEMA \u2014 `sql/008_add_ai_expertise_impact.sql` + `sql/schema_postgres.sql`: new table `ai_expertise_impact (cluster_id PK, cluster_name, ai_flag, entry_barrier_trend, expertise_premium_trend, low_expertise_narrative, high_expertise_narrative, methodology_version, source_file, created_at, updated_at)` with FK to `occupation_info(cluster_id)` ON DELETE CASCADE. Indexes on `ai_flag`, `entry_barrier_trend`, `expertise_premium_trend`, `methodology_version`. 177 rows seeded from `data/4-21 update/cluster_level_expertise_upheaval_descriptions.csv`.", "DATA \u2014 `data/4-21 update/ai_expertise_impact_seed.csv` (177 rows, snake_case columns) is the canonical seed for the new table. 
`data/4-21 update/occupation_info_seed_v2.csv` is the new template for re-importing `occupation_info` (drops the `ai_flag` column entirely; ai_flag is now sourced from `ai_expertise_impact`).", "VOCABULARY \u2014 Gen-3 ai_flag values: `Raising the Bar` (63 clusters: barriers\u2191 + expertise premium\u2191), `Shrinking Fields` (26: barriers\u2191 + premium\u2193), `Winners Pull Away` (23: barriers\u2193 + premium\u2191), `Lower Potential` (65: barriers\u2193 + premium\u2193). Gen-2 values are wiped from `occupation_info.ai_flag` and replaced via sync from `ai_expertise_impact`. The two retained label strings (`Shrinking Fields`, `Winners Pull Away`) cover different cluster memberships under Gen-3 \u2014 clients filtering by these labels will see different result sets.", "BEHAVIOR \u2014 `occupation_info.ai_flag` is now a denormalized cache of `ai_expertise_impact.ai_flag`. `LIST OCCUPATIONS WHERE AI_FLAG IS \"...\"` and other AI_FLAG filter paths continue to work without code change; the column is overwritten by the import script. The new fields (`entry_barrier_trend`, `expertise_premium_trend`, `low_expertise_narrative`, `high_expertise_narrative`) are stored but not yet exposed via DSL/REST \u2014 that surface lands in a follow-up release.", "MIGRATION \u2014 `scripts/import_ai_expertise_impact.py` runs migration 008, UPSERTs the seed CSV, wipes Gen-2 `occupation_info.ai_flag` (`UPDATE \u2026 SET ai_flag = NULL`), and resyncs from `ai_expertise_impact`. PostgreSQL only. 
Refuses to run --apply against `PG_HOST` containing `rds.amazonaws.com` + `PG_DATABASE='aoi_data_v6'` unless `--i-know-this-is-prod` is passed; routine production rollouts must go through the staged deploy pipeline.", "DOCS \u2014 vocabulary refresh in `docs/context/ai-impact.md`, `docs/context/occupations.md`, `docs/context/schema.md`, `docs/context/pathways.md`, `docs/context/convert.md`, `docs/AOI_DSL_CONTEXT.md`, `docs/llm-agent-prompt.md`, `docs/mcp/MCP_CONTEXT.md`, `docs/api-reference/API_REFERENCE_V6_UNIFIED.md` (and ops mirror), `docs/api-reference/REST_API_REFERENCE_EXTERNAL.md`, `api-reference.txt`, `web-ui-examples-dsl.json`, `web-ui-examples-rest.json`. Prompt contexts updated in `src/nl_query_builder.py`, `src/prompt_templates.py`, `src/mcp/tool_definitions.py`, `src/hybrid_server_mcp.py` capability blob.", "ROLLOUT \u2014 `docs/deployment/AI_EXPERTISE_IMPACT_ROLLOUT_2026-05-06.md` documents the staging-only rollout sequence with the production-isolation guarantees (separate logical databases on shared RDS instance) and verification steps."], "migration": "Operational: clients that hardcoded the Gen-2 ai_flag strings (`Entry Level Jobs Are Eroding`, `GenAI Opens New Doors`) will see empty result sets. Clients filtering on `Shrinking Fields` or `Winners Pull Away` will see different cluster memberships. There is no schema-breaking change \u2014 `occupation_info.ai_flag` column remains, `LIST OCCUPATIONS WHERE AI_FLAG IS \"...\"` syntax remains. Run `scripts/import_ai_expertise_impact.py --apply` against staging first, then promote.", "validated_on": "Pending \u2014 see rollout doc for the local-test \u2192 staging-import \u2192 smoke-verify \u2192 promote sequence."}, {"version": "0.11.3-beta", "date": "2026-05-06", "type": "fix", "breaking": false, "summary": "Restore and finalize the Open WebUI Bedrock Qwen bridge plus the occupation-level hiring-velocity DSL path. 
The release exposes the internal `/openai/v1` Bedrock adapter for Open WebUI, executes AOI tool calls inside the bridge with safe Bedrock conversation replay, prevents blank assistant messages after max tool turns, and teaches `LIST COMPANIES FOR OCCUPATION` to filter and rank by per-company-occupation badges and postings quintile.", "details": ["API BRIDGE \u2014 `src/hybrid_server_mcp.py`: adds `/openai/v1/models` and `/openai/v1/chat/completions` as an internal OpenAI-compatible adapter over Bedrock Converse. The adapter validates `OPENWEBUI_BEDROCK_API_KEY` or `MCP_API_KEY`, exposes the `aoi-bedrock-qwen` model alias, supports non-streaming and SSE-style streaming responses, and translates OpenAI tool definitions/messages to Bedrock tool use blocks.", "AGENT TOOLS \u2014 the Bedrock bridge executes AOI DSL/title/company tool calls internally through `UnifiedMCPToolsRegistry` when Qwen requests `aoi_dsl_query`, `aoi_title_convert`, or `aoi_company_convert` (including OpenWebUI-prefixed names like `aoi_aoi_dsl_query`) and performs follow-up Bedrock calls before returning final text. This avoids tool-call-only empty assistant turns in Open WebUI.", "FIX \u2014 Bedrock bridge normalizes canonical AOI MCP tool names when OpenWebUI prefixes them, preserves structured MCP tool errors when feeding tool results back to Qwen, and sanitizes malformed tool-use names before replaying assistant tool calls back into Bedrock conversation history. If Qwen emits query text as a tool name, the bridge keeps the raw name for the unsupported-tool error but stores a Bedrock-valid placeholder so Converse does not fail validation on `[a-zA-Z0-9_-]+`.", "FIX \u2014 Bedrock bridge makes a final no-tools synthesis call when the last allowed internal tool turn produced only another `toolUse`. 
This prevents Open WebUI from saving a blank assistant message (`content: null`) after the bridge has already executed the final tool result.", "FIX \u2014 Bedrock final synthesis keeps `toolConfig` when replayed history contains `toolUse`/`toolResult` blocks, while adding an explicit text-only synthesis instruction. Bedrock validates prior tool blocks against `toolConfig` even when no further tools should be called; removing it caused `ValidationException: The toolConfig field must be defined` on some multi-tool Open WebUI prompts.", "GUIDANCE \u2014 Bedrock bridge injects AOI-specific tool-use guidance for job-title + early-career badge + hiring-velocity questions, steering Qwen toward `aoi_title_convert` followed by `LIST COMPANIES FOR OCCUPATION \"<ONET_CODE>\" WHERE BADGE_EARLY_CAREER IS \"Platinum\" ORDER BY POSTINGS_QTILE DESC LIMIT 10` instead of invented `LIST COMPANIES WHERE OCCUPATION ...` syntax.", "DSL \u2014 `src/dsl/ast_executor.py` + `src/unified_data_handler.py`: `LIST COMPANIES FOR OCCUPATION` now supports occupation-level filters `BADGE_EARLY_CAREER`, `BADGE_GROWTH`, `BADGE_STABILITY`, `POSTINGS_QTILE`/`POSTINGS_COUNT_QTILE`, plus `ORDER BY POSTINGS_QTILE`. Results now include `postings_count_qtile`, enabling direct hiring-velocity ranking for a resolved O*NET occupation.", "AUTH \u2014 `src/auth/middleware_strict.py`: allows only the two internal bridge paths through the JWT middleware; the route handlers still require the internal bearer key before any Bedrock call is made.", "OPENWEBUI \u2014 `docker-compose.split-api.yml`: adds the AOI Bedrock provider (`http://aoi-mcp-split-pg:8090/openai/v1`) before Gemini/OpenAI/Anthropic in `OPENAI_API_BASE_URLS`, using `${MCP_API_KEY}` as its provider key. `openwebui/setup-models.sh` points `aoi-career-local` at `aoi-bedrock-qwen`, creates/updates the Claude preset, and avoids brittle shell-interpolated JSON payload construction.", "SEARCH \u2014 `src/database.py`: company search now uses `LOWER(...) 
LIKE LOWER(:term)` so PostgreSQL matches lower-case user queries like `micro` against `Microsoft`. This restores the functional `SEARCH COMPANIES` / `/api/companies/search` behavior after the ES decommission moved search fully onto PG FTS/LIKE paths.", "SECURITY \u2014 removes hardcoded cloud LLM API key defaults from `docker-compose.unified.yml` and `docker-compose.mac.yml`; these now read from environment variables only."], "migration": "No client migration. The existing Open WebUI preset id `aoi-career-local` remains active but displays as Bedrock Qwen and uses the internal `aoi-bedrock-qwen` base model. Existing well-formed DSL continues to work; `LIST COMPANIES FOR OCCUPATION` gains additional supported filters and a new returned `postings_count_qtile` field.", "validated_on": "Local py_compile and lints passed. Staging and production API containers rebuilt. Verified local == container runtime hashes for CHANGELOG and touched Python files, code markers present, exact DSL query returned Cardinal Health/Uline/Wolters Kluwer with postings quintile 5, Bedrock Qwen streamed non-null content with the expected ranked companies, SEARCH COMPANIES micro returned Microsoft, Open WebUI production exact prompt returned non-null Bedrock Qwen content, staging Open WebUI production-volume copy verified model selector + MCP tool injection, structured staging Open WebUI prompt returned Cardinal Health/Uline/Wolters Kluwer, and the final-synthesis toolConfig regression no longer returns HTTP 500. Staging smoke 17/17, staging functional 118/119 pass (1 skipped), production smoke 17/17, production functional 118/119 pass (1 skipped)."}, {"version": "0.11.2-beta", "date": "2026-05-04", "type": "chore", "breaking": false, "summary": "ChromaDB residue cleanup. Companion to 0.11.1-beta (Elasticsearch decommission). 
ChromaDB itself was retired infrastructure-side on 2026-04-23 (PR-B), but ~1,070 lines of dead client/seed/factory code, plus a `/health` probe that always reported `disconnected`, were intentionally deferred. This release deletes that residue, renames the `DisambiguationService` internal attribute from `self.chromadb` to `self.vector_store` (with cleanly-degrading None-handling on non-Postgres backends), and sweeps the live operational docs to point at the canonical pgvector seeder. The vector store path itself has been pgvector-only in production since 2026-04-23 and is unchanged here. (Originally drafted as 0.10.5-beta; rebased onto 0.11.1-beta as 0.11.2-beta.)", "details": ["CODE \u2014 `src/chromadb_client.py` (528 LOC), `src/seed_chromadb.py` (~440 LOC), `src/vector_client_factory.py` (~70 LOC), and `tests/test_chromadb_client.py` (~130 LOC) DELETED \u2014 total 4 files, 1,234 lines. None had Python importers after the request-path cleanup landed in this branch's earlier commit. `src/disambiguation_service.py`: drop `from chromadb_client import \u2026`; constructor parameter `chromadb_client` \u2192 `vector_store_client` (no kwarg callers, verified); attribute `self.chromadb` \u2192 `self.vector_store` (8 sites); MySQL path no longer instantiates ChromaDB \u2014 leaves `vector_store = None` and Tier 2 RAG cleanly skipped; `initialize()` now succeeds even without a vector store so Tier 1 + Tier 3 still run; all `is_connected()` guards updated.", "RESPONSE FIELDS \u2014 None. The disambiguation response shape, `signals_used` values, `method` values, and the `_bypass` query parameter behavior are all unchanged. Only `_bypass`'s docstring was updated to say `pgvector RAG` instead of `ChromaDB RAG`.", "HEALTH \u2014 `src/hybrid_server_mcp.py`: deleted the entire `# Check 5: ChromaDB` block in `/health` (~48 lines). The block has been silently emitting `status: disconnected` since the container was retired 2026-04-23. 
Vector-store health is now reflected in `database.status` since pgvector lives inside RDS PG. Clients that key off the `chromadb` block in `/health` need to remove that check (the field is gone, not just changed).", "REGISTRY \u2014 `src/admin/table_schema_registry.py`: dataclass field `chromadb_collection` \u2192 `vector_collection`; accessor `get_chromadb_collection()` \u2192 `get_vector_collection()`; 3 `_register()` call sites updated. `src/admin/sync_orchestrator.py`: import + variable names updated to match (`chroma_collections` \u2192 `vector_collections`, `chromadb_collections_queued` \u2192 `vector_collections_queued`).", "INFRASTRUCTURE \u2014 `requirements.txt`: removed `chromadb>=0.4.22` pin (no longer imported anywhere). `docker-compose.split-api.yml` + `docker-compose.unified.yml`: comment cleanup pointing the operator at `scripts/seed_pgvector.py`. There is no docker-compose service block to remove \u2014 that was already done in PR-B (2026-04-23).", "DOCS \u2014 `DEVELOPMENT_PROTOCOLS.md`: replaced the `\ud83d\udea8 CRITICAL RULE: ChromaDB Seed` section with a `\ud83d\udea8 CRITICAL RULE: pgvector Seed` section pointing at `scripts/seed_pgvector.py`; cleaned the Local/EC2 cheat-sheet code blocks that still referenced `seed_chromadb.py` and `sync_mysql_to_es.py`. `.cursor/rules/postgres-migration.mdc`: dropped deleted-file globs (`src/seed_chromadb.py`, `src/vector_client_factory.py`); replaced the post-data-change ChromaDB seed step with the pgvector seeder. `docs/internal/UNDOCUMENTED_APIS.md`: `_bypass` description updated; `AOI_SKIP_CHROMADB` env var removed; manual-seed example replaced with the pgvector seeder. `docs/deployment/DEPLOYMENT_INSTANCES.md`: removed ChromaDB / chromadb-split entries from container tables; replaced volume-warning text. `docs/deployment/TEST_DEPLOYMENT_CHECKLIST.md`: replaced ES + ChromaDB sync steps with pgvector + prefix-index reload. 
`docs/planning/POST_MERGE_RESYNC_PROCEDURE.md`, `ops/runbooks/007_client_company_changes_2026-03.md`, `ops/STAGING_LAMBDA_STRESS_TEST.md`: same \u2014 ChromaDB seed steps repointed at `scripts/seed_pgvector.py`. `tests/test_disambiguation_service.py`: docstring + log strings updated.", "DOCS NOT TOUCHED (intentionally preserved as historical record) \u2014 dated change logs (`docs/deployment/DATA_CHANGE_LOG_2026-*.md`), staging notes (`docs/deployment/STAGING_NOTES_2026-04-24.md`), planning specs (`docs/planning/DSL_ADMIN_WRITE_SPEC.md`, `docs/planning/DISAMBIGUATION_IMPLEMENTATION_PLAN.md`, `docs/planning/INFRASTRUCTURE_SPLIT_PLAN.md`, `docs/planning/MANBA_EXPLAIN_AND_METHODOLOGY_SPEC.md`), the architecture/audit walkthroughs, and the regression callout in `.cursorrules` \u00a7 13. They describe what shipped at the time and remain accurate as history."], "migration": {"summary": "One observable change for clients: `/health` no longer returns a `chromadb` block (field deleted, not renamed). Disambiguation behavior, response shapes, and search results are unchanged.", "verified_with_parity_diff": "Yes. The same `tools/parity_capture.sh` + `tools/parity_diff.py` harness used for 0.11.1-beta runs against this build with 0 substantive diffs (the harness already strips the `chromadb` block before diffing because it has been emitting `disconnected` since 2026-04-23).", "rollback": "Revert the 3-commit range and redeploy. The deleted modules are recoverable from git history; pgvector remains live and unaffected on either side of the rollback."}, "validated_on": "Local: ast lint clean on all 5 modified Python files; `python3 -c 'import disambiguation_service'` succeeds with the four deleted files removed; full repo grep confirms no remaining Python importer of `chromadb_client`, `seed_chromadb`, or `vector_client_factory`."}, {"version": "0.11.1-beta", "date": "2026-05-04", "type": "chore", "breaking": false, "summary": "Decommission Elasticsearch end-to-end. 
ES has been functionally absent from production since the `es-split` container exited 2026-04-27 15:17:59 UTC and was never restarted; search has been served entirely by PostgreSQL FTS + the in-memory CompanyMatcher prefix index for the past 5 days. This release removes the dead ES request paths, the ES client module, the ES service blocks from every docker-compose file, and the ES sync sections from the operator docs \u2014 so restart behavior is deterministic and there is no longer a service whose presence or absence silently changes scoring. Includes a small MySQL-noise sweep in the same files (docstrings/comments only \u2014 runtime dialect helpers are intentionally preserved). Companion: new `tools/parity_capture.sh` + `tools/parity_diff.py` verification harness used to validate behavior preservation against the prod query bag. (Originally drafted as 0.10.4-beta; rebased onto 0.11.0-beta as 0.11.1-beta.)", "details": ["CODE \u2014 `src/elasticsearch_client.py` DELETED (960 lines, no live importers). `src/unified_data_handler.py`: drop ES import + init; drop ES branches in `titleconvert`, `clusterconvert`, and `search_companies`; delete `_search_companies_es`; rename `_search_companies_mysql` \u2192 `_search_companies_db`. `src/company_matcher.py`: drop ES import + init + `_find_matches_es`; rename `_find_matches_mysql` \u2192 `_find_matches_db`. `src/hybrid_server_mcp.py`: drop the `/health` `elasticsearch` check block. `src/admin/sync_orchestrator.py`: drop the ES reindex queue. `src/admin/table_schema_registry.py`: drop the `es_index` field and `get_es_index()`. `src/admin/gradio_app.py`: drop the ES status row from the admin dashboard and the placeholder `sync_es_data.py` script-list entry. 
`scripts/deploy-with-checksums.sh`: drop `src/elasticsearch_client.py` from the deploy file list (would otherwise scp a missing file).", "RESPONSE FIELDS \u2014 `source` field on TITLECONVERT, CLUSTERCONVERT, and SEARCH COMPANIES responses changes from `\"mysql\"` / `\"elasticsearch\"` to `\"database\"`. The TITLECONVERT REST default in `hybrid_server_mcp.py` likewise switches from `\"mysql\"` to `\"database\"`. Clients that key off `source == \"mysql\"` or `source == \"elasticsearch\"` need to update \u2014 additive note: `\"database\"` covers all current backends.", "DISAMBIGUATION SIGNAL \u2014 `signals_used` and `final_recommendation.method` no longer emit `\"elasticsearch_gap\"`; renamed to `\"score_gap\"` (the underlying score-gap heuristic is unchanged). Disambiguation tier-1 docstring updated from `Elasticsearch (existing)` to `Database FULLTEXT (existing)`.", "INFRASTRUCTURE \u2014 Removed `USE_ELASTICSEARCH` and `ELASTICSEARCH_HOSTS` env vars and the `es-split` / `aoi-elasticsearch` service blocks from `docker-compose.postgres-test.yml`, `docker-compose.split-api.yml`, `docker-compose.split-spot.yml`, `docker-compose.unified.yml`, `docker-compose.mac.yml`, and `docker-compose.split-local.yml`. `depends_on: elasticsearch` removed from the API service in `split-api.yml`, `unified.yml`, and `mac.yml`. `es_data_split` / `es_data_unified` / `elasticsearch_data` named volumes deleted. Standalone `docker-compose.elasticsearch.yml` deleted outright. 
All 6 active compose files validated with `docker compose config`.", "DOCS \u2014 `.cursorrules`, `ops/OPERATOR_MANUAL.md` (\u00a77 Search Index Sync, \u00a77c COMPANYCONVERT Autocomplete, \u00a713.4 Search Results Wrong, \u00a713.5 Database Connection Failed, Appendix C ports table, \u00a715 What Operators MUST NOT Do), `docs/deployment/ARCHITECTURE_AND_OPS.md` (architecture diagram, T3 production/staging container tables, \u00a77 Search service description, \u00a78c Reload Prefix Index runbook, retired-files table) all updated. Each retains an explicit `Elasticsearch decommissioned (2026-05-02)` callout naming the dead surfaces and the date prod last had ES.", "MYSQL NOISE \u2014 In files already touched by the ES removal, swept stale `MySQL` references in docstrings and log strings (`titleconvert`/`clusterconvert`/`search_companies` docstrings; `reload_prefix_index` docstring; login priority comments). The runtime `db_dialect.is_mysql()` branches are intentionally LEFT in place \u2014 they are functional dialect helpers, not noise.", "VERIFICATION HARNESS \u2014 New `tools/parity_capture.sh` + `tools/parity_diff.py`. Captures a frozen 22-query bag (COMPANYCONVERT \u00d7 10 incl. `car`/`carl`/`carlyle`/`hom`/`home`, TITLECONVERT \u00d7 5, CLUSTERCONVERT \u00d7 3, SEARCH COMPANIES \u00d7 2, DSL \u00d7 2) against any base URL with JWT login, normalizes volatile fields + LLM-derived non-deterministic subtrees + the renamed `source` value + the renamed `signals_used` label, canonicalizes `data[]` arrays so PG `ts_rank` tiebreak shuffling doesn't show as a diff, and curl-retries on transient Bedrock disambiguation slowness. 
Determinism check on prod: 22/22 OK, 0 diffs across two consecutive runs."], "migration": {"summary": "Three observable changes for clients: (a) `source` field value changes from `\"mysql\"`/`\"elasticsearch\"` to `\"database\"` on TITLECONVERT, CLUSTERCONVERT, and SEARCH COMPANIES; (b) `signals_used` and `final_recommendation.method` no longer emit `\"elasticsearch_gap\"` \u2014 renamed to `\"score_gap\"`; (c) `/health` no longer returns an `elasticsearch` block. Search results themselves are unchanged (same code path that has been live in prod since 2026-04-27).", "verified_with_parity_diff": "Yes. Run `bash tools/parity_capture.sh https://aonav.ai > before.json` against the previous deployment, then the same against the new deployment, and `python3 tools/parity_diff.py before.json after.json` should report `Total substantive diffs: 0` (the harness normalizes the renamed source/signals values).", "rollback": "Revert this commit range and redeploy. ES infrastructure does not need to be re-provisioned for rollback to be safe \u2014 the previous code's ES path also silently fell back to PG when ES was unreachable, which is the state prod has been in for 5 days."}, "validated_on": "Local: ast lint clean on all 7 modified Python files. Compose: `docker compose config` succeeds on all 6 active compose files (postgres-test, split-api, split-spot, unified, mac, split-local). Parity tool: 22/22 OK against prod with 0 diffs across two consecutive captures."}, {"version": "0.11.0-beta", "date": "2026-05-01", "type": "feature", "breaking": false, "summary": "F2 \u2014 Adjacency metric filters. The two adjacency endpoints (LIST SKILL_ADJACENT_CLUSTERS / LIST DESTINATION_CLUSTERS, /api/clusters/{id}/skill-adj-clusters, /api/clusters/{id}/cluster-destinations) now return a per-row `metrics` block with the four F2 metric fields \u2014 always present, snapshot-keyed by (experience_level, msa_size) with defaults '10' / 'Large'. 
Numeric thresholds on the metric fields are exposed via WHERE >=/<=/>/< (DSL) or `min_*`/`max_*` query params (REST). Rows with null on a filtered metric are excluded; rows with null on an unfiltered metric are returned with the metric value as null. The previously-reserved `EXPAND METRICS` token is retired (was always 400'd with a deferral message; now falls through to the unknown-EXPAND-target error with a hint pointing at the new always-on contract). The DSL-operator registry gains GreaterThanOrEqualOperator (>=) and LessThanOrEqualOperator (<=); the legacy registry previously only supported > and <, causing >=/<= clauses to be rejected as malformed before this fix.", "details": ["EXECUTOR \u2014 `src/dsl/ast_executor.py`: `ADJACENCY_FIELD_MAP` extended with `INTERNAL_PROMOTION_RATE`, `EXTERNAL_PROMOTION_RATE`, `RETENTION_RATE_3YR`, `WAGE_P75_AT_10Y` (the four F2 metric fields) plus `EXPERIENCE_LEVEL` and `MSA_SIZE` (the two snapshot keys). `_collect_conditions` gains a comparison-operator branch that translates GTE/LTE/GT/LT (and the `>=`/`<=`/`>`/`<` aliases the legacy registry emits) into suffixed filter keys (`<field>__gte`/`__lte`/`__gt`/`__lt`) so the UDH layer can consume them uniformly.", "EXECUTOR \u2014 `src/dsl_operators.py`: new `GreaterThanOrEqualOperator` and `LessThanOrEqualOperator` (registered before `>` and `<` to avoid shadowing). The existing `>` and `<` regexes gain `(?!=)` lookaheads as belt-and-braces. Both new operators tolerate zero-or-more whitespace around the operator and the trailing numeric value.", "UDH \u2014 `src/unified_data_handler.py`: `_ADJACENCY_EXPAND_RESERVED = {METRICS}` deleted and the F2-deferral guard with it. 
`_list_adjacency_clusters` now: (1) parses metric thresholds from the suffixed filter keys with non-numeric \u2192 -32602, (2) defaults `experience_level` / `msa_size` to '10' / 'Large', (3) fetches metrics from `promotion_retention` + `wage_data` for the active target_ids at the chosen snapshot, (4) applies thresholds (rows with null on a filtered field are excluded), (5) attaches the `metrics` block to every surviving row, (6) echoes `snapshot` at the top of the response. New helper methods `_fetch_adjacency_metrics`, `_row_passes_metric_thresholds`, `_empty_adjacency_metrics`, `_coerce_metric_value`. The `unknown_expand` branch picks up `METRICS` automatically since it's no longer in the reserved set; when `METRICS` is the unknown target the error includes a `data` field pointing at the new always-on contract.", "MCP \u2014 `src/mcp/tool_definitions.py`: lean DSL description's two adjacency examples updated to demonstrate the new metric-filter form (LIST SKILL_ADJACENT_CLUSTERS FOR CLUSTER \"187\" WHERE WAGE_P75_AT_10Y >= 125000 ...). Models now see the F2 surface in their first context message.", "REST \u2014 `src/hybrid_server_mcp.py`: `_adjacency_rest` now parses `experience_level`, `msa_size`, and `min_*`/`max_*`/`min_excl_*`/`max_excl_*` query params for each of the four metric fields. Non-numeric values return HTTP 400 with the offending parameter named. Suffixed filter keys handed off to UDH unchanged.", "DOCS \u2014 `docs/context/pathways.md`: WHERE filter table split into Categorical / Snapshot / Metric Threshold subsections with full per-field operator support. EXPAND modes table updated (METRICS row removed; its retirement called out). Returns section gains three example payloads (default, with metric filter, with EXPAND NAMES + COMPANY_HIRING) showing the always-on `snapshot` + `metrics` fields. Error contract row for `EXPAND METRICS` updated to reflect the new behavior. 
'What's coming next' F2 entry moved to a new 'What shipped already' section.", "DOCS \u2014 `docs/api-reference/REST_API_REFERENCE_EXTERNAL.md`: header bumped to 2026.2 / 2026-05-01. New 'What's New (since 2026-04-01)' section with annotation key (\ud83c\udd95/\u26a0\ufe0f/\ud83d\udce6), breaking-change disclosure, new-endpoint table, new-parameter table, deprecation table, behavior-change list, and error-envelope shape. Adjacency endpoints added to the Clusters table with \ud83c\udd95 markers. Two new detail sections (`/api/clusters/{id}/skill-adj-clusters` and `/api/clusters/{id}/cluster-destinations`) covering all parameters (including the F2 metric filters and snapshot keys), examples (default snapshot, metric-filtered, O*NET-baselined, company-scoped), full response shape, and error envelopes for non-numeric thresholds and the retired `expand=metrics` token.", "TESTS \u2014 `tests/test_ast_parser.py`: new `TestAdjacencyMetricFieldMap` class (14 cases \u2014 parametrized over both adjacency subjects \u00d7 all four metric fields \u00d7 all four comparison operators, plus snapshot-keys-pass-strict-gate, plus a combined-clauses worked example, plus a conservation check that metric fields stay rejected on `LIST COMPANIES`). New `TestAdjacencyExpandMetricsRetired` (asserts `EXPAND METRICS` falls through to unknown-target error with the new always-on hint). New `TestAdjacencyMetricThresholdEvaluation` (7 pure-Python cases over `_row_passes_metric_thresholds` covering inclusive/exclusive bounds, null exclusion when filtered, null tolerance when unfiltered, multi-field AND-semantics). New `TestAdjacencyComparisonOperatorParsing` (4 parametrized cases over the four comparison operators, asserting the resulting filter key matches the suffixed convention regardless of which surface emitted the operator). All 211 ast_parser tests pass (30 new). 
Pre-existing infra-dependent test failures (test_phase1_foundation, test_titleconvert_sector_filtering, test_mcp_integration) are unchanged."], "migration": {"summary": "Pure additive feature. All well-formed pre-0.11.0-beta queries return identical responses. Adjacency response payloads gain a top-level `snapshot` field and a per-row `metrics` block; existing fields are unchanged. The previously-reserved `EXPAND METRICS` token now returns a different (more helpful) 400 error \u2014 but well-formed callers never saw the old deferral message because the F2 release was always blocked.", "additive_only_for_well_formed": "All well-formed adjacency queries return identical responses plus the new `snapshot` and `metrics` fields. Clients that don't read those fields are unaffected.", "rollback": "Revert `src/dsl/ast_executor.py`, `src/dsl_operators.py`, `src/unified_data_handler.py`, `src/mcp/tool_definitions.py`, and `src/hybrid_server_mcp.py` to 0.10.3-beta to remove the metrics surface. The DSL grammar will lose >=/<= support for the four metric fields and the snapshot/metrics block will disappear from responses; `EXPAND METRICS` will go back to its previous deferral-message 400."}, "validated_on": "Local: 211/211 ast_parser tests pass (30 new). Lints clean. Pre-existing infra-dependent test failures (test_phase1_foundation, test_titleconvert_sector_filtering, test_mcp_integration) are unchanged from main."}, {"version": "0.10.3-beta", "date": "2026-04-27", "type": "fix", "breaking": false, "summary": "Three review-driven follow-ups to 0.10.2-beta caught before promote. (1) `LIST COMPANIES FOR CLUSTER ... WHERE CBSA|HIRING_POSTINGS_QTILE|TOP10|...` regression: the new strict WHERE-field gate ran with COMPANY_FIELD_MAP unconditionally, so documented cluster-specific filters were rejected as unknown before reaching the FOR-CLUSTER branch. The field map is now resolved from the FOR clause first, then validated. 
(2) Adjacency coverage gap: `LIST SKILL_ADJACENT_CLUSTERS` and `LIST DESTINATION_CLUSTERS` were not routed through `_build_filters` and therefore still silently dropped bad WHERE clauses \u2014 the exact failure mode 0.10.2-beta closed everywhere else. They now surface the same `-32602` error envelope. (3) `scripts/deploy-staged.sh` foot-gun: removing `.env.deploy` from the tarball (correctly, in `1de4e8d`) combined with the `>>` append step meant a missing remote file would be silently created with only `GPU_INSTANCE_IP` + `PRIMARY_IP`, letting compose fall back to baked-in defaults. A remote preflight now refuses to build/up unless `.env.deploy` exists with the required keys, on both staging and the promote path.", "details": ["EXECUTOR \u2014 `src/dsl/ast_executor.py`: new module-level `CLUSTER_COMPANIES_FIELD_MAP` (POSTINGS_QTILE, BADGE_*, INDUSTRY, TOP10, CBSA, HIRING_POSTINGS_QTILE / HIRING_POSTINGS_COUNT_QTILE, deprecated HIRING_PERCENTILE) \u2014 lifted out of the inline definition inside `_exec_list_companies` so the strict gate can use it. New synthetic `(LIST, COMPANIES_FOR_CLUSTER)` entry in `FIELD_MAPS` and `SUBJECT_LABELS` so error messages and valid_fields lists are scoped correctly.", "EXECUTOR \u2014 `_exec_list_companies` now resolves `field_map` and `dispatch_key` from `ast.for_clause.subject` *before* calling `_build_filters`. FOR CLUSTER \u2192 `CLUSTER_COMPANIES_FIELD_MAP` + `(LIST, COMPANIES_FOR_CLUSTER)`; FOR OCCUPATION / FOR COMPANY / bare LIST COMPANIES \u2192 existing behavior. The duplicate inline cluster_field_map and the second `where_to_filters()` call in the FOR CLUSTER branch are removed; filters are now built once.", "EXECUTOR \u2014 `_exec_adjacency` now calls `_build_filters(ast, ADJACENCY_FIELD_MAP)` instead of bare `where_to_filters()`. Bad WHERE fields on `LIST SKILL_ADJACENT_CLUSTERS` and `LIST DESTINATION_CLUSTERS` now return the same structured `-32602` envelope as the other list paths. 
New `SUBJECT_LABELS` entries for both subjects so messages read correctly.", "DEPLOY \u2014 `scripts/deploy-staged.sh`: new `verify_remote_env_complete()` helper checks the remote `~/aoi-mcp-server/.env.deploy` file exists and contains required keys (`PG_DATABASE`, `PG_USERNAME`, `PG_PASSWORD`, `JWT_SECRET`). Called against `$STAGING_IP` immediately after the tar extract (before the GPU_INSTANCE_IP/PRIMARY_IP append) and against `$PRIMARY_HOST` on the promote path before the rebuild. Required-keys list was cross-checked against `docker-compose.postgres-test.yml`'s `${VAR:-default}` blocks: `PG_DATABASE` defaults to the prod `aoi_data_v6` (so staging without an override would silently target prod data); `JWT_SECRET` defaults to the well-known sentinel `unified-server-secret-key`; PG_HOST is intentionally NOT required since staging and prod share the same RDS instance and the compose default resolves correctly. Failure prints the missing keys and the exact remediation command; exits 1.", "TESTS \u2014 `tests/test_ast_parser.py`: new `TestListCompaniesForClusterFieldMap` (9 cases \u2014 6 parametrized over CBSA / HIRING_POSTINGS_QTILE / HIRING_POSTINGS_COUNT_QTILE / TOP10 / BADGE_EARLY_CAREER / INDUSTRY all passing the strict gate via FOR CLUSTER, plus unknown-field error scoped to FOR-CLUSTER variant, conservation check that bare LIST COMPANIES still rejects CBSA, FOR OCCUPATION baseline). New `TestAdjacencyStrictFilterGate` (6 cases \u2014 2 parametrized over SKILL_ADJACENT_CLUSTERS / DESTINATION_CLUSTERS asserting unknown-field returns -32602; 4 parametrized over COMPANY / JOB_LEVEL / AI_FLAG / PREMIUM_SKILL asserting valid filters pass). 181 ast_parser tests pass (15 new).", "REVIEWERS \u2014 Three findings from PR #4 review (`https://github.com/wassef-code/aoi-mcp-server/pull/4`). 
All reproductions verified on `origin/fix/dsl-unknown-where-fields-structured-error@ad1e555` before the fix; full ast_parser suite green after."], "migration": {"summary": "No migration. Pure follow-up fixes to 0.10.2-beta caught before promote. The new behavior on adjacency commands matches the well-formed/error contract already in place for LIST OCCUPATIONS / LIST COMPANIES / etc. (0.10.2-beta migration notes apply unchanged).", "additive_only_for_well_formed": "All well-formed queries (including `LIST COMPANIES FOR CLUSTER ... WHERE CBSA IS ...`) return identical responses. The behavior change is only for malformed adjacency WHERE clauses, which previously silently degraded.", "rollback": "Revert `src/dsl/ast_executor.py` and `scripts/deploy-staged.sh` to 0.10.2-beta to restore pre-fix behavior."}, "validated_on": "Local: 181/181 ast_parser tests pass (15 new \u2014 9 cluster + 6 adjacency). Lints clean. Bash syntax check on deploy-staged.sh passes."}, {"version": "0.10.2-beta", "date": "2026-04-24", "type": "fix", "breaking": false, "summary": "Two-part DSL fix that closes the LLM-tripping failure mode end-to-end. Part A (parser): the IS operator now also accepts `=` and `==` as synonyms, and clauses the parser cannot interpret are surfaced on `WhereNode.unparsed` instead of being silently dropped. Pre-fix, `LIST OCCUPATIONS WHERE BADGE_EARLY_CAREER = \"Platinum\" AND POSTINGS_COUNT_QTILE = 5` produced `WhereNode(children=[])` \u2014 the WHERE clause vanished entirely and the executor ran an unfiltered LIST capped at LIMIT 100, which both Gemini 2.5 Flash and Pro misread as the true answer. Part B (executor): unknown WHERE fields and parser-level malformed clauses now return structured `-32602` errors that name the offending field/text, list valid fields for the subject, and (for known confusables like badge_*/postings_count_qtile/SECTOR/NAME on the wrong subject) emit a tailored `suggestion` + working `example` so an intelligent client can self-correct. 
Companion: extensive LLM-context updates so models avoid the trap in the first place \u2014 explicit limit/COUNT vs LIST guidance, full per-subject filter tables, and an end-to-end multi-step workflow for cross-entity questions like 'how many platinum-early-career occupations have postings_qtile=5 across all companies?'. Companion-2: 'quartile' comments on POSTINGS_COUNT_QTILE corrected to 'quintile (1\u20135)' in `src/hybrid_server_mcp.py` and `src/unified_data_handler.py`.", "details": ["PARSER \u2014 `src/dsl_operators.py`: `IsOperator.parse_condition` now accepts `=` and `==` as synonyms for `IS`, with both quoted and unquoted values. Whitespace-handling differs by separator: `IS` requires whitespace around it (so `FIELDIS\"x\"` doesn't match), `=`/`==` allow zero whitespace (so `FIELD=\"x\"` and `FIELD=5` both work). Order matters in the alternation: `==` is tried before `=` so it isn't shadowed.", "PARSER \u2014 `src/dsl/ast_nodes.py`: `WhereNode` gains `unparsed: List[str] = field(default_factory=list)`. Clauses the operator registry could not interpret (e.g. unsupported operators like `LIKE`, malformed text) are recorded here rather than silently dropped.", "PARSER \u2014 `src/dsl/ast_parser.py`: `_dict_to_where_node` copies `parsed[\"unparsed\"]` from the operator-registry output onto the resulting WhereNode. `_parse_simple_condition` (the regex-based fallback parser) also accepts `=`/`==` as IS synonyms in its pattern table.", "EXECUTOR \u2014 `src/dsl/ast_executor.py`: `where_to_filters()` gains optional `unmapped: Optional[List[str]] = None`. New `_collect_conditions()` recursively walks the AST and routes each leaf condition either into the filter dict (if mapped) or onto `unmapped` (if not). 
Backward-compatible: callers that don't pass `unmapped` get the old silent-drop semantics for the legacy code paths that still rely on it.", "EXECUTOR \u2014 new `build_unknown_field_error(unknown_fields, field_map, dispatch_key)` returns a JSON-RPC-style `-32602` error envelope: `{code, message, data, unknown_fields, valid_fields, suggestion, example, help}`.", "EXECUTOR \u2014 new `build_malformed_where_error(unparsed_clauses, dispatch_key)` returns a `-32602` error for parser-level failures, listing the unparseable text verbatim and enumerating the supported operator forms (IS / = / == / CONTAINS / IS NOT / IS NULL / IN / BETWEEN / >, >=, <, <=).", "EXECUTOR \u2014 new `SUBJECT_LABELS` map for human-friendly subject names in error messages (`('LIST', 'COMPANIES')` \u2192 'LIST COMPANIES', plus a synthetic `('LIST', 'OCCUPATIONS_FOR_COMPANY')` so the per-company occupation path gets a precise label). New `CONFUSABLE_FIELD_HINTS` map provides tailored `suggestion` + `example` for known traps (BADGE_EARLY_CAREER / BADGE_GROWTH / BADGE_STABILITY / POSTINGS_COUNT_QTILE on `LIST OCCUPATIONS` \u2192 redirect to `LIST OCCUPATIONS FOR COMPANY`; SECTOR on `LIST COMPANIES` \u2192 INDUSTRY; NAME on `LIST COMPANIES` \u2192 use COMPANYCONVERT or `WHERE COMPANY_NAME IS`).", "EXECUTOR \u2014 `ASTExecutor._build_filters()` private helper checks parser-level errors first (returns `build_malformed_where_error` if `ast.where.unparsed` is non-empty), then executor-level (returns `build_unknown_field_error` if any field is not in `field_map`). Wired into `_exec_list_companies`, `_exec_list_industries`, `_exec_list_clusters`, both `_exec_list_occupations` paths (subject + FOR-company), both `_exec_get_wages` paths, `_exec_count_companies`, and `_exec_count_occupations`. 
All other exec paths keep their existing behavior.", "MCP \u2014 `src/mcp/tool_definitions.py`: `_LEAN_DSL_DESCRIPTION` gains a 'RESULT SIZES' block (default LIMIT 100, max 1000, COUNT for true totals, unknown WHERE fields now error) plus two new COUNT examples. Models now see the limit/COUNT semantics in their first context message.", "DOCS \u2014 `docs/context/index.md`: new 'Result sizes and counts' subsection under syntax rules.", "DOCS \u2014 `docs/context/schema.md`: 'Pagination' rewritten as 'Result counts, LIMIT, and pagination' with explicit limit semantics. New 'Unknown filter fields are silently dropped' (now: errored) section listing valid WHERE fields per subject (LIST COMPANIES / LIST OCCUPATIONS / LIST CLUSTERS / LIST INDUSTRIES) and explicitly calling out that `BADGE_EARLY_CAREER`, `BADGE_GROWTH`, `BADGE_STABILITY`, and `POSTINGS_COUNT_QTILE` are NOT valid `LIST OCCUPATIONS` filters.", "DOCS \u2014 `docs/context/occupations.md`: new 'Common pitfall \u2014 occupation badges live per-company, not per-occupation' section with the multi-step workflow (LIST COMPANIES WHERE BADGE_EARLY_CAREER IS Platinum \u2192 LIST OCCUPATIONS FOR COMPANY <name> \u2192 aggregate with postings_count_qtile=5).", "DOCS \u2014 `docs/AOI_DSL_CONTEXT.md` and `docs/llm-agent-prompt.md` (external LLM contexts): mirror sections on result sizes, COUNT vs LIST, and per-subject valid WHERE fields.", "DOCS \u2014 `DOCUMENTATION_INDEX.md`: pointers to the new sections.", "FIX (companion) \u2014 `src/hybrid_server_mcp.py` and `src/unified_data_handler.py`: comments on the `postings_qtile` filter corrected from 'quartile (Q1\u2013Q4)' to 'quintile (1\u20135)' to faithfully describe the underlying column (which has 5 buckets, not 4). 
No behavior change.", "TESTS \u2014 `tests/test_ast_parser.py`: new `TestEqualityOperatorParsing` class (parametrized over IS/=/== with quoted and unquoted values, plus the exact Gemini regression query) and `TestUnparsedClauseSurfacing` class (LIKE operator surfaced; well-formed queries leave `unparsed` empty; partial garbage in AND chain). New `TestBuildMalformedWhereError` class. New `TestBuildUnknownFieldError` class (6 cases) covering generic, pluralization, and CONFUSABLE_FIELD_HINTS scenarios. Three new `TestWhereToFilters` cases verify the `unmapped` parameter (collected on request, deduplicated, walks nested AND/OR). 178 tests pass."], "migration": {"summary": "Behavior change for malformed DSL only: queries that referenced WHERE fields not in the subject's filter map previously silently degraded to an unfiltered list (capped at LIMIT 100) and returned `status: success`; they now return a structured error with `code: -32602`. Well-formed queries are unaffected. Clients should handle the new error shape \u2014 at minimum, log `data.unknown_fields` / `data.valid_fields`. Intelligent clients (LLM agents) can read `data.suggestion` + `data.example` and re-issue a corrected query.", "error_envelope": "{ \"error\": { \"code\": -32602, \"message\": \"Unknown WHERE field(s) for <subject>\", \"data\": { \"unknown_fields\": [...], \"valid_fields\": [...], \"subject\": \"<subject>\", \"suggestion\": \"<actionable hint>\", \"example\": \"<corrected query>\", \"help\": \"<recovery guidance>\" } } }", "additive_only_for_well_formed": "All well-formed queries return identical responses. No public response shape changes for the success path.", "rollback": "Revert `src/dsl/ast_executor.py` and `src/mcp/tool_definitions.py` to pre-0.10.2-beta to restore silent-drop behavior. Doc + comment changes are inert."}, "validated_on": "Local: 218/218 ast_parser tests pass (3 new where_to_filters cases + 6 new build_unknown_field_error cases + existing suite). 
Reproduction confirmed against prod aonav.ai pre-fix on 2026-04-23 with both Gemini 2.5 Flash (default `aoi-career-assistant` preset) and Gemini 2.5 Pro: both models issued queries with invalid WHERE fields (BADGE_EARLY_CAREER + POSTINGS_COUNT_QTILE / EARLY_CAREER_BADGE + POSTINGS_QTILE), backend logs showed 'Unmapped DSL field' WARNINGs, and Pro confidently reported '100' (the page cap) as the answer."}, {"version": "0.10.1-beta", "date": "2026-04-24", "type": "infra", "breaking": false, "summary": "Infrastructure cleanup (PR-B): removed legacy ChromaDB containers, volumes, environment variables, and the `scripts/sync-chromadb.sh` helper. The vector store now lives entirely inside RDS PostgreSQL via the `vector` extension (tables `title_pattern_vectors`, `industry_affinity_vectors`, `onet_cluster_vectors`). Replaced `src/seed_chromadb.py` workflow with a new standalone seeder `scripts/seed_pgvector.py` that talks directly to RDS PG and the Ollama embedding endpoint \u2014 no app-side imports, no SQLAlchemy, runnable from any host that can reach both. Added a `data-services` profile to `docker-compose.postgres-test.yml` so a fresh staging T3 can bring up its own `es-split` + `redis-split` (production already runs them via `docker-compose.split-api.yml`). 
API behavior, request/response shapes, auth, and routing are unchanged \u2014 this is a pure infra/ops PR.", "details": ["INFRA \u2014 `docker-compose.split-api.yml`: removed `chromadb-split` service, `chromadb_data_split` volume, and `CHROMADB_HOST`/`CHROMADB_PORT` env vars + dependency from `aoi-mcp-split-api`.", "INFRA \u2014 `docker-compose.unified.yml`: removed `chromadb` service, `chromadb_data_unified` volume, env vars, and dependency from `aoi-mcp-unified-server`.", "INFRA \u2014 `docker-compose.mac.yml`: removed `chromadb` service, `chromadb_data` volume, and env vars from `aoi-mcp-server`.", "INFRA \u2014 `docker-compose.split-spot.yml`: removed `CHROMADB_HOST`/`CHROMADB_PORT` env vars from `aoi-mcp-spot-api`.", "INFRA \u2014 `docker-compose.postgres-test.yml`: removed `CHROMADB_HOST`/`CHROMADB_PORT` env vars from `aoi-mcp-split-pg`. Added a `data-services` profile that brings up `redis-split` (port 6380\u21926379) and `es-split` (port 9200) with their own volumes, attached to the existing external `split-network`. Production keeps its existing data services from `docker-compose.split-api.yml`; staging now has a single self-contained command: `docker compose -f docker-compose.postgres-test.yml --env-file .env.deploy --profile data-services up -d --build`.", "SEEDER \u2014 new `scripts/seed_pgvector.py` (standalone). Uses `psycopg2` for PG and `httpx` for Ollama `/api/embed`. Subcommands: `--all` (full seed), `--title-patterns`, `--industry-affinity`, `--onet-clusters`, `--stats`, `--health`. Reads from `title_conversion`, `companies`, `industry_codes`, `occupation_cluster`, `onet_codes`. Upserts into `title_pattern_vectors`, `industry_affinity_vectors`, `onet_cluster_vectors`. 
Replaces `src/seed_chromadb.py` for all environments going forward.", "DEPLOY \u2014 `scripts/deploy-staged.sh`: removed `chromadb.status` health-subsystem check (no longer exposed by /health since the container is gone).", "DEPLOY \u2014 `scripts/deploy-with-checksums.sh`: removed `src/chromadb_client.py` and `src/seed_chromadb.py` from `DEPLOY_FILES`. Post-deploy reminder text now points at `scripts/seed_pgvector.py` and the pgvector tables.", "DEPLOY \u2014 `scripts/verify-instance-parity.sh`: removed the dedicated ChromaDB health-status comparison (pgvector liveness is implicit in `database.status`).", "OPS \u2014 deleted `scripts/sync-chromadb.sh` (superseded; copying ChromaDB tarballs between instances is no longer a thing).", "OPS \u2014 `scripts/test_split_instance.sh`: removed `chromadb-split` from the expected-running-containers list.", "OPS \u2014 `scripts/spot-split-launch.sh`: comment updated to clarify that `primary` mode no longer hosts a ChromaDB container \u2014 vectors are in RDS.", "DOCS \u2014 `.cursorrules`: rewrote the 'Elasticsearch + ChromaDB + Prefix Index Sync' section as 'Elasticsearch + pgvector + Prefix Index Sync' with new sync-table column ('pgvector Table' instead of 'ChromaDB Collection'); replaced 'To seed ChromaDB on EC2' with the new `scripts/seed_pgvector.py` workflow; renamed regression #13 to 'Forgetting to Seed pgvector' with updated commands; removed ChromaDB from container/port-mapping callouts.", "DOCS \u2014 `ops/OPERATOR_MANUAL.md`: \u00a77a renamed 'ChromaDB Seed/Sync' \u2192 'pgvector Seed/Sync'; full rewrite of When to Seed / How to Seed / Storage subsections; architecture diagrams and service tables updated; ChromaDB removed from the pinned-image list. 
The 2026-03-23 regression note is preserved as historical context (now framed as a same-shape risk for pgvector after a fresh DB restore).", "DOCS \u2014 `docs/deployment/ARCHITECTURE_AND_OPS.md`: architecture diagram replaces ChromaDB with pgvector-in-RDS; container tables for prod + staging drop `chromadb-split` and add an in-database vector-store row; \u00a77 'How Containers Are Started on T3' splits into prod (existing split-api stack) and staging (new `--profile data-services` one-liner); \u00a78d retitled and rewritten to describe the new seeding flow; \u00a79 'Data Services' replaces the ChromaDB block with a pgvector block; \u00a712b staging-rebuild runbook simplified to the new single compose command + pgvector seed; Should-Do item #12 marked DONE for this PR.", "VERSION \u2014 `version_config.json` and `docs/implementation-notes/VERSION.md` bumped to 0.10.1-beta with PR-B summary."], "migration": {"summary": "Operational migration only. No code changes for API consumers. Operators rebuilding staging from scratch should use the new `--profile data-services` command. After any RDS PG restore or bulk vector-source data refresh, run `python3 scripts/seed_pgvector.py --all` (see ops manual \u00a77a).", "production_action_required": "On the next production deploy, the legacy `chromadb-split` container can be `docker compose ... rm -sf chromadb-split` and its volume removed (`docker volume rm <project>_chromadb_data_split`). The API has not depended on it for live serving since the pgvector cutover; this PR removes it from the compose definition so it won't get re-created on rebuild.", "deferred": "Application code that conditionally imports `src/chromadb_client.py` (e.g. `src/disambiguation_service.py`, `src/vector_client_factory.py`, `src/seed_chromadb.py`, `tests/test_chromadb_client.py`) is intentionally left in place for this PR. 
A follow-up PR will delete the chromadb client + tests + seed script and remove the dual-backend branching in `vector_client_factory.py`."}, "validated_on": "All 5 docker-compose files validated with `docker compose config --quiet`. `grep -r 'chromadb\\|ChromaDB\\|CHROMADB' docker-compose*.yml` returns only explanatory comments. `grep` of scripts/ shows no ChromaDB code paths remain in the deploy/verify pipeline."}, {"version": "0.10.0-beta", "date": "2026-04-23", "type": "feature", "breaking": false, "summary": "Adjacency subjects: LIST SKILL_ADJACENT_CLUSTERS and LIST DESTINATION_CLUSTERS (DSL) plus mirror REST endpoints GET /api/clusters/{id}/skill-adj-clusters and /cluster-destinations. Lets a caller take a baseline cluster (or O*NET code) and ask 'what's skill-adjacent from here?' or 'what does this typically transition into?' filtered by company hiring footprint, job_level, ai_flag, or premium_skill. Companion side-fix: GET /api/occupations/{id}/pathways now additionally exposes destination_cluster_ids alongside the existing destination_clusters names array (additive, non-breaking). Also fixes a long-standing DSL parser bug where double-quoted values containing an apostrophe (e.g. WHERE JOB_LEVEL IS \"Bachelor's Degree\" or GET COMPANY \"Dick's Sporting Goods\") were silently truncated at the inner quote \u2014 value extraction is now quote-balanced across the IS, NOT, CONTAINS, ANY_BADGE, ALL_BADGES operators, the FOR clause, target extraction, the WHERE NAME IS shortcut, the SET clause, and the fallback condition parser. New HELP topic PATHWAYS sourced from docs/context/pathways.md.", "details": ["DSL \u2014 new subjects: LIST SKILL_ADJACENT_CLUSTERS FOR CLUSTER \"{id}\"|OCCUPATION \"{onet}\" and LIST DESTINATION_CLUSTERS FOR CLUSTER \"{id}\"|OCCUPATION \"{onet}\". 
Source columns: occupation_info.skill_adj_cluster_1/2/3 and cluster_id_destination_1/2/3.", "DSL \u2014 new filters: WHERE COMPANY IS \"X\" (intersect adjacency set with company hiring footprint, exact match), WHERE JOB_LEVEL IS \"X\", WHERE AI_FLAG IS \"X\", WHERE PREMIUM_SKILL CONTAINS \"X\".", "DSL \u2014 EXPAND comma-list now supported (e.g. EXPAND NAMES, COMPANY_HIRING). Previously only single-token EXPAND parsed. Targets for adjacency subjects: NAMES (cluster_name), OCCUPATIONS (occupations within each adjacent cluster), COMPANY_HIRING (postings_qtile + badges from company_occupation_summary; requires WHERE COMPANY IS). EXPAND METRICS reserved for F2 \u2014 returns 400 with explanatory message.", "REST \u2014 new endpoints: GET /api/clusters/{id}/skill-adj-clusters and GET /api/clusters/{id}/cluster-destinations. Use ?onet=<code> with {id}=0 to override path with an O*NET baseline. Mirrors all DSL filters and EXPAND modes.", "Side-fix \u2014 GET /api/occupations/{id}/pathways: response now includes destination_cluster_ids: [int|null, int|null, int|null] alongside the existing destination_clusters: [name|null, ...]. Additive \u2014 existing fields unchanged.", "HELP \u2014 new topic PATHWAYS sourced from docs/context/pathways.md (full What/Why/Syntax/Returns/Workflows/Errors/Constraints contract, including the pre-existing apostrophe limitation in DSL string literals).", "Cross-references \u2014 docs/context/clusters.md, occupations.md, and index.md now point to HELP PATHWAYS from a 'where can my skills go next?' 
example block.", "MCP \u2014 src/mcp/tool_definitions.py KEY PATTERNS gains 2 example lines (skill-adjacent at company, destinations from O*NET) and HELP PATHWAYS is registered in the topic block.", "DOCS \u2014 docs/api-reference/API_REFERENCE_V6_UNIFIED.md and ops/API_REFERENCE.md: new REST endpoint sections with parameter tables, examples, response shapes, and error contracts; new DSL syntax rows; complete worked examples in the cluster-DSL section.", "Web examples \u2014 web-ui-examples-dsl.json (+5 entries) and web-ui-examples-rest.json (+4 entries) under the CLUSTERS category.", "TESTS \u2014 tests/test_ast_parser.py: 6 new parser tests for adjacency (EXPAND comma-list with 2 and 3 targets; LIST SKILL_ADJACENT_CLUSTERS FOR CLUSTER; LIST DESTINATION_CLUSTERS FOR OCCUPATION; adjacency with company filter + EXPAND + LIMIT; PREMIUM_SKILL CONTAINS) plus 7 regression tests for the apostrophe fix (WHERE IS / WHERE CONTAINS with apostrophe; GET COMPANY \"Dick's Sporting Goods\"; COMPANYCONVERT \"L'Oreal\"; FOR OCCUPATION with apostrophe value; adjacency with apostrophe filter; reverse case \u2014 single-quoted value containing a double quote). All 141 parser tests pass.", "TESTS \u2014 tests/test_suite/test_pathways.py: 17 new functional/corner tests covering both DSL and REST surfaces, EXPAND modes, side-fix on /pathways, and error contracts (unknown cluster/onet, EXPAND COMPANY_HIRING without company, EXPAND METRICS reserved, unknown company on REST).", "BUGFIX \u2014 DSL parser apostrophe handling. The value-extraction regex was [\"'][^\"']+[\"'] \u2014 a character class containing BOTH quote characters, so a value like \"Bachelor's Degree\" terminated at the inner apostrophe and silently truncated to \"Bachelor\". Replaced with a quote-balanced subpattern (?:\"([^\"]*)\"|'([^']*)') that picks one quote to open and the same quote to close. 
Applied to: src/dsl_operators.py (IsOperator, ContainsOperator, NotOperator, AnyBadgeOperator, AllBadgesOperator), src/dsl/ast_parser.py (FOR clause value extraction, the WHERE NAME|COMPANY_NAME IS target shortcut, _parse_simple_condition fallback, _parse_set_clause). Convert and standard target extraction now route through the existing quote-balanced _extract_quoted helper. The dead-code src/dsl/generic_parser.py (slated for deletion per AST_MIGRATION_CLEANUP_MANIFEST.md) is unchanged.", "Permissions \u2014 no code change required. Existing GET:/api/clusters/* wildcard in middleware_strict.py covers both new routes; companies:read encompasses EXPAND COMPANY_HIRING (which only reads from company_occupation_summary, already authorized for any caller with cluster access)."], "migration": {"summary": "Pure additive feature. No existing surface changes shape or behavior. Clients that don't request the new subjects/endpoints/EXPAND modes are unaffected.", "additive_field_only": "GET /api/occupations/{id}/pathways response gains destination_cluster_ids array. Existing destination_clusters (names) array is unchanged.", "deferred": "EXPAND METRICS is reserved for the F2 release (richer adjacency metrics \u2014 promotion %, retention %, wage delta vs baseline). Currently returns 400 with explanatory error."}, "validated_on": "Local parser tests 134/134 pass. Functional/corner suite ready for staging deploy."}, {"version": "0.9.2-beta", "date": "2026-04-17", "type": "fix", "breaking": true, "breaking_scope": "within-beta-cycle", "summary": "RENAME (within-beta breaking, soft-deprecated for one release): the geographic-hiring filter introduced in 0.9.1-beta as HIRING_PERCENTILE / ?hiring_percentile= is renamed to HIRING_POSTINGS_QTILE / ?hiring_postings_qtile= to faithfully reflect the underlying database column hiring_flag.postings_count_qtile, which is a 1\u20133 quantile bucket \u2014 not a 0\u2013100 percentile. 
The 0.9.1-beta name is kept as a deprecated alias for one release and emits a deprecation_warning in the response envelope. Also corrects example INDUSTRY values from the non-existent 'Technology' to the actual industry name 'Software & Technology' in docs and the embedded web-test-ui reference.", "details": ["DSL: WHERE HIRING_PERCENTILE IS \"\u2026\" \u2192 WHERE HIRING_POSTINGS_QTILE IS \"\u2026\" (literal alias: HIRING_POSTINGS_COUNT_QTILE). Old keyword still parses and produces the correct result, but the response now includes a deprecation_warning field.", "REST: ?hiring_percentile= \u2192 ?hiring_postings_qtile= (literal alias: ?hiring_postings_count_qtile=). Old query param still accepted; response includes a deprecation_warning.", "Response envelope on cluster-companies endpoint gains optional 'deprecation_warning' object {deprecated, replacement, since, reason, removal} when the legacy keyword is used. New keyword returns the same envelope without the warning field.", "AST: cluster_field_map maps both HIRING_POSTINGS_QTILE and HIRING_POSTINGS_COUNT_QTILE \u2192 'hiring_postings_qtile'; HIRING_PERCENTILE \u2192 'hiring_percentile_deprecated' (separate key drives the warning emission).", "UDH: list_companies_for_cluster reads either filter key and routes to the same SQL filter on hiring_flag.postings_count_qtile; emits deprecation_warning iff filters['hiring_percentile_deprecated'] is set.", "REST endpoint: get_cluster_companies_rest reads ?hiring_postings_qtile= (and ?hiring_postings_count_qtile= alias) into the new key; ?hiring_percentile= still accepted but routed into the deprecated key.", "DOCS \u2014 External REST_API_REFERENCE_EXTERNAL.md: status block now flags the rename + soft-deprecation; parameter table renamed; examples migrated; old example kept under 'Deprecated 0.9.1-beta alias still works' comment.", "DOCS \u2014 Internal API_REFERENCE_V6_UNIFIED.md and ops/API_REFERENCE.md: parameter tables, DSL command tables, and example sections all 
updated with the new name and a callout for the deprecated alias; INDUSTRY example values corrected from 'Technology' (returns 0 rows in production) to 'Software & Technology'.", "DOCS \u2014 docs/context/clusters.md and docs/context/hiring.md: rename note added; constraints describe the alias; INDUSTRY examples corrected.", "Web examples \u2014 web-ui-examples-{dsl,rest}.json now use the new keyword/param; web-test-ui-simplified.html embedded API reference (both blocks) updated with new name, \u26a0 DEPRECATED callout for the old name, and corrected 'Software & Technology' industry; meta version bumped to v0.9.2-beta.", "MCP \u2014 src/mcp/tool_definitions.py example uses HIRING_POSTINGS_QTILE; tools_registry_unified.py supported_operations row updated.", "TESTS \u2014 tests/test_suite/test_clusters.py: existing hiring_percentile test renamed to hiring_postings_qtile; two new tests added for the deprecation path (DSL HIRING_PERCENTILE alias and REST ?hiring_percentile= alias) that assert deprecation_warning is present and names the correct replacement. Total: 11 cluster-companies functional tests."], "migration": {"summary": "Replace the 0.9.1-beta keyword/param with the 0.9.2-beta name. No semantic change \u2014 the same SQL filter on the same column with the same 1\u20133 value space.", "table": [{"surface": "DSL keyword", "from": "WHERE HIRING_PERCENTILE IS \"\u2026\"", "to": "WHERE HIRING_POSTINGS_QTILE IS \"\u2026\"", "literal_alias": "WHERE HIRING_POSTINGS_COUNT_QTILE IS \"\u2026\""}, {"surface": "REST query param", "from": "?hiring_percentile=\u2026", "to": "?hiring_postings_qtile=\u2026", "literal_alias": "?hiring_postings_count_qtile=\u2026"}], "value_space_unchanged": "\"1\" | \"2\" | \"3\" (quantile bucket from hiring_flag.postings_count_qtile)", "compatibility_window": "Old name continues to work for at least one minor release after 0.9.2-beta; clients using it will see a deprecation_warning object in the response envelope. 
Removal target: a later 0.9.x or 1.0 release; will be announced in CHANGELOG before removal.", "why": "The original 0.9.1-beta keyword called a 1\u20133 quantile bucket a 'percentile', which falsely implies a 0\u2013100 scale and is at odds with the underlying schema column hiring_flag.postings_count_qtile. The new keyword keeps fidelity with the schema and parallels the existing POSTINGS_QTILE keyword (same suffix \u2192 same kind of value; HIRING_ prefix \u2192 hiring_flag table scope vs cos)."}, "validated_on": "staging.aonav.ai 2026-04-17 \u2014 functional 109/112 (2 unrelated pre-existing search_companies failures, also failing on prod), 11/11 cluster-companies tests pass including DSL+REST deprecation-alias paths. aonav.ai 2026-04-17 \u2014 smoke 12/12, all 4 surfaces (new DSL kw, deprecated DSL alias, new REST param, deprecated REST alias) verified: new=no warning, deprecated=warning emitted, all return identical 47 rows."}, {"version": "0.9.1-beta", "date": "2026-04-17", "type": "feature", "breaking": false, "superseded_by": "0.9.2-beta (HIRING_PERCENTILE renamed \u2192 HIRING_POSTINGS_QTILE; old name kept as deprecated alias)", "summary": "LIST COMPANIES FOR CLUSTER gains WHERE CBSA / WHERE HIRING_PERCENTILE filters and ORDER BY {ALPHA|POSTINGS_QTILE|BADGE} server-side sort modes (with WITH SCORE diagnostic). Eliminates client-side sort for the two common visualization modes (top hirers by demand, top employers by badge) and prevents bespoke per-client sort work via a whitelist.", "details": ["DSL: LIST COMPANIES FOR CLUSTER \"<id>\" WHERE CBSA IS \"<cbsa_code|metro_name>\" \u2014 companies hiring for THIS cluster IN THIS metro (joins through hiring_flag, cluster-scoped). Numeric input \u2192 hiring_flag.cbsa (int); non-numeric \u2192 hiring_flag.cbsa_name equality (e.g. 
\"Boston-Cambridge-Newton, MA-NH\")", "DSL: LIST COMPANIES FOR CLUSTER \"<id>\" WHERE HIRING_PERCENTILE IS \"1|2|3\" \u2014 RENAMED in 0.9.2-beta to HIRING_POSTINGS_QTILE; old name kept as deprecated alias.", "DSL: LIST COMPANIES FOR CLUSTER \"<id>\" ORDER BY ALPHA|COMPANY_NAME|POSTINGS_QTILE|POSTINGS_COUNT_QTILE|BADGE [WITH SCORE] \u2014 server-side sort with deterministic multi-column tie-breakers for stable paging", "REST: GET /api/clusters/{id}/companies adds query params cbsa (numeric or metro name), hiring_percentile (RENAMED to hiring_postings_qtile in 0.9.2-beta), sort (alias order_by), include_score; envelope echoes resolved 'sort' field", "UDH: list_companies_for_cluster() gains dsl_options param; new _build_cluster_badge_score_sql() helper (3 cos.badge_* columns; same scoring_config weights as company-level _build_badge_score_sql); CLUSTER_COMPANIES_SORT_MODES whitelist returns -32602 enhanced error on unknown sort (REST maps to HTTP 400 with valid_sort_modes hint)", "PG-COMPATIBILITY (validation fix): badge_sort_score is aliased in SELECT whenever a non-alpha sort references it (PG requires SELECT DISTINCT + ORDER BY expressions to appear in select list); only surfaced to clients when WITH SCORE / include_score=true OR sort=badge", "AST: FOR CLUSTER branch in _exec_list_companies now passes dsl_options through (previously dropped silently); cluster_field_map adds CBSA, HIRING_PERCENTILE, POSTINGS_COUNT_QTILE alias", "DOCS: External REST_API_REFERENCE_EXTERNAL.md gains GET /api/clusters/{id}/companies section; internal API_REFERENCE_V6_UNIFIED.md and ops/API_REFERENCE.md expanded with new params, examples, and HTTP 400 error shape; docs/context/{clusters,hiring,index}.md cross-linked with sort/CBSA workflows; web-test-ui-simplified.html example dropdowns extended; embedded API Reference modal updated; MCP tool_definitions clusters[] examples include new patterns", "TESTS: 9 new functional cases in tests/test_suite/test_clusters.py \u2014 basic FOR 
CLUSTER, sort_postings (DESC assertion), sort_badge_with_score (DESC assertion + score presence), invalid_sort enhanced-error path, CBSA filter, HIRING_PERCENTILE filter, REST sort_postings, REST sort_badge_score, REST cbsa"], "migration": null, "validated_on": "staging.aonav.ai 2026-04-17 \u2014 smoke 12/12, functional 106/109 (2 unrelated search_companies failures, both also failing on prod), 9/9 new cluster tests pass"}, {"version": "0.9-beta", "date": "2026-04-15", "type": "feature", "breaking": false, "summary": "New REST endpoint GET /api/clusters/{id}/companies and DSL command LIST COMPANIES FOR CLUSTER \u2014 list companies hiring for a cluster with filters for postings_qtile, badge, industry", "details": ["REST: GET /api/clusters/{id}/companies with query params postings_qtile, badge, badge_early_career, badge_growth, badge_stability, industry, top10, limit, offset", "DSL: LIST COMPANIES FOR CLUSTER \"42\" [WHERE POSTINGS_QTILE IS \"5\"] [WHERE BADGE_GROWTH IS \"Platinum\"] [WHERE INDUSTRY IS \"Technology\"] [LIMIT N]", "UDH: New list_companies_for_cluster() method in unified_data_handler.py \u2014 queries company_occupation_summary by cluster_id with dynamic filters", "AST: New FOR CLUSTER branch in ast_executor.py _exec_list_companies() \u2014 no parser changes needed (generic FOR <subject> pattern)", "AUTH: Covered by existing GET:/api/clusters/* wildcard in middleware_strict.py", "DOCS: API reference, MCP tools registry, web test UI updated with examples"]}, {"version": "0.9-beta", "date": "2026-04-15", "type": "schema", "breaking": false, "summary": "Fix company_occupation_summary.cluster_id type from VARCHAR(20) to INT \u2014 aligns with all other tables, removes CAST workarounds, prevents reversion via import scripts or schema recreation", "details": ["SCHEMA: ALTER TABLE company_occupation_summary ALTER COLUMN cluster_id TYPE INT USING cluster_id::INTEGER \u2014 applied to staging RDS (aoi_data_staging), production RDS pending", "SCHEMA: 
sql/schema_postgres.sql and sql/00_init.sql updated to INT (prevents reversion on docker volume reset)", "CODE: Removed 3x db_dialect.cast_signed('cos.cluster_id') workarounds in unified_data_handler.py \u2014 joins now use native INT=INT", "CODE: Removed CAST(cos.cluster_id AS SIGNED) from config.py legacy SQL mapping", "CODE: Added int() cast to 4 cluster_id filter param bindings in unified_data_handler.py \u2014 asyncpg requires exact type match", "CODE: scripts/replace_3-2-26_data.py changed safe_str\u2192safe_int for cluster_id in import template \u2014 prevents future CSV imports from inserting strings", "API: cluster_id now returns as integer (42) instead of string ('42') in JSON responses \u2014 aligns with ES mapping and data dictionary", "VALIDATED: 12/12 smoke + 97/100 functional on staging (2 pre-existing search failures unrelated)"], "migration": "Run ALTER TABLE company_occupation_summary ALTER COLUMN cluster_id TYPE INT USING cluster_id::INTEGER on each PostgreSQL database. 
Pre-check: SELECT DISTINCT cluster_id FROM company_occupation_summary WHERE cluster_id ~ '[^0-9]' must return 0 rows."}, {"version": "6.6.12-alpha", "date": "2026-04-09", "type": "fix", "breaking": false, "summary": "Restore API call logging (4,500+ silent failures/day from Docker bind-mount permission issue) + admin dashboard token validation fix", "details": ["FIX: Docker bind-mount ./logs:/app/logs was owned by root \u2014 all api_calls.log writes failed silently since split-API migration (2026-03-20)", "FIX: Dockerfile chown on /app/logs at build time + pin appuser uid to 999", "FIX: admin-dashboard.html validates token with /auth/me instead of /health (public endpoint always returned 200)", "FIX: admin-dashboard.html apiCall() handles 401 explicitly, triggers re-login prompt", "FIX: src/web_ui_router.py injected apiBase corrected from '/admin/api' to '/admin'", "DOCS: ops/OPERATOR_MANUAL.md new sections for logging data flow, bind-mount regression, incident playbook"], "migration": null}, {"version": "6.6.11-alpha", "date": "2026-04-08", "type": "feature", "breaking": false, "summary": "Chat context benchmark tool + system prompt overhaul + benchmark case externalization", "details": ["NEW: scripts/benchmark-chat-context.py \u2014 20-case benchmark testing LLM tool-calling across 5 tiers (correct tool selection, ambiguous intent, HELP usage, query syntax, adversarial)", "NEW: scripts/benchmark-cases.json \u2014 externalized benchmark cases (editable without touching runner)", "FIX: Open WebUI system prompt rewritten to emphasize tool-calling over lecturing", "FIX: Context docs (hiring.md, occupations.md, schema.md) distinguish occupation-level postings_count_qtile (1-5) from geographic hiring (1-3)", "BASELINE: 96% / Grade A on Gemini 2.5 Flash"], "migration": null}, {"version": "6.6.10-alpha", "date": "2026-04-07", "type": "feature", "breaking": false, "summary": "Bedrock LLM as primary provider + G5 spot conversion + T3 staging instance + infrastructure docs 
overhaul", "details": ["INFRA: Bedrock Qwen3 32B as primary LLM provider (BedrockClient in unified_llm_service.py)", "INFRA: G5 on-demand stopped; AMI + spot launch template created for on-demand GPU experiments", "INFRA: New T3 staging instance at staging.aonav.ai with TLS + RDS aoi_data_staging (99/100 functional tests)", "FIX: nl_query_builder.py uses OLLAMA_HOST/OLLAMA_PORT env vars", "DATA: RDS data corrections applied (5 company renames, 2 description updates, 6 occupation removals, 272 orphan alias remaps)", "DOCS: ARCHITECTURE_AND_OPS.md full rewrite \u2014 architecture diagram, live inventory, credentials, runbooks", "COST: ~$875/mo savings (G5 on-demand stopped, Bedrock ~$16/mo)"], "migration": null}, {"version": "6.6.9-alpha", "date": "2026-04-03", "type": "feature", "breaking": false, "summary": "Expand COUNT filters to match LIST parity + Open WebUI Anthropic provider + middleware fix", "details": ["FEAT: count_companies aligned with list_companies filters (industry, industry_id, all badges with Ranked, event_type, company_uid, state/city/country, search)", "FEAT: count_occupations adds cluster_id, onet_code, job_level filters", "FEAT: Open WebUI Anthropic as third LLM provider slot", "FIX: middleware.py assigns tool_call_index when provider omits it", "DOCS: api-reference.txt refreshed to v6.6.6-alpha"], "migration": null}, {"version": "6.6.8-alpha", "date": "2026-04-03", "type": "refactor", "breaking": false, "summary": "Remove AST legacy code \u2014 3,153 lines deleted after 14-day soak (3,660+ DSL calls, 0 failures)", "details": ["REFACTOR: handlers.py 3,370\u2192217 lines \u2014 removed 560-line elif dispatch chain, all 33 _handle_* methods, DSLParser class, shadow mode infrastructure", "REFACTOR: Removed AST_ROUTING_ENABLED env var from 4 compose files (AST is now the only code path)", "VALIDATED: No behavioral change \u2014 AST executor handled 100% of production traffic since 2026-03-20"], "migration": null}, {"version": "6.6.7-alpha", "date": 
"2026-04-02", "type": "fix", "breaking": false, "summary": "Schema fixes \u2014 company_url standardization, ci_eq for PG case-insensitivity, datetime serialization, CLUSTERCONVERT ES routing, PostgreSQL migration docs", "details": ["FIX: company_URL\u2192company_url standardization across schemas, source code, and import scripts (backward-compatible JSON key preserved)", "FIX: ci_eq() helper in db_dialect.py for case-insensitive equality (LOWER() on PG, plain = on MySQL)", "FIX: datetime serialization in cluster REST endpoints (get_cluster_wages, retention, education) \u2014 was causing HTTP 500", "FEAT: CLUSTERCONVERT routed through Elasticsearch with MySQL FULLTEXT fallback \u2014 last command to move off native DB FTS", "FEAT: AST executor extracts JOB_LEVEL, AI_FLAG, SCORE from CLUSTERCONVERT WHERE clauses", "FIX: ast_parser zip code regex for GET LOCATION BY ZIP WHERE syntax", "FIX: vLLM health check timeout reduced to 0.5s (was 3s, caused p95 health >100ms)", "DOCS: PostgreSQL migration status doc, contextual cursor rule, post-merge resync procedure"], "migration": null}, {"version": "6.6.6-alpha", "date": "2026-04-01", "type": "data", "breaking": false, "summary": "Orphaned alias purge + remap: removed 1,481 dead alias rows (562 companies), remapped 49 alias groups to correct formal names (Google\u2192Alphabet, AMD\u2192Advanced Micro Devices, Twitter\u2192X, etc.). Zero orphans remaining.", "details": ["DATA: Purged 1,481 orphaned alias rows across 562 distinct company names from company_aliases on production and local", "DATA: Remapped 49 alias groups to correct formal company names (e.g. Google\u2192Alphabet, AMD\u2192Advanced Micro Devices, Schlumberger\u2192SLB Ltd., Twitter\u2192X, Kellogg\u2192Kellanova, PricewaterhouseCoopers\u2192PwC, United Parcel Service\u2192UPS, etc.)", "DATA: Before: 7,014 aliases / 2,297 distinct company names. After: 5,533 aliases / 1,702 distinct company names. 
Orphans: 0", "VERIFIED: google\u2192Alphabet(GE741), AMD\u2192Advanced Micro Devices(AD14), schlumberger\u2192SLB Ltd.(SR1414), twitter\u2192X(TR1654), kellogg\u2192Kellanova(KS927), PricewaterhouseCoopers\u2192PwC(PS1288), UPS\u2192UPS(US1664) \u2014 all in_database=True", "SEED: Removed Q\u00b2 Solutions entries from data/company_aliases_seed.csv (company not in DB)", "SEED: Fixed Google entries \u2014 Alphabet is the formal company name; Google is an alias for Alphabet, not vice versa", "SEED: Fixed Lowes \u2192 Lowe's (Lowe's is the formal company name)", "POLICY: New rule \u2014 no alias target company_name may exist in company_aliases unless it also exists in the companies table", "POLICY: Admin DSL and database tools must enforce alias-company sync on company add/rename/delete operations", "AUDIT: Full audit CSV at orphaned_aliases_audit.csv (611 orphans: 43 remap, 6 review, 562 remove)"], "migration": null}, {"version": "6.6.5-alpha", "date": "2026-03-25", "type": "data", "breaking": false, "summary": "Data changes: Cond\u00e9 Nast rename (accented canonical), Sandia Corporation \u2192 Sandia National Laboratories, Yelp 4 occupations removed", "details": ["DATA: Conde Nast Digital renamed to Cond\u00e9 Nast (accented PK) \u2014 144 title_conversion + 15 company_occupation_summary + 1 company + 3 aliases updated", "DATA: Sandia Corporation renamed to Sandia National Laboratories \u2014 288 title_conversion + 42 company_occupation_summary + 1 company + 3 aliases updated; former_name alias added", "DATA: Yelp \u2014 4 company-occupation links removed (Food Service Managers, Recycling Coordinators, Property/Real Estate Managers, Retail Sales & Service Clerks) \u2014 Yelp occupations 34 \u2192 30", "ACCENT: MySQL utf8mb4_0900_ai_ci makes unaccented queries match accented PK (Conde Nast \u2192 Cond\u00e9 Nast); ES asciifolding handles search; prefix index resolves via alias entries", "SYNC: ES reindexed, ChromaDB seeded, prefix index reloaded on production + 
staging + local", "VERIFIED: COMPANYCONVERT 'Conde Nast Digital' \u2192 Cond\u00e9 Nast (1.0), 'Sandia Corporation' \u2192 Sandia National Laboratories (1.0), Yelp occupations = 30", "LOG: docs/deployment/DATA_CHANGE_LOG_2026-03-25.md created as canonical record"], "migration": null}, {"version": "6.6.4-alpha", "date": "2026-03-24", "type": "fix", "breaking": false, "summary": "COMPANYCONVERT orphan alias handling \u2014 1,781 alias-only matches now return NOT_IN_DATABASE sentinel instead of phantom results with empty UIDs", "details": ["FIX: 1,781 of 7,011 aliases (25%) pointed to canonical company names not in the companies table (LLM-generated Dec 2025 import) \u2014 injected phantom results with empty company_uid", "NEW: company_uid='NOT_IN_DATABASE' sentinel on matches where alias resolved but no company record exists", "NEW: in_database boolean field on every COMPANYCONVERT match (true = has company record + UID, false = alias-only)", "NEW: Response-level 'message' field when zero valid matches: 'X is not in our company database. 
Try a different spelling, the full legal name, or a well-known abbreviation.'", "NEW: Response-level 'warning' field when some matches are alias-only: explains badge/occupation/wage data unavailable", "FIX: Guard in find_company_matches() tags orphans across all tiers (prefix, ES, MySQL) via single exit-point filter", "FIX: Specific orphaned aliases 'Epicc' and 'Epik' remapped from phantom 'Epic' to 'Epic Systems Corporation' (EC601)", "DOCS: API_REFERENCE_V6_UNIFIED.md and ops/API_REFERENCE.md updated with in_database field, NOT_IN_DATABASE sentinel, message/warning fields", "DOCS: api-reference.txt (deployed quick reference) synced 2026-03-26 \u2014 companyconvert payload, company not-found errors, LIST COMPANIES filters (company_uid, event_type, include_score, expand)", "DOCS: web-test-ui-simplified.html embedded API Reference modal updated to v6.6.5-alpha (same deltas; was stale v6.0.0)", "DOCS: Non-breaking change note added \u2014 existing clients unaffected"], "migration": null}, {"version": "6.6.3-alpha", "date": "2026-03-20", "type": "fix", "breaking": false, "summary": "Open WebUI MCP tool calling fixed \u2014 model presets with system prompt and native function calling; deploy smoke test added", "details": ["FIX: Open WebUI MCP tools were auto-injected and connecting successfully but models never used them \u2014 root cause: no system prompt and no function_calling:native setting on any model", "NEW: 'AOI Career Assistant' model preset (Gemini 2.5 Flash + system prompt + native FC + suggestion prompts) \u2014 tools work end-to-end", "NEW: 'AOI Career Assistant (Local GPU)' model preset (qwen3:30b-a3b + same config) \u2014 local GPU inference with tool calling", "NEW: openwebui/setup-models.sh \u2014 reproducible script to create model presets via Open WebUI API (not just in SQLite DB)", "NEW: scripts/smoke-test-chat.sh \u2014 post-deploy smoke test that verifies MCP tool calling works by sending a real chat message and checking for governed data in the 
response", "CONFIG: AOI Career Assistant set as default model for new conversations", "CONFIG: Admin password reset on EC2 (was unknown/undocumented)"], "migration": null}, {"version": "6.6.2-alpha", "date": "2026-03-20", "type": "feature", "breaking": false, "summary": "Manba keynote deck rebuilt \u2014 14-slide Steve Jobs arc with 2 live demos and backup section", "details": ["REWRITE: manba-pres.html completely restructured from 24 slides to 14 keynote slides + 5 backup developer-depth slides", "NEW: Act 1 (slides 1-5) \u2014 back-to-basics problem buildup: information delivery, not just LLM hallucination", "NEW: Act 2 (slides 6-8) \u2014 membrane architecture diagram + Demo 1: multi-surface query (NL/DSL/REST/CLI/MCP same query, same answer)", "NEW: Act 3 (slides 9-12) \u2014 product reveal, aonav.ai stats, Demo 2: live Open WebUI chat, roadmap", "NEW: Act 4 (slides 13-14) \u2014 'One more thing' distribution platform reveal + close bookend", "NEW: Backup slide engine \u2014 backup:true flag excludes slides from progress bar; accessible via ?backup URL param", "NEW: Multi-surface demo panel \u2014 runs NL translate + DSL + REST simultaneously, shows CLI/MCP representations", "REMOVED: Chat FAB button and overlay panel \u2014 chat now lives in Demo 2 slide as fullscreen panel", "DEPLOYED: EC2 container rebuilt and verified (nvidia runtime confirmed)"], "migration": null}, {"version": "6.6.1-alpha", "date": "2026-03-19", "type": "infrastructure", "breaking": false, "summary": "Open WebUI deployed on EC2 with nginx TLS proxy on :3443 and friendly /chat redirect", "details": ["DEPLOYED: Open WebUI (aoi-open-webui) running on EC2 \u2014 container port 8080 mapped to host port 3001 (localhost only)", "NEW: nginx server block on port 3443 with full TLS \u2014 proxies to Open WebUI with WebSocket and streaming support", "NEW: /chat redirect on main :443 server \u2014 https://aonav.ai/chat \u2192 301 \u2192 https://aonav.ai:3443/", "FIREWALL: Port 3443 allowed in UFW 
and AWS security group for external HTTPS access", "AUTH: Open WebUI auth enabled (WEBUI_AUTH=true), signup disabled \u2014 admin account: admin@aonav.ai", "DOCS: Updated ops/OPERATOR_MANUAL.md (quick ref, network diagram, services table, port map), openwebui/README.md (EC2 deployment section), openwebui/nginx/openwebui-proxy.conf (actual config), .cursorrules"], "migration": null}, {"version": "6.6.0-alpha", "date": "2026-03-18", "type": "feature", "breaking": false, "summary": "SQL-native badge sorting replaces Python-side sort; WITH SCORE diagnostic; AST routing enabled for all DSL commands", "details": ["FIX: ORDER BY PLATINUM/GOLD sorted only first 1000 alphabetical companies in Python \u2014 missed 11 of 22 all-platinum companies; replaced with SQL CASE WHEN expression that sorts the full dataset natively", "NEW: Diagnostic score visibility via DSL 'WITH SCORE' modifier and REST 'include_score=true' param \u2014 returns badge_sort_score in response for debugging sort order", "NEW: _build_badge_score_sql() generates weighted CASE WHEN expression for platinum-first or gold-first scoring (overall_badge 25/15 weight, sub-badges 15/10 weight)", "REMOVED: Python-side calculate_badge_score() function and use_python_sort code path \u2014 all sorting now handled by MySQL ORDER BY", "FIX: COUNT query extraction hardened to use string.find() for FROM/ORDER BY positions instead of fragile SELECT replacement (handles dynamic score column injection)", "FIX: AST executor _exec_get_wages() now handles GET WAGES FOR CLUSTER queries (was only handling occupation wages, missing cluster wage dispatch)", "ENABLED: AST_ROUTING_ENABLED=1 set in docker-compose \u2014 all 33 DSL commands now route through ASTParser + ASTExecutor instead of legacy elif chain", "VALIDATED: AST parity test 55/57 vs legacy 54/57 \u2014 zero regressions, AST fixes extra-whitespace edge case that legacy failed"], "migration": null}, {"version": "6.5.1-alpha", "date": "2026-03-17", "type": "fix", 
"breaking": false, "summary": "Fix COMPANYCONVERT ES score normalization and ranking regression \u2014 fuzzy matches suppressed, ordering wrong", "details": ["FIX: COMPANYCONVERT via Elasticsearch returned only the top-scored match; all other fuzzy matches were filtered out by min_score threshold", "ROOT CAUSE: Linear score normalization (score/max_score) in _find_matches_es() crushed non-top results \u2014 e.g. Starbucks scored 0.425 (filtered at 0.6) instead of passing", "FIX: Ranking now uses ES for recall (fuzzy/typo matching) with position/length re-ranking matching the tuned MySQL path behavior", "FIX: Clear string relationships (starts_with, contains, etc.) scored with position bonus + length penalty on normalized names; fuzzy-only matches (typos, aliases) use log-normalized ES scores", "RESULT: 'star' returns Starcom > Starbucks > Starr Companies (shorter names first, matching MySQL tuning); 'home' returns Home Depot before Home Instead", "PRESERVED: Typo tolerance (microsft\u2192Microsoft), alias matching (google\u2192Alphabet), abbreviations (MSFT\u2192Microsoft) all working correctly via ES fallback scoring", "FIX: Companies with special characters in names (A+E, AT&T, PG&E, H-E-B) were unsearchable \u2014 ES standard analyzer strips +/& causing spurious matches on individual letter tokens; non-position-matched results now scaled down when query contains special chars", "VERIFIED: 'A+E' now returns A+E Global Media at #1 (was missing); AT&T and PG&E exact matches preserved", "REGRESSION INTRODUCED: commit bfb8e4b (2025-12-01) when ES integration was added to company_matcher.py"], "migration": null}, {"version": "6.5.0-alpha", "date": "2026-03-14", "type": "infrastructure", "breaking": false, "summary": "AST-based DSL parsing infrastructure (shadow mode), bugfixes for IN operator, COUNT WHERE filters, and single-quote support", "details": ["NEW: AST parser (src/dsl/ast_parser.py) converts DSL strings into a typed QueryAST tree", "NEW: AST executor 
(src/dsl/ast_executor.py) walks QueryAST and dispatches to UnifiedDataHandler via a (command, subject) dispatch table", "NEW: AST data model (src/dsl/ast_nodes.py) with QueryAST, WhereNode, ConditionNode, Modifiers, ForClause dataclasses", "NEW: FilterBuilder (src/filter_builder.py) provides reusable SQL WHERE clause primitives with declarative filter handler registries", "NEW: GrammarGenerator (src/dsl/grammar_generator.py) generates DSL documentation and prompt context from aoi_data.yaml schema", "NEW: Shadow mode in handlers.py \u2014 every DSL query is parsed by both AST and legacy paths; AST results are logged but legacy path executes (zero behavior change)", "NEW: enable_ast_routing() / disable_ast_routing() methods for controlled per-command switchover", "FIX: _parse_single_condition() in dsl_operators.py stripped parentheses before matching, breaking IN (...) operator regex \u2014 now tries raw clause first, falls back to stripped", "FIX: COUNT COMPANIES WHERE and COUNT OCCUPATIONS WHERE ignored all filter conditions \u2014 handlers accessed legacy parsed format instead of structured conditions from parse_where_clause()", "FIX: IS, CONTAINS, and NOT operators in dsl_operators.py only accepted double-quoted values \u2014 now accept both single and double quotes", "TEST: 205 new tests (112 parser/filter, 81 shadow parity, 12 grammar generator) \u2014 all passing"], "migration": null}, {"version": "6.4.2-alpha", "date": "2026-03-10", "type": "fix", "breaking": false, "summary": "Fix NL translate docs (wrong field names) and prompt context (missing endpoints/commands)", "details": ["FIX: /api/translate docs used 'message' field name but code expects 'query' \u2014 all 3 API references corrected", "FIX: /api/translate docs used 'preferred_output' param but code expects 'preferred_type' \u2014 corrected in API refs and MCP tool example", "FIX: /api/translate response example had wrong field 'command' (should be 'query') and string confidence (should be float)", 
"FIX: LLM provider 'qwen' corrected to 'ollama'; default corrected from 'gemini' to 'auto'", "FIX: NL prompt context was missing company-specific REST endpoints: /companies/{name}/occupations, /badges, /clusters", "FIX: NL prompt context DSL examples starved by 20-example cap dominated by LIST COMPANIES (14 examples); added per-pattern cap of 4 with total cap of 35 to ensure all command categories represented", "FIX: LIST OCCUPATIONS FOR COMPANY and FOR OCCUPATION patterns added as top priority in DSL context", "FIX: build_system_prompt had 'Default to DSL' bias overriding preferred_type; now respects requested output type", "FIX: Legacy fallback prompt in nl_query_builder.py also missing company-specific endpoints and DSL commands", "Contractor guide: removed DSL section (UQ is REST-only), reworded 'direct API' disclosure", "Added parameter table and preferred_type explanation to all API reference docs", "Validated: 'show me top 10 occupations at Microsoft' now correctly produces LIST OCCUPATIONS FOR COMPANY 'Microsoft' LIMIT 10"], "migration": null}, {"version": "6.4.1-alpha", "date": "2026-03-03", "type": "feature", "breaking": false, "summary": "company_uid filter now functional on LIST COMPANIES (REST and DSL), with multi-value support", "details": ["REST: GET /api/companies?company_uid=AN85 (single) or ?company_uid=AN85,OL1194,TH1653 (comma-separated multi)", "DSL: LIST COMPANIES WHERE COMPANY_UID IS \"AN85\" (single value)", "DSL: LIST COMPANIES WHERE COMPANY_UID IN (\"AN85\", \"OL1194\", \"TH1653\") (multi-value)", "DSL: LIST COMPANIES WHERE COMPANY_UID IS \"AN85\" OR COMPANY_UID IS \"OL1194\" (OR accumulation)", "FIX: company_uid was previously parsed by DSL but silently dropped (no filter branch in unified handler)", "FIX: company_uid was documented in API reference but never read from REST query params", "FIX: DSL IN operator (InOperator) returned 'values' key but parse_where_clause expected 'value' \u2014 now handles both", "FIX: DSL handler now 
correctly passes list values from IN operator directly as filter lists", "FIX: Malformed WHERE clauses (e.g. missing IN keyword) no longer silently drop filters and return all rows \u2014 now returns structured error with syntax guidance", "FIX: All multi-UID examples updated from fictional UIDs (MC456, SB789) to real data (AN85, OL1194, TH1653 = Amazon + subsidiaries)", "Uses existing _add_eq_or_in helper: single UID generates = clause, list generates IN clause", "Indexed column (idx_company_uid) ensures fast lookups even for multi-value queries", "Enables hydrating relationship UIDs returned by EXPAND: callers can batch-fetch parent/subsidiary/sibling companies"], "migration": null}, {"version": "6.4.0-alpha", "date": "2026-03-02", "type": "feature", "breaking": true, "summary": "Company relationship fields now return UID arrays; EXPAND modifier resolves UIDs to company names", "details": ["BREAKING: aoi_parents, aoi_subsidiaries, aoi_siblings now return JSON arrays of UIDs (were semicolon-separated strings)", "Companies with no relationships return empty arrays [], not absent keys", "REST: ?expand=relationships adds related_companies block with resolved {company_uid, company_name} objects", "DSL: EXPAND RELATIONSHIPS modifier on any company command (GET COMPANY \"Amazon\" EXPAND RELATIONSHIPS, LIST COMPANIES WHERE ... 
EXPAND RELATIONSHIPS LIMIT N)", "DSL: Bare EXPAND also works (defaults to RELATIONSHIPS); DSL remains superset of REST expand param", "MCP: Updated tool description and examples for EXPAND RELATIONSHIPS", "Resolution uses company_uid_name lookup table (~1ms, 1,752 rows fully cached)", "Batch resolution: single query resolves all UIDs across all companies in a list response", "TESTED: 29/29 local tests (11 smoke + 18 regression), 4/4 EC2 smoke tests", "DEPLOYED: EC2 aonav.ai 2026-03-02, checksum-verified"], "migration": "Clients parsing aoi_parents/aoi_subsidiaries/aoi_siblings must switch from semicolon-string splitting to array iteration"}, {"version": "6.4.0-alpha", "date": "2026-03-02", "type": "fix", "breaking": false, "summary": "Company table cleanup: removed 107 orphaned rows, fixed import script to full-replace, created missing company_uid_name table", "details": ["FIX: Removed 107 orphaned companies from EC2 that survived the 2-24-26 UPSERT (not in new BGI CSV)", "FIX: import_2024_update.py now uses TRUNCATE + INSERT for companies table (was UPSERT which left old rows)", "FIX: Created missing company_uid_name lookup table on EC2 (migration 005 had not been applied)", "DATA: companies table now exactly 1,752 rows (matches CSV), 0 rows with NULL event_type", "DATA: company_uid_name table populated with 1,752 UID-to-name mappings from companies table", "VERIFY: company_occupation_summary confirmed correct at 54,819 rows"], "migration": null}, {"version": "6.4.0-alpha", "date": "2026-02-26", "type": "schema", "breaking": false, "summary": "2-24-26 BGI data update: corporate events, relationships, company descriptions/URLs/HQ, occupation summary refresh", "details": ["SCHEMA: 5 new columns on companies table (company_event_type, company_event_note, aoi_parents, aoi_subsidiaries, aoi_siblings)", "DATA: 1,752 companies full replace (descriptions, URLs, HQ city/state/country, events, relationships)", "DATA: company_occupation_summary full replace (54,819 rows from 
BGI 2.23.2026 export)", "EVENT TYPES: acquired, acquired_still_operating, merged, rebranded, split, chapter_11_bankruptcy, none", "REST: /api/companies?event_type=acquired (new filter parameter)", "DSL: LIST COMPANIES WHERE EVENT_TYPE IS \"merged\" (new filter)", "DSL: Combined filters work: INDUSTRY + EVENT_TYPE", "MCP: Updated tool descriptions and examples for event_type queries", "NL: Prompt templates updated with corporate events data model", "UI: 13 new example queries (REST + DSL) for event type filters", "FIX: DSL field_to_filter mapping now includes event_type, company_uid, industry_id in primary routing path"], "migration": "Run sql/004_add_company_event_and_relationships.sql, then scripts/import_2024_update.py inside container. Rebuild container for code changes."}, {"version": "6.4.0-alpha", "date": "2026-02-11", "type": "feature", "breaking": false, "summary": "MySQL is now primary auth backend for all code paths; YAML is fallback only", "details": ["Full YAML-to-MySQL migration completed (63 users, 100% fidelity) via scripts/migrate_yaml_to_mysql.py", "Auth middleware (middleware_strict.py) now resolves SSO users from MySQL first, YAML is fallback only", "Login handler supports username OR email via get_user_by_username_or_email()", "Full user profile (role, entry_points, data_apis) from MySQL junction tables via get_user_full_profile()", "Per-request access control (access_control.py) reads from MySQL junction tables with 5-min cache", "Login response now includes entry_points, data_apis, organization from MySQL", "Admin API reads from MySQL junction tables (auth_user_roles, auth_user_entry_points, auth_user_data_apis)", "Admin dashboard resolves permissions via /auth/me for SSO tokens (Auth0 tokens lack local perms)", "SQLAlchemy named parameters fixed in user_db_manager.py; explicit commits for write ops in database.py", "JWT secret unchanged; token refresh verified working", "Full battery test passed: REST + DSL + MCP \u00d7 4 users \u00d7 all 
data sets = 0 YAML fallbacks", "YAML dual-write still active in admin panel and manage_users.py (9 locations, search TODO(BETA))"], "migration": "Pull latest code and rebuild container. Do NOT copy local users.yaml to EC2. MySQL is authoritative for production users."}, {"version": "6.3.0-alpha", "date": "2026-02-10", "type": "fix", "breaking": false, "summary": "DSL HTTP endpoint returns 410/400 when handler returns error (e.g. command_removed)", "details": ["POST /api/dsl no longer wraps handler error in success: true; returns HTTP 410 for command_removed, 400 for other errors with success: false", "Deprecated analytics DSL commands (e.g. GET MOBILITY_PATTERNS FOR OCCUPATION) now yield 410 Gone and success: false instead of 200 with data"], "migration": null}, {"version": "6.3.0-alpha", "date": "2026-02-10", "type": "fix", "breaking": true, "summary": "Deprecated occupation analytics APIs disabled (410 Gone / command removed)", "details": ["REST: /api/occupations/{id}/analytics/wages, wages/progression, access, retention, mobility, transitions now return 410 Gone with alternative endpoint in body", "DSL: GET WAGE_PROGRESSION FOR OCCUPATION, GET ACCESS_REQUIREMENTS FOR OCCUPATION, GET RETENTION_DATA FOR OCCUPATION, GET MOBILITY_PATTERNS FOR OCCUPATION, GET CAREER_TRANSITIONS FOR OCCUPATION now return structured error (command_removed) with alternative", "Use GET WAGES FOR OCCUPATION, GET AI_IMPACT FOR OCCUPATION, GET PATHWAYS FOR OCCUPATION for real BGI data"], "migration": "Replace any use of occupation analytics REST or DSL commands with the alternatives documented in the 410/error response"}, {"version": "6.3.0-alpha", "date": "2026-02-10", "type": "feature", "breaking": false, "summary": "2-4-26 occupation_info fields in API payloads and filters", "details": ["LIST CLUSTERS / GET /api/clusters: response now includes job_level, skill_adj_cluster_1/2/3, premium_skill_1/2/3, common_clean_job_title_1/2/3", "GET CLUSTER / GET /api/clusters/{id}: same new fields in 
response", "Filters: job_level and premium_skill (or has_premium_skill) now applied for LIST CLUSTERS and GET /api/clusters", "GET /api/occupations: job_level filter added; response includes cluster_id, cluster_name, job_level when present", "GET /api/occupations/{id}/ai-impact: response adds job_level, premium_skills", "GET /api/occupations/{id}/education: response adds job_level", "GET /api/occupations/{id}/pathways: response adds skill_adjacent_clusters, premium_skills"], "migration": null}, {"version": "6.3.0-alpha", "date": "2026-02-07", "type": "feature", "breaking": false, "summary": "Automated AOI data set update process and bulletproof company import", "details": ["NEW: AOI Data Set data dictionary (docs/data/AOI_DATA_SET_DICTIONARY.md) \u2013 imported vs auto-generated fields", "NEW: AOI data update process (docs/data/AOI_DATA_UPDATE_PROCESS.md) \u2013 local \u2192 staging \u2192 production, validate/diff/approve/apply/verify", "NEW: scripts/aoi_company_import.py \u2013 validate CSV (schema, aliases), diff (add/delete/update), apply with optional --verify (SQL + API)", "NEW: Import/migration log (docs/data/IMPORT_MIGRATION_LOG.md) \u2013 repo log for data updates and impact on other hosts", "DATA: 2-7-26 companies update applied (local): 1763 rows updated from BGI Company_Data 2.6.2026 CSV", "Future: Same pattern for other named data sets (user-rights, log, company-aliases) with separate dictionaries"], "migration": "For production: copy data/2-7-26 data update to host, run aoi_company_import.py --apply --verify, then sync Elasticsearch companies index"}, {"version": "6.3.0-alpha", "date": "2026-02-07", "type": "fix", "breaking": false, "summary": "Fixes: MySQL user management, occupation queries, deploy safety", "details": ["FIX: MySQL user management \u2013 proper event loop and collation for user operations", "FIX: Remove non-existent columns from occupation queries (prevents runtime errors)", "DEPLOY: users.yaml removed from deploy script to protect 
EC2 production (no overwrite)"], "migration": null}, {"version": "6.3.0-alpha", "date": "2026-02-07", "type": "feature", "breaking": false, "summary": "Import testing: sample activities, --export-diff, regression on unchanged data", "details": ["NEW: AOI_DATA_UPDATE_PROCESS.md Section 2a \u2013 sample test activities, targeting the diff, regression on unchanged data", "NEW: scripts/aoi_company_import.py --export-diff FILE \u2013 write to_add/to_delete/to_update JSON for targeted tests", "Regression: pre-snapshot, fixed gold list, sanity on rest of DB"], "migration": null}, {"version": "6.3.0-alpha", "date": "2026-02-07", "type": "schema", "breaking": false, "summary": "2-4-26 BGI data migration confirmed (local and EC2)", "details": ["DATA: Companies (1,763), company_occupation_summary (55,151), occupation_info (861) from data/2-4-26 update", "DATA: 731 occupation_info rows with common_clean_job_title_1/2/3 populated for CLUSTERCONVERT", "Source: BGI export 2.2.2026 (Company_Data, Company_Occupation_Data, Occupation Info CSVs)", "Verification: Row counts and common_clean_job_title column present; new APIs and payloads reflect this data"], "migration": null}, {"version": "6.3.0-alpha", "date": "2026-02-05", "type": "feature", "breaking": false, "summary": "CLUSTERCONVERT: Find occupation clusters by common job titles", "details": ["NEW: CLUSTERCONVERT DSL command - find clusters by job title", "NEW: REST endpoint /api/clusters/convert?title=X&limit=N", "NEW: occupation_info.common_clean_job_title_1/2/3 columns", "NEW: FULLTEXT index ft_common_titles for fuzzy title search", "DATA: 731 clusters imported with common job titles", "MCP: Updated tools registry with CLUSTERCONVERT examples", "Use case: CLUSTERCONVERT finds clusters, TITLECONVERT finds O*NET codes"], "migration": "Run sql/migration_add_common_titles.sql, then scripts/import_occupation_info_2-4-26.py --apply"}, {"version": "6.2.0-alpha", "date": "2026-02-02", "type": "feature", "breaking": false, "summary": 
"vLLM integration with 16K context and anti-hallucination for EC2 (Phase 1)", "details": ["NEW: vLLM container on EC2 (port 8001) - Qwen2.5-14B-Instruct-AWQ", "NEW: 16K token context (--max-model-len 16384) for multi-turn agent with tool results", "NEW: AWQ Marlin quantization for faster inference (--quantization awq_marlin)", "NEW: vLLM primary in fallback chain: vLLM \u2192 Ollama \u2192 Gemini \u2192 OpenAI", "NEW: Test infrastructure for EC2/vLLM vs Mac/Ollama environment detection", "FIX: llm_provider_factory.py - vLLM URL path doubled /v1", "FIX: Anti-hallucination prompts - tool results now labeled [DATABASE RESULT]", "FIX: prompt_config.py - qwen/vllm mapped to large_models template", "PERF: Complex queries: 30-60s \u2192 5-15s", "ARCH: EC2-only deployment (requires NVIDIA GPU). Mac uses Ollama."], "migration": "EC2: Deploy docker-compose.unified.yml. Mac: No changes (uses Ollama)."}, {"version": "6.1.1-alpha", "date": "2026-02-02", "type": "fix", "breaking": false, "summary": "Admin Dashboard: Fixed Top Endpoints display and promoted enhanced UI", "details": ["FIX: Top Endpoints now displays correctly (was stuck on 'Loading...')", "FIX: Empty-data response now returns 'top_endpoints': [] instead of 'endpoints': {}", "FIX: Middleware now logs requests to in-memory stats (was only logging to file)", "ENHANCEMENT: Admin dashboard upgraded with real-time usage stats, sessions, API calls", "CLEANUP: Removed admin-usage-test.html (merged into admin-dashboard.html)"], "migration": null}, {"version": "6.1.0-alpha", "date": "2026-01-27", "type": "fix", "breaking": false, "summary": "Auth0 SSO users now get permissions in JWT token (Agent page fix)", "details": ["FIX: Auth0 SSO users now receive permissions array in JWT token", "FIX: Permissions derived from user role using ROLE_PERMISSIONS", "FIX: Agent/AI Agent chat pages now accessible to Auth0 analysts and admins", "ROOT CAUSE: Auth0 payload was missing 'permissions' field that legacy JWT included", "AFFECTED: 
All Auth0 SSO users who couldn't access /agent or /aoi-agent pages"], "migration": "Users must log out and log back in to receive new token with permissions"}, {"version": "6.1.0-alpha", "date": "2026-01-27", "type": "feature", "breaking": false, "summary": "ChromaDB added to health endpoint monitoring", "details": ["NEW: /health endpoint now includes ChromaDB status", "NEW: Reports connection status, host, and collection count", "FIX: Dockerfile.ollama-gpu now includes zstd for Ollama installation"], "migration": null}, {"version": "6.1.0-alpha", "date": "2026-01-20", "type": "data", "breaking": false, "summary": "1-15-26 BGI data update: 5 companies removed, common_titles deprecated", "details": ["DATA: Removed 5 companies (Credit Suisse, Forever 21, Joann's, Party City, Rite Aid)", "DATA: Company UIDs renumbered to fill gaps (consistent with BGI source)", "DATA: 114 company-occupation records removed (for deleted companies)", "DATA: common_clean_job_title_1/2/3 set to NULL (awaiting occupation_info update)", "API: common_titles field suppressed by default in company occupations", "API: REST include_titles=true to show deprecated field", "API: DSL INCLUDE TITLES modifier to show deprecated field"], "migration": "No action required - backward compatible"}, {"version": "6.1.0-alpha", "date": "2026-01-19", "type": "fix", "breaking": false, "summary": "Users added at runtime now work immediately (access control fix)", "details": ["FIX: Users added via admin panel or manage_users.py now have immediate API access", "FIX: Previously required container restart due to stale singleton cache", "NEW: force_reload() function clears access control cache after user changes", "Applies to: add, remove, modify, enable, disable user operations"], "migration": null}, {"version": "6.1.0-alpha", "date": "2026-01-19", "type": "feature", "breaking": false, "summary": "Bulk user import and manage_users.py enhancements", "details": ["NEW: manage_users.py bulk-import --csv file.csv (import 
users from CSV)", "NEW: manage_users.py list --format csv (export users to CSV format)", "NEW: --generate-passwords flag creates org-based passwords for new users", "NEW: --organization field stored per user", "Supports both local and EC2 container execution"], "migration": null}, {"version": "6.1.0-alpha", "date": "2026-01-18", "type": "feature", "breaking": false, "summary": "Unified LLM service with vLLM support and fallback chain", "details": ["NEW: UnifiedLLMService abstraction for all LLM interactions", "NEW: Configurable fallback chain: vLLM \u2192 Ollama \u2192 Gemini \u2192 OpenAI \u2192 Anthropic", "NEW: LLM_PROVIDER_TYPE and LLM_FALLBACK_CHAIN environment variables", "NEW: vLLM support for EC2 GPU (higher throughput than Ollama)", "NEW: qwen3:30b-a3b as EC2 production gold standard model", "Mac continues to use Ollama + qwen2.5:7b for local development"], "migration": "Optional: Configure LLM_FALLBACK_CHAIN for custom provider order"}, {"version": "6.1.0-alpha", "date": "2026-01-14", "type": "fix", "breaking": false, "summary": "Fix LIST CLUSTERS job_level and premium_skill filters", "details": ["FIX: LIST CLUSTERS WHERE JOB_LEVEL IS \"X\" now correctly filters by education level", "FIX: LIST CLUSTERS WHERE PREMIUM_SKILL CONTAINS \"X\" now correctly filters by skill", "FIX: SQL query now includes job_level, premium_skill_1/2/3, skill_adj_cluster_1/2/3 columns", "FIX: Response now returns all occupation_info fields for clusters", "Example: LIST CLUSTERS WHERE PREMIUM_SKILL CONTAINS \"Software\" now returns relevant clusters"], "migration": null}, {"version": "6.1.0-alpha", "date": "2026-01-14", "type": "docs", "breaking": false, "summary": "Comprehensive DSL documentation and Web UI example refresh", "details": ["DOCS: Complete DSL verbs reference (LIST, GET, COUNT, TITLECONVERT, COMPANYCONVERT, SEARCH, ANALYZE, VERSION)", "DOCS: Complete DSL subjects reference (COMPANIES, OCCUPATIONS, INDUSTRIES, CLUSTERS)", "DOCS: Complete DSL filters reference with 
types and examples", "DOCS: Complete DSL modifiers reference (LIMIT, OFFSET, ORDER BY, GROUP BY, INCLUDE, CLARIFY)", "DOCS: Added missing GET commands to api-reference.txt (WAGE_PROGRESSION, PROBABILITIES, ACCESS_REQUIREMENTS, RETENTION_DATA, MOBILITY_PATTERNS, CAREER_TRANSITIONS)", "DOCS: Added missing LIST commands (COMPANIES FOR OCCUPATION, OCCUPATION CLUSTERS, COMPANY CLUSTERS)", "UI: Refreshed web-ui-examples-dsl.json with 60+ DSL examples across 10 categories", "UI: Refreshed web-ui-examples-rest.json with streamlined REST examples", "UI: Removed emojis from example dropdowns for cleaner appearance", "UI: Example format now shows 'Title | query_preview' for clarity"], "migration": null}, {"version": "6.0.0-alpha", "date": "2026-01-13", "type": "feature", "breaking": false, "summary": "TITLECONVERT disambiguation with ChromaDB RAG and LLM inference", "details": ["NEW: 3-tier disambiguation system for ambiguous job titles", "NEW: ChromaDB integration for historical pattern learning (passive learning)", "NEW: Automatic LLM inference when Tier 2 confidence is below threshold", "NEW: industry, location, ambiguity_threshold, clarify API parameters", "NEW: agent_guidance response block with suggested follow-up queries", "NEW: disambiguation response block with tier2/llm recommendations", "NEW: aoi_title_convert MCP tool with context-aware matching", "NEW: DSL syntax: TITLECONVERT \"title\" WHERE COMPANY IS \"x\" AND INDUSTRY IS \"y\"", "IMPROVEMENT: 'style guru' now correctly resolves based on context", "IMPROVEMENT: Agents receive accuracy improvement hints (company=95%, industry=85%)", "CONFIG: ChromaDB service added to docker-compose files", "CONFIG: src/seed_chromadb.py for seeding industry affinity and O*NET clusters"], "migration": "Optional: Run seed_chromadb.py --industry-affinity to enable RAG disambiguation"}, {"version": "6.0.0-alpha", "date": "2026-01-13", "type": "feature", "breaking": false, "summary": "Added qwen3-embedding:8b as dedicated embedding 
model for RAG and disambiguation", "details": ["NEW: qwen3-embedding:8b (purpose-built embedding model) added to all workstations", "NEW: OLLAMA_EMBEDDING_MODEL environment variable for dedicated embeddings", "IMPROVEMENT: O*NET title matching accuracy improved from 40% to 100%", "IMPROVEMENT: O*NET category separation improved from 0.75-0.95 to 0.50-0.65 similarity", "FIX: 'hair stylist' now correctly matches 'Hairdressers' (was 'Fashion Designers')", "FIX: 'magazine editor' now correctly matches 'Editors' (was 'Fashion Designers')", "CONFIG: docker-compose files updated with OLLAMA_EMBEDDING_MODEL", "CONFIG: config/models.json updated with qwen3-embedding model definition", "DOCS: WORKSTATION_ALERT.md created with installation instructions"], "migration": "All workstations must run: ollama pull qwen3-embedding:8b"}, {"version": "6.0.0-alpha", "date": "2026-01-13", "type": "feature", "breaking": false, "summary": "TITLECONVERT now uses Elasticsearch for improved relevancy", "details": ["NEW: ES-powered TITLECONVERT with phrase matching, fuzzy search, synonyms", "NEW: Typo tolerance - 'stlye mangaer' now finds 'style manager' matches", "NEW: Synonym support - 'developer' finds 'engineer' entries", "NEW: Field boosting - O*NET title weighted 5x higher than raw titles", "NEW: title_conversion ES index (442,987 records)", "IMPROVEMENT: Exact phrase match 'style manager' now ranks #1 (was #7)", "FALLBACK: Automatic MySQL fallback if ES unavailable", "DEVOPS: deploy-with-checksums.sh now includes ES sync reminder"], "migration": "Run ES sync after title_conversion data changes: docker exec aoi-mcp-unified-server python3 /app/elasticsearch/sync_mysql_to_es.py"}, {"version": "6.0.0-alpha", "date": "2026-01-13", "type": "fix", "breaking": false, "summary": "TITLECONVERT scoring improvements: phrase matching, normalization, exact match bonus", "details": ["FIX: Scores now normalized to 0-1 range (was raw MySQL FULLTEXT 0-100+)", "FIX: Multi-word queries use phrase matching 
('\"style manager\" style* manager*')", "FIX: Exact phrase matches get +0.2 score bonus", "FIX: Confidence thresholds now work correctly (high >= 0.8, medium >= 0.5)", "IMPROVEMENT: Exact matches no longer buried by high term-frequency partial matches", "Example: 'style manager' exact match moved from rank 7 to rank 3"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-23", "type": "feature", "breaking": false, "summary": "Add top10 and postings_qtile filters for company occupations", "details": ["NEW: GET /api/companies/{name}/occupations?top10=1 (top 10 most prevalent)", "NEW: GET /api/companies/{name}/occupations?postings_qtile=5 (high volume)", "NEW: DSL: LIST OCCUPATIONS FOR COMPANY 'X' WHERE TOP10 IS '1'", "NEW: DSL: LIST OCCUPATIONS FOR COMPANY 'X' WHERE POSTINGS_QTILE IS '5'", "Filters can be combined with badge filters"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-22", "type": "feature", "breaking": false, "summary": "Occupation clusters expanded with job_level, skill_adj_cluster, premium_skill fields", "details": ["NEW: GET /api/clusters returns job_level, skill_adj_cluster_1/2/3, premium_skill_1/2/3", "NEW: GET /api/clusters?job_level=X filter by education level", "NEW: GET /api/clusters?premium_skill=X filter by skill (partial match)", "NEW: DSL: LIST CLUSTERS WHERE JOB_LEVEL IS 'X'", "NEW: DSL: LIST CLUSTERS WHERE PREMIUM_SKILL CONTAINS 'X'", "Database: occupation_info table expanded with 7 new columns (861 rows updated)", "Data source: BGI Occupation Metrics (12.19.2025)"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-19", "type": "feature", "breaking": false, "summary": "Add group=onet parameter to titleconvert for autocomplete scenarios", "details": ["NEW: GET /api/titleconvert?title=X&group=onet (REST)", "Returns unique O*NET codes with companies array instead of duplicate rows", "Eliminates client-side deduplication for look-ahead/autocomplete", "DSL already supported: TITLECONVERT \"X\" GROUP 
BY ONET", "Response format: {onet_code, onet_title, companies:[], top_score, confidence}"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-19", "type": "feature", "breaking": false, "summary": "Add CLUSTER_ID filter to LIST OCCUPATIONS command", "details": ["NEW: LIST OCCUPATIONS WHERE CLUSTER_ID IS 187 (DSL)", "NEW: GET /api/occupations?cluster_id=187 (REST - already supported)", "Returns all O*NET codes belonging to a specific occupation cluster", "Updated: handlers.py, unified_data_handler.py, MCP tools, NL context, API docs"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-19", "type": "feature", "breaking": false, "summary": "AOIAPI-4: Add industry_id support for stable ID-based filtering (hash-based)", "details": ["NEW: /api/industries now returns industry_id (deterministic hash of industry name)", "NEW: /api/companies?industry_id=X filters by industry ID (hash reverse-lookup)", "Uses Option C: hash-based IDs - no separate table required", "industry_id is deterministic: same industry name always produces same ID", "industry_id takes precedence over industry parameter if both provided", "Jira: AOIAPI-4"], "migration": "Use industry_id for stable filtering; industry still works"}, {"version": "6.0.0-alpha", "date": "2025-12-18", "type": "feature", "breaking": false, "summary": "Add Anthropic Claude 3 Haiku integration for agent chat", "details": ["NEW: Claude 3 Haiku available as agent model", "Implemented native tool calling with proper tool_use/tool_result format", "All four agent models now functional: Qwen (local), Gemini, OpenAI, Claude"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-18", "type": "fix", "breaking": false, "summary": "AOIAPI-1: Fix overall_badge=Ranked filter in REST API", "details": ["FIX: REST /api/companies?overall_badge=Ranked now returns companies with any non-NA badge", "FIX: Was returning 0 results - now correctly returns 709 companies", "CLEANUP: Removed 140 lines of dead code 
(_handle_list_companies_where) from handlers.py", "ARCHITECTURE: Both REST and DSL now route through UnifiedDataHandler for LIST COMPANIES", "Jira: AOIAPI-1"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-17", "type": "feature", "breaking": false, "summary": "Badge 'Ranked' filter and unified SQL architecture", "details": ["NEW: badge_growth=Ranked, badge_stability=Ranked, badge_early_career=Ranked filters", "Ranked means 'any non-NA badge value' - filters for companies that have ANY badge (Platinum/Gold)", "FIX: industry= parameter now works in REST API (was only accepting sector=)", "FIX: Legacy sector= parameter now properly maps to industry filter", "ARCHITECTURE: REST and DSL now use unified SQL generation (Golden Rule compliance)", "Both REST and DSL produce identical results for same filters"], "migration": "Use industry= instead of sector= (sector= still works as alias)"}, {"version": "6.0.0-alpha", "date": "2025-12-17", "type": "feature", "breaking": false, "summary": "OpenAI GPT-4o-mini model support and multi-turn tool calling", "details": ["Added OpenAI GPT-4o-mini as selectable model in /agent interface", "Fixed multi-turn tool calling for OpenAI (proper message format)", "Updated Google AI Studio API key (higher rate limits)", "All three agent models now fully functional: Gemini, OpenAI, Qwen"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-17", "type": "fix", "breaking": false, "summary": "Local development database fix", "details": ["Fixed: docker-compose.mac.yml now uses aoi_data_v6 (was using old aoi_data)", "Added: cleanup-old-database.sh script for other dev workstations", "Removed: Shadow aoi_data database from both local and EC2"], "migration": "Run scripts/cleanup-old-database.sh on dev workstations"}, {"version": "6.0.0-alpha", "date": "2025-12-17", "type": "fix", "breaking": false, "summary": "Critical GPU acceleration fix for Ollama LLM", "details": ["Fixed: Ollama CUDA runner libraries 
(libggml-cuda.so) were missing from Docker image", "Fixed: Docker build now copies /usr/local/lib/ollama/ from builder stage", "Fixed: Removed CUDA_VISIBLE_DEVICES= empty string that blocked GPU discovery", "Fixed: Aligned LD_LIBRARY_PATH to include Ollama CUDA runner paths", "Result: NL translation now responds in 1-2 seconds (was 60+ second timeout on CPU)"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-16", "type": "feature", "breaking": false, "summary": "ORDER BY support for company listings", "details": ["REST: /api/companies now accepts order_by= parameter", "DSL: LIST COMPANIES now supports ORDER BY clause", "Supported values: alpha, alpha_desc (name_desc), platinum, gold, badge_count, industry", "Platinum/Gold sorting uses weighted badge scoring (overall 2.5x, other badges 1.5x)", "Badge sorting fetches up to 1000 companies then ranks by badge score"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-16", "type": "fix", "breaking": false, "summary": "Agent UX improvements and security hardening", "details": ["Agent: Added progress indicator with elapsed time for local model queries", "Agent: Added model status display showing if LLM is ready", "Agent: Improved error messages with recovery action buttons", "Security: Removed hardcoded credentials from agent page injection", "Fix: CSS static files now properly served from /static/"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-15", "type": "feature", "breaking": false, "summary": "MSA size and hiring filters for wage and hiring endpoints", "details": ["REST: /api/occupations/{id}/wages now accepts msa_size= and experience_level= filters", "REST: /api/clusters/{id}/wages now accepts msa_size= and experience_level= filters", "REST: /api/companies/{name}/hiring now accepts cluster_id= and cbsa= filters", "DSL: GET WAGES FOR OCCUPATION now supports AND MSA_SIZE IS and AND EXPERIENCE_LEVEL IS", "DSL: GET WAGES FOR CLUSTER now supports AND MSA_SIZE IS and AND 
EXPERIENCE_LEVEL IS", "DSL: GET HIRING FOR COMPANY now supports AND CLUSTER_ID IS and AND CBSA IS", "MSA_SIZE values: 'Large' or 'Small/Medium'", "EXPERIENCE_LEVEL values: '0', '5', or '10' (years)"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-12", "type": "feature", "breaking": false, "summary": "New filters for company-occupation queries", "details": ["REST: /api/companies/{name}/occupations now accepts onet_code= and cluster_id= filters", "REST: /api/companies/{name}/occupations now accepts badge_early_career=, badge_growth=, badge_stability= filters", "DSL: LIST OCCUPATIONS FOR COMPANY now supports WHERE ONET_CODE IS and WHERE CLUSTER_ID IS", "DSL: LIST COMPANIES FOR OCCUPATION now supports WHERE BADGE IS filter"], "migration": null}, {"version": "6.0.0-alpha", "date": "2025-12-11", "type": "feature", "breaking": false, "summary": "TITLECONVERT GROUP BY ONET support", "details": ["DSL: TITLECONVERT now supports GROUP BY ONET to return unique O*NET codes", "Syntax: TITLECONVERT \"job title\" LIMIT n GROUP BY ONET"], "migration": null}, {"version": "6.0.0", "date": "2025-12-10", "type": "schema", "breaking": true, "summary": "v6 schema migration - column renames", "details": ["SECTOR renamed to INDUSTRY (use primary_industry column)", "first_jobs_badge renamed to badge_early_career", "growth_jobs_badge renamed to badge_growth", "stability_jobs_badge renamed to badge_stability", "New LIST CLUSTERS command for occupation clusters"], "migration": "Update any queries using SECTOR to use INDUSTRY. 
Update badge column references."}, {"version": "5.3.0", "date": "2025-11-15", "type": "feature", "breaking": false, "summary": "Unified data handler - DSL parity with REST", "details": ["All REST endpoints now route through unified handler", "DSL supports all REST query capabilities"], "migration": null}], "upcoming": [{"version": "6.0.0-beta", "planned": "2026-Q1", "features": ["Stable REST API (no breaking changes after beta)", "Full MCP protocol support", "Enhanced NL query translation"]}], "deprecations": [{"feature": "Occupation analytics REST endpoints", "deprecated_in": "2025-01-07", "removed_in": "6.3.0-alpha", "replacement": "GET /api/occupations/{onet_code}/wages, /ai-impact, /pathways", "notes": "/api/occupations/{id}/analytics/wages, wages/progression, access, retention, mobility, transitions return 410 Gone"}, {"feature": "Occupation analytics DSL commands", "deprecated_in": "2025-01-07", "removed_in": "6.3.0-alpha", "replacement": "GET WAGES FOR OCCUPATION, GET AI_IMPACT FOR OCCUPATION, GET PATHWAYS FOR OCCUPATION", "notes": "GET WAGE_PROGRESSION/ACCESS_REQUIREMENTS/RETENTION_DATA/MOBILITY_PATTERNS/CAREER_TRANSITIONS FOR OCCUPATION return command_removed error"}, {"feature": "SECTOR field", "deprecated_in": "6.0.0", "removed_in": null, "replacement": "INDUSTRY (maps to primary_industry)", "notes": "SECTOR queries still work but return INDUSTRY data"}, {"feature": "OFFSET pagination", "deprecated_in": "5.2.0", "removed_in": null, "replacement": "RANGE pagination (e.g., RANGE 1-50)", "notes": "OFFSET still works but RANGE is preferred"}]}