Turning every number into one you can actually trust — and act on.
Analytics adoption stalls for one reason: people don't trust the number, so they re-pull it by hand. This suite attacks the trust gap directly — every answer earns its trust with a receipt and a consensus verdict, and it surfaces the problems a dashboard structurally cannot see.
The five moves that make trust the product
1 Validated by consensus, not a single oracle
Open / "why" questions are judged by a panel of different model families (so their blind spots don't overlap), reconciled by an arbiter, with a seat on a second machine. Agreement across families is the signal; a lone claim is flagged, not trusted.
2 Data-driven, not argument-driven
Every answer ships a receipt: the lane it took, the exact gold tables + governed lineage (owner, logic version, freshness), and the steps. Nothing is "because the AI said so"; an auditor can replay it.
3 360° observability
Smart alerts read every room, not the average. They fire on numbers that look fine: a flat metric hiding a real gain (Simpson's paradox), or a green metric hiding a real failure (festival-masked). These are the alerts a topline dashboard structurally cannot raise.
4 Self-remediation
When a metric goes Red, an agent auto-runs the RCA over the real gold data, names and quantifies the true driver, rules out the red herrings with evidence, drafts the business callout, and pushes it to any of four sinks: in-app, Telegram, email/digest, WBR deck.
5 Evolve-as-you-go
Additive over the governance you already trust, with no semantic layer to migrate to. New capability ships with receipts and a verify path by default; the deterministic demo path is preserved alongside every live feature; and the model factory lets the suite grow its own domain-tuned specialists. The system gets more trustworthy as it grows, not less.
What the business asked for
From the Vision & Product Strategy: a trusted, scalable, intelligent analytics suite where every stakeholder can ask in natural language and get trusted, governed, consistent answers; where repeatable manual analysis is self-served; and where degradations are caught proactively. The strategy names seven problems:
| Problem statement (PRD) | How this suite answers it |
|---|---|
| 1 · Fragmented reporting & ops overhead — handcrafted pipelines, manual RAG-threshold alignment, person-dependent. | One governed metric mart + a GenAI alerting/RCA orchestration layer; targets & RAG defined once, propagated; reports auto-drafted to four sinks. |
| 2 · Data-quality & reliability — no contracts, reactive instrumentation, raw/journal data. | DQ alerts that learn population rates and flag proactively; business-rule outliers (e.g. "Refunded before Ordered"); self-recovering DQ gaps ruled out of business RCA. |
| 3 · Knowledge management — fragmented memory, no unified dictionary, SME-dependent. | A governed Knowledge & Skills repository (definitions, golden queries, dimensions, owners, lineage) on a RAG interface, with versioning and approval gates. |
| 4 · BI tooling fragmentation — Sheets / Looker / Plato, static, no conversational layer. | One conversational NLQ layer + governed mart as the single source of truth; multi-dim drill-downs and RCA on every number. |
| 5 · Advanced analytics & simulation scalability — offline, localized, hard to productionize. | A Workbench that composes governed cuts into saved boards (every cell a receipt); EDA/ML toolkits + no-code sandbox on the roadmap. |
| 6 · Experimentation & attribution — divergent logic, manual significance, confounders. | An Experiments Hub that checks sample-ratio mismatch first, then lift + significance + novelty-decay + Go/No-Go. |
| 7 · Data availability & landscape — raw/journal, query timeouts, no lineage in-line. | A browsable gold-layer map: every fact & metric with owner CDM, source, logic version, last-refreshed and click-through lineage; NRT for Tier-1. |
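The business-rule outlier named in row 2 ("Refunded before Ordered") can be illustrated with a small check. This is a minimal sketch under stated assumptions: the field names (`order_id`, `ordered_at`, `refunded_at`) and the sample rows are hypothetical, and plain Python stands in for the SQL checks the suite runs over its gold layer.

```python
from datetime import datetime

# Hypothetical order events; field names are illustrative, not the product's schema.
orders = [
    {"order_id": "A1", "ordered_at": "2024-10-01T10:00:00", "refunded_at": None},
    {"order_id": "A2", "ordered_at": "2024-10-01T11:00:00", "refunded_at": "2024-10-03T09:30:00"},
    {"order_id": "A3", "ordered_at": "2024-10-02T08:00:00", "refunded_at": "2024-10-01T23:15:00"},
]

def refunded_before_ordered(rows):
    """Return the ids of rows whose refund timestamp precedes the order timestamp."""
    violations = []
    for row in rows:
        if row["refunded_at"] is None:
            continue  # never refunded, so the rule cannot be violated
        refunded = datetime.fromisoformat(row["refunded_at"])
        ordered = datetime.fromisoformat(row["ordered_at"])
        if refunded < ordered:
            violations.append(row["order_id"])
    return violations

violations = refunded_before_ordered(orders)
violation_rate = len(violations) / len(orders)
print(violations, round(violation_rate, 3))
```

A rate such as `violation_rate` is the kind of population figure a DQ alert could track over time, flagging when it drifts from its learned baseline.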
The architecture, in three layers
① Intelligence
The governed semantic core: the Knowledge & Skills dictionary, the DuckDB gold metric mart, lineage & versioning. The trust the AI rides on: definitions are pulled, never hardcoded.
② Agents
The reasoning layer: NLQ intent routing, the RCA issue-tree engine, and the consensus council (heterogeneous panel + arbiter + cross-machine seat). Where answers earn their receipt.
③ Orchestration
The delivery layer: proactive Smart Alerts, the live NRT stream, the four reporting sinks, and the governance/report plane (single-machine today, split-to-two by one env flag).
Delivered on four role-based surfaces over one backend (Business User · Analyst/SME · Leadership · Ops) with a role switcher: the self-serve Insights Portal (Path A) and the build-and-govern console (Path B) expose the same 13 capabilities, RBAC-gated. The split is by role, not by frontend stack.
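The consensus council in the Agents layer can be pictured as a vote counted per model family rather than per seat. The sketch below is illustrative only: the `Verdict` type, the `arbitrate` function, the `min_families` threshold and the claim strings are all assumptions, not the product's arbiter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Verdict:
    family: str  # model family of the seat that produced the claim
    claim: str   # normalised answer, e.g. the named driver of a drop

def arbitrate(verdicts, min_families=2):
    """Accept a claim only if at least `min_families` distinct families back it."""
    families_by_claim = {}
    for v in verdicts:
        # A set means two seats from the same family count once.
        families_by_claim.setdefault(v.claim, set()).add(v.family)
    # Most-supported claim first; ties broken by claim text for determinism.
    ranked = sorted(families_by_claim.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    top_claim, top_families = ranked[0]
    agreed = len(top_families) >= min_families
    return {
        "status": "consensus" if agreed else "flagged",
        "claim": top_claim if agreed else None,
        "families": sorted(top_families) if agreed else [],
        "lone_claims": sorted(c for c, f in families_by_claim.items() if len(f) == 1),
    }

panel = [
    Verdict("glm", "psp_gateway_timeout"),
    Verdict("deepseek", "psp_gateway_timeout"),
    Verdict("kimi", "festival_traffic"),
]
result = arbitrate(panel)
split = arbitrate([Verdict("glm", "a"), Verdict("deepseek", "b"), Verdict("kimi", "c")])
print(result["status"], result["claim"], result["lone_claims"])
```

Counting families rather than seats is the design point: two seats from one lineage share blind spots, so their agreement adds no independent evidence.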
PRD vs Built — what the strategy asked for, what runs today
Every functional capability in the Product Strategy, mapped to what is working in the product right now. Status is drawn from the product's own live build log — honest about what is live, what is a by-design scaffold, and what we built beyond the PRD to make trust the product.
Primary capabilities — the eight requested in the strategy
| # | Capability (PRD) | What the strategy asked for | What's built in ana | Status |
|---|---|---|---|---|
| 1 | Knowledge & Skill repository | Single system of record for metric definitions, semantics, business context, facts lineage, golden-query code bank, dimensions dictionary — on a RAG interface, with lifecycle governance (versioning, approval, semantic-duplicate & backtest gates). | Governed dictionary on the RAG plane — /v1/define · /v1/search · /v1/lineage · /v1/versions over 29 metrics, with owners, logic versions, version history, lineage graph and a governed approval-to-registry workflow. The mart mirrors it for fast receipts. | Delivered |
| 2 | Metrics Mart & Data Cubes | Centralized source-of-truth for L0 metrics; key-value with 2–3 dimensional cuts; cubes for fast access; version control + approval; NL→SQL + validation agent. Fetch <3s (mart) / <15s (cubes). | Real SQL over a DuckDB gold layer — 29 governed metrics, multi-dimensional breakdowns, prior-period deltas, lineage receipt on every number. Fetch ~5 ms (vs the <3s target). Full governance write-path: propose in NL → NL→SQL draft → a Validation Agent dry-runs it on real gold → semantic-duplicate + criticality-tier gates → versioned commit + change-comms + audit + version timeline/diff. | Delivered |
| 3 | Smart Alerts & RCA governance | DQ alerts (dynamic thresholds, learned population rates, business-rule outliers, auto-Jira), business-performance alerts (centralized targets & RAG, audience & frequency control), RAG-driven RCA with slicing-and-dicing, auto-mailer synthesis, scheduling, email/chat. | The hero capability: proactive RAG-status alerts, each an auto-RCA over the real gold data that names & quantifies the true driver and rules out the red herrings with evidence; unified severity-ranked feed; live NRT re-evaluation; eight delivery channels (in-app, Telegram, email + weekly digest, WBR/MBR deck, Slack, Teams, WhatsApp, generic webhook). Festival-normalized & Simpson's-aware. | Delivered |
| 4 | Workbench | IDE-like environment, out-of-the-box data connections, scalable compute, code/skill repos, EDA + ML toolkits, deployment/hosting, interactive dashboards, no-code simulation sandbox. | Composes governed cuts into saved multi-metric boards, every cell with its own lineage receipt; an interactive cut builder, an EDA top-movers pass, save-as-skill to a reusable repo, and a no-code what-if sandbox (sliders → projected GMV through a transparent funnel on real gold). Heavy ML on Pod-local compute is the next build. | Delivered |
| 5 | Apps & Dashboards Store | NL dashboard/app builder, dashboard de-duplication, templates + one-click publish, instant connectivity, AI summaries & health scores, standardized WBR/MBR decks, hosting + discovery, mobile, lifecycle. | A live natural-language dashboard builder (prompt → real tiles over the governed mart), a template gallery, a de-dup nudge, an AI health score, one-click publish to a persisted store gallery (re-opens live), and the standardized WBR/MBR deck via the deck sink. Mobile delivered (responsive Path A); full lifecycle next. | Delivered |
| 6 | Data Availability & Landscape | Silver/gold queryable for all metrics, multi-domain discovery & join, cross-source DQ match, long-range trend without timeouts, NRT for Tier-1/BBD, lineage + last-refreshed + owner + logic version in-line. | A browsable map of the gold layer — every fact (rows, freshness, grain, dimensions) and all 29 metrics with owner CDM / source / logic version / last-refreshed; click any metric → its governed lineage + version changelog, live from the RAG. NRT stream for Tier-1. Cross-domain join discovery + an auto-onboarding feed now shipped. | Delivered |
| 7 | Experiments Hub | NRT self-serve A/B: standard report views, cross-domain joins, dynamic experiment filter, lift & normalized reporting, full stat tests (SRM, sample size, p-value, CI), Go/No-Go, annual impact, A/B repository, conversational canvas + RCA, offline cohort creation. | End-to-end A/B that checks sample-ratio mismatch first — a significant-looking lift is invalid if randomization is broken — then shows assignment split + SRM χ², lift + significance, novelty decay, and a Go/No-Go. A/B repository of results & learnings. Offline cohort→Audience-Manager, a natural-language experiment canvas (NL → power-sized, guard-railed design), and annualized impact + revenue-per-session on every read-out — now shipped. | Delivered |
| 8 | NLQ Interface | Fetch by NLQ, chat context, in-window viz + download, export to excel/gsheet/PPT, route to relevant apps/dashboards, set up alerts, run RCAs and return findings. | Conversational entry point with 9 deterministic intents (no model needed to pick the lane): fast lane <2s with a lineage receipt, self-serve cuts, the full RCA lanes, and a consensus lane for open questions — chat context carried, RCA findings returned inline. Office export — CSV / XLSX / a real PPTX deck / Google Sheets + deep-dive-app redirect now shipped. | Delivered |
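The "Simpson's-aware" alerting in row 3 targets a topline that looks flat while every segment moves the same way. The worked example below uses made-up figures and hypothetical names (`masked_by_mix`, `flat_tol`); it sketches the idea and is not the alert engine's logic.

```python
# (sessions, orders) per segment; all figures are made up for illustration.
prior = {"app": (8000, 800), "web": (2000, 100)}
current = {"app": (6000, 660), "web": (4000, 240)}

def rate(sessions, orders):
    return orders / sessions

def topline(period):
    sessions = sum(s for s, _ in period.values())
    orders = sum(o for _, o in period.values())
    return orders / sessions

def masked_by_mix(prior, current, flat_tol=0.001):
    """Flag a flat topline whose segments all moved in the same direction."""
    topline_delta = topline(current) - topline(prior)
    segment_deltas = {seg: rate(*current[seg]) - rate(*prior[seg]) for seg in prior}
    same_direction = (all(d > 0 for d in segment_deltas.values())
                      or all(d < 0 for d in segment_deltas.values()))
    return {
        "topline_delta": topline_delta,
        "segment_deltas": segment_deltas,
        "masked": abs(topline_delta) <= flat_tol and same_direction,
    }

result = masked_by_mix(prior, current)
print(result["masked"])
```

Here both segments gain a point of conversion (10% to 11%, 5% to 6%), yet the blended rate stays at 9% because traffic shifted toward the lower-converting segment. A topline-only threshold would stay silent.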
Secondary capabilities — notable sub-requirements
| Sub-requirement (PRD) | In ana | Status |
|---|---|---|
| Facts lineage — upstream/downstream impact tracing | Lineage graph (depends-on / referenced-by, chase-able) live from the RAG | Delivered |
| Lifecycle: versioning, approval, change communication | Logic-version changelog (/v1/versions) + governed approve-to-registry; change comms via the sinks | Delivered |
| Lifecycle: semantic-duplicate & backtest/value-range gates | Governance gate flags ungoverned or semantic-duplicate metrics before they reach the registry (proven on the conformance probe) | Delivered |
| Business-rule outliers (e.g. "Refunded before Ordered") | Rule-violation detection lane in the alert engine | Delivered |
| Auto-Jira ticket on data issue, priority by impact count | A governed ticket draft built from the RCA (priority by impacted-metric count) — "🎫 Draft Jira" on every RCA tile; filing needs a Jira connector | Delivered |
| Centralized targets & RAG, defined once, propagated | Governed thresholds per metric, applied across every view & alert | Delivered |
| RCA "slicing & dicing" on Red via an issue tree | The issue-tree RCA engine — the core of EC-1, EC-6, EC-8, EC-9 | Delivered |
| Automated mailer synthesis to stakeholders | One synthesis object → per-alert email + weekly digest + WBR deck + Telegram + in-app | Delivered |
| Standardized WBR/MBR decks with AI callouts | Deck sink generates a governed deck; callouts AI-drafted from the context repository | Delivered |
| NL dashboard builder + dashboard de-duplication | Live NL builder (prompt → tiles), template gallery, and a "reuse an existing dashboard" de-dup suggestion | Delivered |
| One-click absolute/relative lift toggle; metric synthesis | Normalized lift reporting in the Experiments Hub; ad-hoc metric synthesis via NLQ | Delivered |
| Code-Red real-time alerting on L0 drop during an experiment | A live NRT code-red: on the Big Billion Days stream, EXP-4471's new-checkout treatment arm collapses on payment-success while control holds → a real-time CODE-RED attributed to the experiment, with live impact + a reversible action (the intra-event, arm-specific drop the nightly batch can't catch). Honestly simulated; batch⇄live toggle. | Delivered |
| Office export (excel / gsheet / PPT); deep-dive-app redirect | One-click CSV · XLSX · a real downloadable PPTX deck · a Google Sheets live-push path on Metrics & Workbench; NLQ deep-dive router sends a question to the right app | Delivered |
| Cross-domain discovery & join without manual ETL; auto-onboarding | Join-discovery infers joinable facts from governed grain/dimensions; an auto-onboarding feed surfaces newly-instrumented metrics/dims — both on Data Landscape | Delivered |
| Dynamic threshold / RAG generation; business-rule outliers config | Data-derived RAG bands (mean±kσ vs governed) on Metrics; a business-rule DQ-check panel (real SQL checks + honest n/a) on Alerts | Delivered |
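The data-derived RAG bands in the last row (mean±kσ) can be sketched in a few lines. The multipliers `k_amber` and `k_red`, the function names, and the sample history are assumptions chosen for illustration; the product's governed thresholds are not shown here.

```python
from statistics import mean, pstdev

def rag_bands(history, k_amber=2.0, k_red=3.0):
    """Derive symmetric bands around the historical mean of a metric."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of the history
    return {
        "mean": mu,
        "sigma": sigma,
        "amber": (mu - k_amber * sigma, mu + k_amber * sigma),
        "red": (mu - k_red * sigma, mu + k_red * sigma),
    }

def classify(value, bands):
    lo_red, hi_red = bands["red"]
    lo_amber, hi_amber = bands["amber"]
    if value < lo_red or value > hi_red:
        return "Red"
    if value < lo_amber or value > hi_amber:
        return "Amber"
    return "Green"

# Hypothetical daily payment-success percentages.
history = [96.0, 96.4, 95.8, 96.2, 96.1, 95.9, 96.3]
bands = rag_bands(history)
print(classify(96.0, bands), classify(95.6, bands), classify(93.0, bands))
```

Comparing these derived bands with the governed thresholds for the same metric would show where a hand-set target has drifted away from what the data supports.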
Non-functional requirements
| NFR (PRD) | Result | Status |
|---|---|---|
| NLQ "RCA"-type response < 2 seconds | Fast lane returns in ~5 ms to under 2 s (DuckDB over columnar Parquet, no Python loop) | Met |
| Deep-thinking queries may take longer with a clear message | "Latency-as-theatre" streaming — the deep council lane streams each verdict as it lands | Met |
| Standard multi-metric BI dashboards load < 15 seconds | Boards render from the same fast mart; representative load well under target | Met |
| Metric-mart fetch < 3s · data-cube fetch < 15s | ~5 ms mart fetch, roughly 600× faster than the 3 s target | Exceeded |
Beyond the PRD — what we added to make trust the product
The strategy asked for trusted analytics. To earn that trust we built capabilities the PRD never asked for — the differentiators:
| Capability | What it is | Status |
|---|---|---|
| Consensus council | A heterogeneous model panel (3 different families) + arbiter that judges open/"why" answers; agreement across families = reliable, lone claims flagged. The literal "validated by consensus." | Beyond PRD |
| Cross-machine, multi-family council | A fourth seat on a second machine (hub1), a multi-family registry (GLM · DeepSeek · Kimi · Claude) — consensus you cannot fake with one box or one model lineage. | Beyond PRD |
| Receipts on everything | Every answer carries its lane, method, gold lineage, steps and latency — the literal "data-driven, not argument-driven." | Beyond PRD |
| The edge-case answer key (EC-1…11) | Eleven planted, deterministic stories — including the two a dashboard cannot catch — so "AI intelligence" is provable, not asserted. | Beyond PRD |
| Live NRT lens + code-red | A Big Billion Days near-real-time lens (GMV/orders/payment-success per second) with a real-time experiment-attributed code-red; alerts re-evaluate on a push stream — alongside a byte-deterministic golden replay for demo safety. Honestly labelled simulated; batch gold stays source-of-truth. | Beyond PRD |
| Four role-based surfaces, one backend | Business User · Analyst/SME · Leadership · Ops over one RBAC-gated /v1 contract, with a role switcher — the self-serve Insights Portal (Path A) + the build-and-govern console (Path B). Compliance-grade RBAC stays outside the product by design. | Beyond PRD |
| Insights Portal + Orchestrator Agent | One conversational front door that classifies a question into a lane (rule-based/low-latency → Metrics Store · thinking/unbounded → App-Agent Action Bundle + cross-domain joins · alerts) and returns a routed answer with a receipt — the PRD-p6/p7 spine, made real. Saved answers become scheduled reports (cadence → channel). | Beyond PRD |
| Telemetry & adoption (#13) | Adoption · self-serve ratio · MTTR from real usage events instrumented across both surfaces — the capability that proves the suite is landing, not just shipping. RBAC-gated to leadership & ops. | Beyond PRD |
| Connector framework named to real systems | Snowflake, BigQuery, Databricks, S3, Kafka, Jira, ServiceNow, PagerDuty, Tableau — each tagged by category + status; manifest-driven, with honest stubs (real formatting, delivery gated on configured creds). | Beyond PRD |
| Dual deployment topology | Single-machine OR split-two-machine from one installer, reversible by an env flag — governance & council ship in every install. | Beyond PRD |
| Predictive SRE + conformance harness | A predictive observatory (crashloop risk before not-ready) and a 6-dimension contract/governance probe that keeps the API honest. | Beyond PRD |
| Model-factory ecosystem | A sibling fine-tuning studio that produces domain-tuned models the suite consumes — own the whole stack, scale trust without renting it. The "evolve-as-you-go" engine. | Beyond PRD |
| MCP server | The whole suite exposed as MCP tools (ask, get-metric, RCA, define, verify-with-council…) so Claude, Cursor, or any agent can call ana directly from an IDE — AI-native, in your editor. | Beyond PRD |
| Public API + SDK | The /v1 contract as a product: a key-gated, rate-limited REST gateway with OpenAPI/Swagger + a zero-dependency Python SDK. | Beyond PRD |
| Embeddable widgets | Drop-in, server-rendered widgets (KPI / trend / breakdown / alerts / ask) any app embeds with one tag — no API key in the page. | Beyond PRD |
| Agentic closed-loop actions | The Action Center: an alert recommends a fix → a human approves → the system acts → rollback, every step with a receipt + a durable audit ledger. The "self-remediation" promise, made real. | Beyond PRD |
| Self-expanding alert discovery | Discovery scans the gold layer for anomalies not in the curated catalog and proposes them (promote / dismiss) — the catalog grows itself. | Beyond PRD |
| Plugin / extension registry | Capabilities, sinks, council seats and data sources are all manifest-driven — add one by dropping a JSON file, no core edit. Visible on the Extensions page. | Beyond PRD |
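The "Receipts on everything" row lists what each answer carries: lane, method, gold lineage, steps and latency. A minimal sketch of such a record follows. The class names, field names and sample values (table, owner, version) are hypothetical and only mirror the fields the document names.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

@dataclass
class GoldSource:
    table: str           # gold table the number was read from
    owner: str           # governed owner of the metric logic
    logic_version: str
    last_refreshed: str  # freshness timestamp

@dataclass
class Receipt:
    lane: str
    method: str
    sources: List[GoldSource]
    steps: List[str]
    latency_ms: float

    def to_json(self) -> str:
        # sort_keys keeps the serialised receipt stable across runs
        return json.dumps(asdict(self), sort_keys=True)

receipt = Receipt(
    lane="fast",
    method="governed SQL over the gold mart",
    sources=[GoldSource("gold.fact_payments", "payments-cdm", "v3", "2024-10-01T06:00:00Z")],
    steps=["resolve metric definition", "run governed query", "attach lineage"],
    latency_ms=5.0,
)
payload = receipt.to_json()
print(payload)
```

Because the payload is plain JSON with a stable key order, an auditor could store it next to the answer and compare it against a later replay.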
Product craft — the build quality behind the demo
Beyond the capabilities, this build hardened the whole app into something demo- and YC-grade:
| Area | What we built |
|---|---|
| Visualization kit | A zero-dependency inline-SVG chart kit — KPI tiles, bars, donut/gauge, RAG-band trends, a platform×tier cube heatmap, a council radar, and a lineage node-graph — so every page is demo-grade and Path B stays all-Python / air-gappable. |
| Trust, everywhere | A "why trust this?" affordance on every metric, answer and tile — a chip that opens the provenance (gold source · owner · logic version · freshness) plus a confidence read. |
| Feels like a product | A ⌘K command palette (jump to any tool / run an action) and a first-run guided tour — both keyboard-driven. |
| Responsive + accessible | Off-canvas mobile sidebar; skip-to-content, focus-visible outlines, reduced-motion, aria labels, aria-live toasts. |
| States | A global loading bar + error/network toasts on every request, and standardized empty states. |
| Conformance-gated CI | A 6-dimension TestForge probe (contract · schema · errors · latency · governance · determinism) over all 10 backends, wired into GitHub Actions — currently 10/10 PASS. |
| Demo-readiness gate | Replay integrity (demo-critical endpoints byte-identical on repeat) + perf budgets verified: NLQ ~7 ms (<2s), mart ~5 ms (<3s), dashboards <240 ms (<15s). |
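The replay-integrity gate (demo-critical endpoints byte-identical on repeat) amounts to hashing repeated responses and checking that only one digest appears. In the sketch below, `stub_fetch` stands in for a real HTTP client and `/v1/unstable` is an invented endpoint used to show a failure; only `/v1/define` is taken from the document.

```python
import hashlib
import itertools
import json

def fingerprint(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()

def replay_failures(fetch, endpoints, repeats=3):
    """Return {endpoint: distinct_digest_count} for endpoints that are not byte-identical."""
    failures = {}
    for endpoint in endpoints:
        digests = {fingerprint(fetch(endpoint)) for _ in range(repeats)}
        if len(digests) != 1:
            failures[endpoint] = len(digests)
    return failures

_request_counter = itertools.count()

def stub_fetch(endpoint: str) -> bytes:
    # Stand-in for an HTTP call; a real gate would request the live service.
    if endpoint == "/v1/define":
        body = {"metric": "payment_success_rate", "logic_version": "v3"}
        return json.dumps(body, sort_keys=True).encode()
    # Deliberately non-deterministic response to demonstrate a failing endpoint.
    return json.dumps({"request_no": next(_request_counter)}).encode()

failures = replay_failures(stub_fetch, ["/v1/define", "/v1/unstable"])
print(failures)
```

An empty result means every endpoint replayed byte-for-byte, which is the condition a CI step would assert before a demo.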
An annotated tour of the working app
One application, thirteen tools in the sidebar, all riding one governed data contract, reachable through four front doors (the role surfaces, an MCP server, a REST API + SDK, and embeddable widgets).
The capabilities
| # | Tool | What it's for | State |
|---|---|---|---|
| 0 | Insights Portal | The Orchestrator Agent front door — one question is classified into a lane + latency tier (rule-based lookup · thinking/unbounded · alerts) and dispatched; answers save as canned reports. | live |
| 8 | Ask (NLQ) | The conversational entry point — ask in plain English, get a trusted answer with a receipt. Routes itself across 9 deterministic intents. | live |
| 1 | Knowledge & Skills | The governed dictionary — definitions, golden queries, dimensions, owners, logic versions, lineage. The trust the AI rides on. | live |
| 2 | Metrics Mart & Cubes | The source of truth — governed L0 metrics with real SQL, multi-dim cuts, deltas, and a lineage receipt on every number. | live |
| 3 | Smart Alerts & RCA | The hero — proactive RAG-status alerts, auto-RCA with ruling-out evidence, live NRT, one-click council verdict, eight delivery channels. | live |
| 4 | Workbench | The analyst scratchpad — compose governed cuts into a saved board, every cell with its own receipt. | live |
| 5 | Apps & Dashboards | Build a dashboard in natural language → real tiles; template gallery, de-dup nudge, health score, publish, WBR/MBR decks. | live |
| 6 | Data Landscape | The browsable gold-layer map — every fact & metric with owner, source, logic version, freshness, click-through lineage. | live |
| 7 | Experiments Hub | A/B with a conscience — checks sample-ratio mismatch first, then lift, significance, novelty decay, Go/No-Go. | live |
| 9 | Action Center | Closed-loop remediation — an alert recommends a fix → approve → act → rollback, with receipts + an audit ledger. | live |
| 11 | Discovery | Self-expanding alerts — scans the gold layer for novel anomalies the curated catalog missed; promote or dismiss. | live |
| 10 | Extensions | The plugin registry — capabilities, sinks, council seats and data sources, all manifest-driven (add one with no code edit); a connector framework named to real systems (Snowflake, Databricks, Kafka, Jira, PagerDuty, Tableau…). | live |
| 13 | Telemetry & Adoption | The platform measures itself — adoption · self-serve ratio · MTTR from real usage events across both surfaces. RBAC-gated to leadership & ops. | live |
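The Experiments Hub row says the sample-ratio mismatch (SRM) check runs before any lift is read. A minimal sketch of that check for a two-arm test follows. The function name, the 50/50 expected split and the `alpha=0.001` cut-off are assumptions; the p-value uses the closed form for a chi-square statistic with one degree of freedom.

```python
import math

def srm_check(control_n, treatment_n, expected_ratio=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test of the observed split against the planned split."""
    total = control_n + treatment_n
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"chi2": chi2, "p_value": p_value, "srm": p_value < alpha}

healthy = srm_check(50_000, 50_300)
broken = srm_check(50_000, 52_000)
print(healthy["srm"], broken["srm"])
```

When `srm` is true the randomization itself is suspect, so a read-out would stop there instead of reporting lift or significance.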
/v1 contract. ana isn't just an app; it's a platform.
Four role surfaces onto one governed truth
The A/B split is by role, not by tech stack. One RBAC-gated backend serves four role-shaped surfaces; a role switcher moves between them. (Compliance-grade RBAC stays outside the product by design — the surfaces assume an upstream identity/access layer.)
Path A: Self-serve Insights Portal (Business User · Leadership)
The read-and-ask surface: the Orchestrator front door, a React chart kit at parity, an exec scorecard, the alert digest, and a mobile / off-canvas layout. Ask a question, get a trusted answer with a receipt; no build tools, no backend port in the browser.
Path B: Build & Govern console (Analyst/SME · Ops)
The build-and-govern surface: the metric-development write-path (NL→SQL → Validation Agent → versioned commit), the Workbench builder, the diagnosis theatre, contract badge, pod health and Hub-Tools deep-links. All-Python, air-gappable.
A role switcher moves a single user between surfaces; a Side-by-side view runs them at once to prove the contract is frontend-agnostic: one backend, four role surfaces, RBAC-gated.
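The Orchestrator front door is described as picking a lane deterministically, with no model in the loop. One way to picture that is an ordered list of pattern rules with a fallback to the consensus lane for open questions. The lane names and regular expressions below are hypothetical; the document does not enumerate the nine intents.

```python
import re

# Ordered rules: the first match wins. Lanes and patterns are illustrative only.
RULES = [
    ("alert", re.compile(r"\balert\b|\bnotify\b", re.I)),
    ("rca", re.compile(r"\brca\b|root cause|\bdiagnose\b", re.I)),
    ("cut", re.compile(r"\bbreak ?down\b|\bsplit\b|\bby (platform|tier|city)\b", re.I)),
    ("lookup", re.compile(r"^(what (is|was)|how (many|much)|show)\b", re.I)),
]

def route(question: str) -> str:
    """Return the lane for a question; open questions fall through to consensus."""
    text = question.strip()
    for lane, pattern in RULES:
        if pattern.search(text):
            return lane
    return "consensus"

examples = [
    "What was GMV yesterday?",
    "Break down payment success by platform",
    "Run an RCA on payment success",
    "Alert me when GMV turns Red",
    "Why are repeat buyers churning?",
]
lanes = [route(q) for q in examples]
print(lanes)
```

Rule order is the whole policy here, so the same question always lands in the same lane, which keeps routing replayable and cheap to test.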
The Red-Metric walkthrough — detect → diagnose → trust → act
One narrated path through the product, following a single failure from alarm to action. Each step tags the PRD requirement it satisfies. The data is deterministic and seeded, so the demo lands identically every time.
gateway_timeout spike on one PSP that cratered payment success.
The narrated product demo
~4 minutes, captured live from the product with real recorded actions — clicks, scrolls, typing, and the what-if sliders — and a natural voice-over. A tour of every capability, then one use case end-to-end on the console (detect → diagnose → trust → act).
From this Beta-Doc to production
The trust mechanics are proven. The path to a Flipkart-scale rollout is incremental — additive over the governance you already trust, never a rip-and-replace.
1 Point it at real gold
Swap the seeded synthetic gold layer for FDP silver/gold tables and the governed metric dictionary. The contract is unchanged; the suite rides your existing governance.
2 Wire the live signals
Connect the NRT stream to a real event source (BBD/Tier-1), turn on auto-Jira on DQ issues, and attach the four sinks to your real Telegram/email/deck distribution.
3 Deepen the build-out
The NL dashboard builder, no-code what-if sandbox, office export (incl. real PPTX + Sheets), cross-domain join discovery and the governance write-path are live; next are heavy ML/EDA toolkits on Pod-local compute and the full app lifecycle.
4 Grow the council
Add domain-tuned council seats from the model factory (a Flipkart-catalog / metric-vocabulary specialist), and a distinct-family arbiter, for stronger consensus, owned in-house.
100.124.4.38:9010 and Path B (build & govern console) at 100.124.4.38:9020; the living product documentation is at /guide on both. Access is gated; ask for a time-boxed link.
Beta-Doc generated for AI Native Analytics · requirements from the Vision & Product Strategy · status from the product's live build log · the suite demonstrated on a deterministic, seeded, Flipkart-shaped data layer with Pod-hosted models. Trust is the product.