Semantic Drift Monitor

When API fields change meaning but not name — the $400K silent killer that no existing tool can detect

Highest Value Gap · Requires LLM · Agent Skill Opportunity

The Problem

Stripe upgraded its API from version 2024-12-18 to 2025-04-16. The payment_intent.amount field changed from presentment currency (a EUR amount for EUR payments) to settlement currency (always USD).

The column name didn't change. The data type didn't change. The schema didn't change. Every existing monitoring tool (Fivetran alerts, dbt tests, Elementary anomaly detection) saw nothing wrong.

The dbt model contained a CASE WHEN currency != 'usd' THEN amount * fx_rate expression, which was now applying an FX conversion to amounts that were already in USD. Result: revenue underreported by $400K. The CFO found it, not the data team.
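To make the failure mode concrete, here is a small worked example of the double conversion in the scenario above. The currency (CAD), the 0.73 rate, the ELSE branch, and the function name are illustrative assumptions, not values from the incident; amounts are in minor units (cents).

```python
from decimal import Decimal

# Hypothetical CAD -> USD rate; an assumption for illustration only.
FX_RATE = Decimal("0.73")


def model_revenue_usd(amount: Decimal, currency: str) -> Decimal:
    """Mirror of the dbt expression, assuming an ELSE branch:
    CASE WHEN currency != 'usd' THEN amount * fx_rate ELSE amount END
    """
    return amount * FX_RATE if currency != "usd" else amount


# Before the change: amount is in presentment currency (CAD 100.00 = 10000 cents).
before = model_revenue_usd(Decimal(10000), "cad")

# After the change: amount already arrives in USD (7300 cents), but the
# currency column still says 'cad', so the model converts a second time.
after = model_revenue_usd(Decimal(7300), "cad")

shortfall = before - after
print(before, after, shortfall)
```

With these assumed numbers the model reports 5329 cents where 7300 is correct. Whether the error under- or over-reports depends on whether the rate is below or above 1.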

Evidence from Multi-Agent Simulation

  • Severity (all conditions): 9/10
  • Severity reduction with existing tools: 0%
  • Conditions where tools failed: 6/6
  • Simulated revenue discrepancy: $400K

Simulation Finding

Across all 3 tooling conditions (standard, dbt-tests, full observability), severity remained at 9/10. Even with Elementary anomaly detection, the engineer could see that something was wrong but not why. The root cause required understanding API changelog semantics — something no existing tool does.

Reddit / Survey Validation

"Numbers look wrong but nothing technically failed" — the #1 complaint pattern on r/dataengineering. Survey data: 2/3 of data engineers blame poor source data as root cause. MotherDuck blog: "'It's just a small schema change' strikes fear into every data engineer."

Why No Existing Tool Solves This

Tool | What It Detects | Why It Misses Semantic Drift
Fivetran Alerts | Schema changes (column add/remove/rename) | Schema didn't change: still called amount, still integer
dbt Tests | not_null, unique, accepted_values | Amount is not null, is unique per payment, value is valid integer
Elementary | Statistical anomalies, freshness | Detects the anomaly but cannot explain WHY or connect to API changelog
Great Expectations | Distribution checks, range validation | Can flag unusual distribution but not map to upstream semantic change
Monte Carlo | Volume, freshness, schema, distribution | Same as Elementary: symptom detection, not root cause

The gap: All existing tools operate at the data layer (values, schema, statistics). Semantic drift operates at the meaning layer (what a field represents). Bridging this requires understanding API documentation — which is a natural language task that requires LLM capability.
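A minimal sketch of why data-layer checks stay green: the rows below are hypothetical, and the three checks are simplified stand-ins for not_null, unique, and a type test, not the actual dbt or Fivetran implementations.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical rows: the same two payments synced before and after the change.
before_sync: list[Row] = [
    {"id": "pi_1", "currency": "cad", "amount": 10000},  # CAD cents
    {"id": "pi_2", "currency": "usd", "amount": 2500},
]
after_sync: list[Row] = [
    {"id": "pi_1", "currency": "cad", "amount": 7300},  # now USD cents
    {"id": "pi_2", "currency": "usd", "amount": 2500},
]


def schema_of(rows: list[Row]) -> dict[str, str]:
    """Column name -> Python type name, taken from the first row."""
    return {col: type(val).__name__ for col, val in rows[0].items()}


def data_layer_checks(rows: list[Row]) -> dict[str, bool]:
    """Simplified stand-ins for not_null, unique, and a type check."""
    ids = [r["id"] for r in rows]
    return {
        "amount_not_null": all(r["amount"] is not None for r in rows),
        "id_unique": len(ids) == len(set(ids)),
        "amount_is_int": all(isinstance(r["amount"], int) for r in rows),
    }


schema_unchanged = schema_of(before_sync) == schema_of(after_sync)
all_pass = all(data_layer_checks(after_sync).values())
print(schema_unchanged, all_pass)
```

Both the schema comparison and every check pass on the drifted rows, because nothing in the values themselves records which currency the integer is denominated in.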

Product Design

How It Works

Fivetran sync detects API version change
Fetch API changelog
LLM parses semantic changes
Scan dbt models for affected SQL
Alert + fix suggestion
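The steps above can be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the SemanticChange type, and the sample SQL are hypothetical, and the changelog fetch and LLM call are replaced by a hard-coded placeholder.

```python
import re
from dataclasses import dataclass


@dataclass
class SemanticChange:
    field: str    # column name to look for in SQL, e.g. "amount"
    summary: str  # plain-language description of the new meaning


def version_changed(previous: str, current: str) -> bool:
    """Step 1: compare the API version seen on the last two syncs."""
    return previous != current


def parse_changelog(changelog_text: str) -> list[SemanticChange]:
    """Steps 2-3 placeholder: a real version would fetch the changelog and
    ask an LLM to extract meaning changes; this result is hard-coded."""
    return [SemanticChange("amount", "now settlement currency (USD)")]


def scan_models(models: dict[str, str], field: str) -> list[tuple[str, int, str]]:
    """Step 4: return (model, line number, line) where the field appears."""
    pattern = re.compile(rf"\b{re.escape(field)}\b", re.IGNORECASE)
    hits = []
    for name, sql in models.items():
        for lineno, line in enumerate(sql.splitlines(), start=1):
            if pattern.search(line):
                hits.append((name, lineno, line.strip()))
    return hits


# Hypothetical model SQL keyed by model name.
models = {
    "stg_payments": (
        "select id,\n"
        "  case when currency != 'usd'\n"
        "    then amount * fx_rate else amount end as amount_usd\n"
        "from raw.payment_intents"
    ),
    "dim_customers": "select id, email from raw.customers",
}

# Step 5: assemble one alert line per affected SQL line.
alerts = []
if version_changed("2024-12-18", "2025-04-16"):
    for change in parse_changelog("(changelog text would go here)"):
        for model, lineno, line in scan_models(models, change.field):
            alerts.append(f"{model}:{lineno}: {change.field} {change.summary} -> {line}")

print("\n".join(alerts))
```

Plain text matching is the crudest possible scan; it cannot tell which source table a column comes from, so a production version would more plausibly parse the SQL or use dbt's compiled lineage.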

Example Output

⚠️  SEMANTIC DRIFT DETECTED

Source:   Stripe API version 2024-12-18 → 2025-04-16
Field:    payment_intent.amount
Change:   Now returns settlement currency (USD) instead of
          presentment currency (local currency)

Affected dbt models:
  - stg_payments (line 23): CASE WHEN currency != 'usd'
    THEN amount * fx_rate
    ⚠️  This will DOUBLE-CONVERT non-USD payments because
    amount is already in USD in the new API version.

  - fct_daily_revenue (depends on stg_payments)
  - fct_monthly_revenue (depends on stg_payments)
  - fct_customer_ltv (depends on stg_payments)

Suggested fix:
  Remove the FX conversion for the 'amount' field.
  Use amount directly as it is now always USD.
  OR use the new 'presentment_amount' field if you need
  local currency values.
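The "depends on" lines in the output above imply a walk over the model dependency graph. A minimal sketch, assuming the edges are already available as a parent-to-children mapping (in a dbt project this could plausibly be derived from the manifest artifact); the exec_dashboard node is hypothetical and added only to show a second hop.

```python
from collections import deque

# Hypothetical parent -> children edges mirroring the example output.
children = {
    "stg_payments": ["fct_daily_revenue", "fct_monthly_revenue", "fct_customer_ltv"],
    "fct_daily_revenue": [],
    "fct_monthly_revenue": ["exec_dashboard"],  # hypothetical second-level model
    "fct_customer_ltv": [],
    "exec_dashboard": [],
}


def downstream(start: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every model that depends on `start`."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


affected = downstream("stg_payments", children)
print(affected)
```

Breadth-first order lists direct dependents first, which matches how the alert groups the nearest affected models ahead of anything further downstream.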

Market Sizing

  • Companies using Stripe + dbt: 40K+
  • Companies using any payment API + warehouse: 100K+
  • Average API version updates (Stripe, HubSpot, Salesforce): 4x/yr

Target Customer

Competitive Landscape

Competitor | What They Do | Gap
Monte Carlo ($1.6B valuation) | Data observability: anomaly detection | Detects symptom, not root cause. No API changelog parsing.
Elementary (open source) | dbt-native observability | Same gap: operates on data layer only
Soda | Data quality checks | Rule-based checks, no semantic understanding
This Product | API changelog → SQL impact analysis | Only product that bridges meaning layer to code layer

Business Model

Phase 1: Open Source Agent Skill (Month 1-3)

  • Free Claude Code / Codex skill published on marketplace + SkillsMP
  • Supports: Stripe, HubSpot, Salesforce changelogs
  • Goal: 1000+ installs, community feedback

Phase 2: Hosted Service (Month 3-6)

  • Continuous monitoring (not just on-demand)
  • Slack/PagerDuty integration for alerts
  • Multi-API support with custom changelog parsers
  • Pricing: $99/mo (startup), $499/mo (growth), $1999/mo (enterprise)

Phase 3: Platform (Month 6-12)

  • Semantic contract registry (like schema registry but for meaning)
  • Cross-team contract enforcement
  • Integration with dbt Cloud, Fivetran, Airbyte
  • Target: $50K-200K ARR Year 1
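One way to picture the semantic contract registry from Phase 3 is a record that stores meaning-level attributes next to the schema-level type, plus a diff between versions. This is a sketch under assumptions: the FieldContract type, its attribute names, and the example values are all hypothetical, not a specification of the product.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FieldContract:
    source: str          # e.g. "stripe"
    field: str           # e.g. "payment_intent.amount"
    data_type: str       # schema-level type, which existing tools already track
    unit: str            # meaning-level: what one unit represents
    currency_basis: str  # meaning-level: which currency the value is in


def semantic_diff(old: FieldContract, new: FieldContract) -> dict[str, tuple[str, str]]:
    """Return the attributes whose values differ between two contract versions."""
    old_d, new_d = asdict(old), asdict(new)
    return {k: (old_d[k], new_d[k]) for k in old_d if old_d[k] != new_d[k]}


v_old = FieldContract(
    "stripe", "payment_intent.amount", "integer", "minor units", "presentment"
)
v_new = FieldContract(
    "stripe", "payment_intent.amount", "integer", "minor units", "settlement (USD)"
)

diff = semantic_diff(v_old, v_new)
print(diff)
```

Here data_type is identical in both versions, so a schema registry would report no change, while the diff still surfaces the currency_basis difference.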

Why Now