The standardised AI usage event
Every source — a LiteLLM webhook, a Langfuse trace, a Bedrock log, a Databricks usage row — is normalised to one canonical shape before anything else happens. It rides on Flexprice’s standard event (event_name, external_customer_id, properties, timestamp, source), with a fixed convention for AI properties.
| Field | Purpose |
|---|---|
| provider, model, operation | Drive model pricing and let analytics break down by provider/model/token type. |
| input_tokens, output_tokens, cached_tokens, reasoning_tokens | Token-level usage; unit/quantity are used for non-token sources (characters, seconds, requests, credits). |
| reported_cost | The source’s own computed cost in USD, when it provides one (most gateways do). |
| raw_user, raw_team, agent_id, tags | Source identifiers used to resolve the billing entity and hierarchy. |
| fidelity | per_request or aggregate; records the granularity a source can offer, so analytics stays honest. |
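
As a rough illustration of the field convention in the table, the sketch below assembles one event in the standard envelope. The helper name `build_ai_usage_event`, the choice to omit unset optional fields, and the sample values are assumptions made for illustration; they are not part of the Flexprice SDK. Property values are stringified, following Flexprice’s event format.

```python
from datetime import datetime, timezone


def build_ai_usage_event(customer_id, source, provider, model, operation,
                         input_tokens=0, output_tokens=0, cached_tokens=0,
                         reasoning_tokens=0, reported_cost=None,
                         raw_team=None, raw_user=None,
                         fidelity="per_request", timestamp=None):
    """Hypothetical helper: build one canonical AI usage event as a dict."""
    if fidelity not in ("per_request", "aggregate"):
        raise ValueError(f"unknown fidelity: {fidelity!r}")
    props = {
        "provider": provider,
        "model": model,
        "operation": operation,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cached_tokens": cached_tokens,
        "reasoning_tokens": reasoning_tokens,
        "reported_cost": reported_cost,
        "raw_team": raw_team,
        "raw_user": raw_user,
        "fidelity": fidelity,
    }
    # Assumption: unset optional fields are dropped rather than sent empty.
    # Every remaining value is serialised as a string.
    properties = {k: str(v) for k, v in props.items() if v is not None}
    ts = timestamp or datetime.now(timezone.utc)
    return {
        "event_name": "ai.usage",
        "external_customer_id": customer_id,
        "properties": properties,
        "timestamp": ts.isoformat(),
        "source": source,
    }
```

Keeping the envelope construction in one function means every connector produces the same keys, whatever the upstream payload looked like.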
Following Flexprice’s event format, all properties values are sent as strings; Flexprice interprets numeric fields during aggregation.
Connector types
Sources differ in whether they can push in real time, must be polled, or only expose aggregate totals. Flexprice covers all three with the same downstream pipeline.

| Type | How it works | Best for | Latency |
|---|---|---|---|
| Push (edge) | The source POSTs each request to a Flexprice Collector running in your infra, which transforms and forwards it. | Real-time sources and data that must stay in-network. | Real-time |
| Managed pull | Flexprice runs a scheduled job that reads the source over API, SQL, or object storage, transforms server-side, and ingests. | API/SQL/log-reachable SaaS and clouds — no deployment. | Seconds to hours |
| Aggregate | Flexprice ingests periodic exports (dashboard/email/usage views) and converts native units to cost. | Sources with no per-request API. | Daily |
Push flow (e.g. LiteLLM)

Managed pull flow (e.g. Langfuse, Databricks, Bedrock)

request_id dedupe guarantees no double counting across overlapping pulls.
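A minimal sketch of how request_id deduplication can make overlapping pull windows safe. It assumes each pulled row carries a `request_id` key, and it uses an in-memory set as a stand-in for whatever persistent store a real pipeline would use; the class name `RequestDeduper` is hypothetical.

```python
class RequestDeduper:
    """Hypothetical helper: drop rows whose request_id was already ingested."""

    def __init__(self):
        # In-memory stand-in for a durable store of seen request IDs.
        self._seen = set()

    def filter_new(self, rows):
        """Return only rows not seen in this or any earlier pull."""
        fresh = []
        for row in rows:
            rid = row["request_id"]
            if rid in self._seen:
                continue  # already ingested by an earlier, overlapping pull
            self._seen.add(rid)
            fresh.append(row)
        return fresh


# Two pull windows that overlap on request "r2".
deduper = RequestDeduper()
first = deduper.filter_new([{"request_id": "r1"}, {"request_id": "r2"}])
second = deduper.filter_new([{"request_id": "r2"}, {"request_id": "r3"}])
```

Because the check is keyed on the request rather than on the pull window, the schedule can re-read a trailing interval on every run without inflating usage.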
Integrating LiteLLM
LiteLLM is a real-time push source: it fires a logging callback on every completion, already carrying token counts, USD cost, model, and identity. There are three ways to wire it into Flexprice, from drop-in to fully custom, and all three emit the same standardised ai.usage event. You never define meters, features, or prices for this; enabling the LiteLLM connector with a template provisions them.
Option 1 — Flexprice SDK callback (recommended)
Register the Flexprice callback and you’re done: no payload mapping, no transform to maintain. Every completion() call you already make is metered.
The callback maps each completion to ai.usage and ships it: one event per request, attributed to the team (with the user kept as a child for per-user and per-agent visibility).
Option 2 — LiteLLM Proxy (no code)
If you run the LiteLLM Proxy, enable the Flexprice callback in config.yaml. Every request routed through the proxy (from any app, agent, or MCP server) is metered and attributed to the virtual key’s team/user, with nothing to deploy in your services.
Option 3 — Custom callback (full control)
When you want to shape the event yourself (add tags, override the entity, or filter calls), implement a LiteLLM CustomLogger and send the event with the Flexprice SDK. This is exactly what Option 1 does under the hood, exposed for you to customise.
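A sketch of what such a custom callback might look like. Several things here are assumptions to verify against the LiteLLM and Flexprice documentation: the `CustomLogger` import path and `log_success_event` signature are as LiteLLM documents them at the time of writing, the `kwargs` keys (`response_cost`, `litellm_params.metadata`, `user_api_key_team_id`, `user_api_key_user_id`) reflect typical proxy metadata, and `send_event` is a hypothetical callable standing in for the Flexprice SDK call, which is not shown here.

```python
try:
    from litellm.integrations.custom_logger import CustomLogger
except ImportError:  # allows the sketch to run without LiteLLM installed
    CustomLogger = object


class FlexpriceUsageLogger(CustomLogger):
    """Hypothetical logger: shape an ai.usage event and hand it to send_event."""

    def __init__(self, send_event, extra_tags=None):
        super().__init__()
        self._send = send_event          # stand-in for the Flexprice SDK call
        self._tags = extra_tags or []

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        metadata = (kwargs.get("litellm_params") or {}).get("metadata") or {}
        team = metadata.get("user_api_key_team_id")
        if team is None:
            return  # filter: skip calls that cannot be attributed to a team
        usage = getattr(response_obj, "usage", None)
        props = {
            "model": kwargs.get("model"),
            "input_tokens": getattr(usage, "prompt_tokens", 0),
            "output_tokens": getattr(usage, "completion_tokens", 0),
            "reported_cost": kwargs.get("response_cost"),
            "raw_team": team,
            "raw_user": metadata.get("user_api_key_user_id"),
            "tags": ",".join(self._tags) if self._tags else None,
        }
        self._send({
            "event_name": "ai.usage",
            "external_customer_id": team,  # override the entity here if needed
            "properties": {k: str(v) for k, v in props.items() if v is not None},
            "timestamp": end_time.isoformat(),
            "source": "litellm",
        })
```

Injecting `send_event` keeps the mapping testable on its own and leaves the choice of transport to the caller.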
The event Flexprice receives
Whichever option you pick, the same canonical event lands, already mapped and ready to be priced and attributed.
Closing the loop: enforcement
Because Flexprice now holds the source of truth for spend, a budget breach can act back on LiteLLM (the dashed return path in the push-flow diagram). On a warning/critical alert, Flexprice calls LiteLLM’s management API with your master key to throttle the offending key or team.
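The enforcement call can be pictured roughly as follows. This is a sketch under stated assumptions, not Flexprice’s actual behaviour: the policy (lower the rate limit on warning, block on critical) is invented for illustration, and the endpoint paths (`/key/update`, `/team/update`, `/key/block`, `/team/block`) and the `rpm_limit` field are taken from LiteLLM’s proxy management API as commonly documented and should be checked against your LiteLLM version. The function only builds the request description; it does not send anything.

```python
def build_throttle_request(base_url, master_key, level,
                           key=None, team_id=None, reduced_rpm=10):
    """Hypothetical helper: describe the management call for an alert level."""
    if level not in ("warning", "critical"):
        return None  # info alerts do not trigger enforcement in this sketch
    if (key is None) == (team_id is None):
        raise ValueError("pass exactly one of key or team_id")

    target = "key" if key is not None else "team"
    body = {"key": key} if key is not None else {"team_id": team_id}

    if level == "critical":
        path = f"/{target}/block"        # assumed policy: hard stop
    else:
        path = f"/{target}/update"       # assumed policy: slow the caller down
        body["rpm_limit"] = reduced_rpm

    return {
        "method": "POST",
        "url": base_url.rstrip("/") + path,
        "headers": {
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        "json": body,
    }
```

Separating request construction from transport makes the policy easy to review and to test without a running proxy.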
How cost is attached
Most modern gateways compute USD cost themselves, so Flexprice uses that by default and re-prices only when it needs to.
- Trust the source cost (default): when reported_cost is present (LiteLLM, OpenRouter, Portkey, Langfuse, Helicone, Cloudflare, Vercel), Flexprice stores it as-is. This correctly captures negotiated and BYOK rates.
- Re-price from the catalog: when a source provides tokens only (Bedrock logs, Databricks token usage) or you want markup/margin or a single normalised price across gateways, Flexprice prices from its model pricing repository.
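The two paths above can be sketched as a single decision function. The catalog contents, the per-million-token rate unit, the `force_reprice` and `markup` parameters, and the function name `attach_cost` are all hypothetical; the sketch also prices only input and output tokens, ignoring cached and reasoning tokens for brevity.

```python
from decimal import Decimal

# Hypothetical catalog: USD per 1M tokens, keyed by (provider, model).
CATALOG = {
    ("example-provider", "example-model"): {
        "input": Decimal("3.00"),
        "output": Decimal("15.00"),
    },
}


def attach_cost(props, catalog=CATALOG, force_reprice=False, markup=Decimal("0")):
    """Return (cost_in_usd, basis) for one event's string-valued properties."""
    reported = props.get("reported_cost")
    if reported is not None and not force_reprice:
        # Default path: trust the cost the source already computed.
        return Decimal(reported), "source"

    # Re-price path: tokens only, or a markup / normalised price is wanted.
    rates = catalog[(props["provider"], props["model"])]
    cost = (
        Decimal(props.get("input_tokens", "0")) * rates["input"]
        + Decimal(props.get("output_tokens", "0")) * rates["output"]
    ) / Decimal(1_000_000)
    return cost * (1 + markup), "catalog"
```

Returning the basis alongside the amount lets downstream analytics distinguish source-reported spend from catalog-derived spend.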

Auto-provisioning metering
Enabling a connector and choosing a template creates the metering graph for you, with no manual meters, features, or prices.
- cost_tracking: meters for input/output/cached/reasoning tokens and request count, features per dimension, and catalog pricing at cost. For internal showback.
- team_budget: the above plus a wallet, a recurring monthly credit grant, and info/warning/critical alerts wired to gateway enforcement. For team and agent budgets.
- resale_markup: catalog re-pricing with a configurable margin, plus margin analytics. For AI features you bill customers for.
Meters use group_by on the provider and model properties, so there is no meter-per-model explosion: a handful of generic meters cover every model.
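To illustrate why grouping avoids a meter per model, the sketch below sums one numeric property across events and groups the totals by provider and model. The function name `aggregate_meter` and the sample events are hypothetical; this mimics the idea of a generic meter and is not Flexprice’s aggregation engine. String property values are converted to integers at aggregation time.

```python
from collections import defaultdict


def aggregate_meter(events, field, group_by=("provider", "model")):
    """Sum one string-encoded numeric property, grouped by the given keys."""
    totals = defaultdict(int)
    for ev in events:
        props = ev["properties"]
        key = tuple(props[g] for g in group_by)
        totals[key] += int(props.get(field, "0"))
    return dict(totals)


# Hypothetical sample: three requests across two models.
events = [
    {"properties": {"provider": "p1", "model": "m-small", "input_tokens": "100"}},
    {"properties": {"provider": "p1", "model": "m-small", "input_tokens": "50"}},
    {"properties": {"provider": "p2", "model": "m-large", "input_tokens": "700"}},
]
input_totals = aggregate_meter(events, "input_tokens")
```

Adding a new model only adds a new group key to the result; the meter definition itself does not change.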
Identity and hierarchy resolution
Each connector maps source identifiers to a Flexprice billing entity and, optionally, builds the hierarchy:
- A primary field (for example raw_team) resolves to the customer being metered.
- Additional fields (raw_user, agent_id) create child entities under it, using Customer Hierarchy for individual visibility with consolidated rollups and shared wallets.
- Unrecognised identifiers can auto-create entities or map to an existing external_id.
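The resolution rules above might be sketched like this. The `known` mapping, the `auto_create` flag, the `parent/child` external ID format, and the function name `resolve_entities` are assumptions chosen for illustration; the real connector configuration may look different.

```python
def resolve_entities(props, known, primary="raw_team",
                     children=("raw_user", "agent_id"), auto_create=True):
    """Return (customer_external_id, child_entities) for one event.

    known maps source identifiers to existing external_id values.
    """
    raw = props.get(primary)
    if raw is None:
        raise ValueError(f"event has no {primary} to resolve")

    if raw in known:
        customer = known[raw]            # map to an existing external_id
    elif auto_create:
        customer = known.setdefault(raw, raw)  # auto-create under the raw ID
    else:
        raise KeyError(f"unrecognised identifier: {raw!r}")

    # Additional fields become child entities under the resolved customer.
    kids = [
        {"external_id": f"{customer}/{props[c]}", "parent": customer, "kind": c}
        for c in children
        if props.get(c)
    ]
    return customer, kids
```

For example, an event with raw_team "eng" mapped to "cust_eng" and raw_user "alice" resolves to the customer "cust_eng" with one child "cust_eng/alice".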
Source coverage
Phase one deliberately spans every connector type so the model is proven against the hard cases:

| Source | Type | Cost basis | Notes |
|---|---|---|---|
| LiteLLM | Push (or hosted webhook) | Source cost | Richest identity (key/user/team/org/tags); real-time. |
| Langfuse | Managed pull (observations API) | Source cost | No usage webhook upstream, so polled; near-real-time. |
| AWS Bedrock | Managed pull (S3 logs + CUR) | Catalog re-price | Per-request token logs; reconcile $ against Cost & Usage Report. |
| Databricks | Managed pull (system tables, SQL) | Catalog re-price | DBU/token usage joined to list prices; hourly grain. |
| Salesforce Agentforce | Aggregate | Catalog (rate card) | Native units (credits/conversations); fidelity: aggregate. |
Coverage expands continuously — Helicone, OpenRouter, Portkey, Snowflake Cortex, TrueFoundry, Cloudflare, Vercel, and SAP Joule follow the same connector model. Each new source is a connector definition plus pricing entries, not a change to your setup.
Related
- AI Cost Tracking overview: The problem, the solution, and the high-level architecture.
- Flexprice Collector: The Bento-based collector used for push and in-infra sources.
- Event Ingestion: The underlying event pipeline AI usage rides on.
- Alerts and Notifications: Spend thresholds, states, and webhook delivery.

