This page extends the architecture overview with the mechanics of ingestion: the standardised event every source maps to, the connector types, how cost is attached, and how metering is provisioned for you. The design goal is zero manual setup. You connect a source; Flexprice handles normalisation, pricing, and the creation of meters and features — you do not hand-write transforms, build a model price list, or define meters per model.

The standardised AI usage event

Every source — a LiteLLM webhook, a Langfuse trace, a Bedrock log, a Databricks usage row — is normalised to one canonical shape before anything else happens. It rides on Flexprice’s standard event (event_name, external_customer_id, properties, timestamp, source), with a fixed convention for AI properties.
{
  "event_name": "ai.usage",
  "external_customer_id": "team_platform",
  "timestamp": "2026-06-14T10:30:00Z",
  "source": "litellm",
  "properties": {
    "provider": "anthropic",
    "model": "claude-opus-4-8",
    "operation": "chat",
    "input_tokens": "1840",
    "output_tokens": "320",
    "cached_tokens": "1024",
    "reasoning_tokens": "0",
    "reported_cost": "0.041",
    "request_id": "req_abc123",
    "raw_user": "u_91",
    "raw_team": "team_platform",
    "agent_id": "agent_support_bot",
    "fidelity": "per_request"
  }
}
Key fields:
| Field | Purpose |
| --- | --- |
| provider, model, operation | Drive model pricing and let analytics break down by provider/model/token type. |
| input_tokens, output_tokens, cached_tokens, reasoning_tokens | Token-level usage; unit/quantity are used for non-token sources (characters, seconds, requests, credits). |
| reported_cost | The source’s own computed cost in USD, when it provides one (most gateways do). |
| raw_user, raw_team, agent_id, tags | Source identifiers used to resolve the billing entity and hierarchy. |
| fidelity | per_request or aggregate. Records the granularity a source can offer, so analytics stays honest. |
Following Flexprice’s event format, all properties values are sent as strings; Flexprice interprets numeric fields during aggregation.
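The string convention above is easy to get wrong when a source hands you integers and floats. The following is a minimal sketch of a helper that assembles the canonical event and stringifies every property; `build_ai_usage_event` is a hypothetical name for illustration, not part of the Flexprice SDK.

```python
from datetime import datetime, timezone


def build_ai_usage_event(customer_id, source, **fields):
    """Assemble an ai.usage event with every property value sent as a string.

    Hypothetical helper for illustration; unset (None) fields are dropped
    rather than sent as the literal string "None".
    """
    properties = {
        key: str(value)
        for key, value in fields.items()
        if value is not None
    }
    return {
        "event_name": "ai.usage",
        "external_customer_id": customer_id,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source": source,
        "properties": properties,
    }


event = build_ai_usage_event(
    "team_platform",
    "litellm",
    provider="anthropic",
    model="claude-opus-4-8",
    operation="chat",
    input_tokens=1840,
    output_tokens=320,
    reported_cost=0.041,
    raw_user=None,  # unknown identifier: omitted from the event
    fidelity="per_request",
)
```

Very small floats stringify in scientific notation in Python, so a production mapper would likely format costs with `decimal.Decimal` or a fixed number of places instead of plain `str()`.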

Connector types

Sources differ in whether they can push in real time, must be polled, or only expose aggregate totals. Flexprice covers all three with the same downstream pipeline.
| Type | How it works | Best for | Latency |
| --- | --- | --- | --- |
| Push (edge) | The source POSTs each request to a Flexprice Collector running in your infra, which transforms and forwards it. | Real-time sources and data that must stay in-network. | Real-time |
| Managed pull | Flexprice runs a scheduled job that reads the source over API, SQL, or object storage, transforms server-side, and ingests. | API/SQL/log-reachable SaaS and clouds; no deployment. | Seconds to hours |
| Aggregate | Flexprice ingests periodic exports (dashboard/email/usage views) and converts native units to cost. | Sources with no per-request API. | Daily |

Push flow (e.g. LiteLLM)

Push flow — LiteLLM pushes a per-request webhook to the Flexprice Collector, which transforms it to ai.usage and forwards it through metering and pricing to wallets, alerts, and analytics; a budget breach calls back to LiteLLM via the management API.
The collector is the existing Bento-based pipeline — inputs → processors → outputs — with a ready-made transform for the source, so you are not writing Bloblang by hand.

Managed pull flow (e.g. Langfuse, Databricks, Bedrock)

Managed pull flow — a scheduled trigger reads each source since the last watermark via API, SQL, or object-store readers, transforms the result to ai.usage server-side, dedupes by request_id, and ingests into Flexprice.
You provide read-only credentials once; Flexprice stores them securely and runs the schedule. Each pull resumes from a per-source watermark, and any row re-read by an overlapping pull is dropped by request_id dedupe, so usage is not double counted.
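To make the watermark-plus-dedupe idea concrete, here is a minimal in-memory sketch of one scheduled pull. It is not Flexprice's implementation: `pull_once` is a hypothetical function, and a real job would persist the watermark and the seen IDs rather than hold them in memory.

```python
def pull_once(rows, watermark, seen_request_ids):
    """Return (new_rows, new_watermark) for one scheduled pull.

    rows: dicts with 'request_id' and 'timestamp'. Timestamps are assumed to be
    fixed-width ISO 8601 UTC strings, which compare correctly as plain strings.
    """
    fresh = []
    latest = watermark
    for row in rows:
        # Rows at exactly the watermark are re-read on purpose, so late rows
        # sharing that timestamp are not lost; dedupe removes the overlap.
        if row["timestamp"] < watermark:
            continue
        if row["request_id"] in seen_request_ids:
            continue
        seen_request_ids.add(row["request_id"])
        fresh.append(row)
        latest = max(latest, row["timestamp"])
    return fresh, latest


seen = set()
row_a = {"request_id": "req_a", "timestamp": "2026-06-14T10:00:00Z"}
row_b = {"request_id": "req_b", "timestamp": "2026-06-14T10:05:00Z"}
row_c = {"request_id": "req_c", "timestamp": "2026-06-14T10:05:00Z"}

first, wm = pull_once([row_a, row_b], "2026-06-14T00:00:00Z", seen)
# The second pull overlaps the first: req_b is seen again, req_c is new.
second, wm2 = pull_once([row_a, row_b, row_c], wm, seen)
```

The inclusive comparison against the watermark is the design choice that makes overlap safe: the pull can over-read freely because the request_id check discards anything already ingested.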

Integrating LiteLLM

LiteLLM is a real-time push source: it fires a logging callback on every completion, already carrying token counts, USD cost, model, and identity. There are three ways to wire it into Flexprice, from drop-in to fully custom, and all three emit the same standardised ai.usage event. You never define meters, features, or prices for this; enabling the LiteLLM connector with a template provisions them.

Option 1 is the drop-in: register the Flexprice callback and you’re done, with no payload mapping and no transform to maintain. Every completion() call you already make is metered.
import litellm
from flexprice.litellm import FlexpriceLogger

litellm.callbacks = [
    FlexpriceLogger(
        api_key="<FLEXPRICE_API_KEY>",
        # which LiteLLM identity becomes the Flexprice billing entity
        entity_from="metadata.user_api_key_team_id",
    )
]

# nothing else changes — usage now flows to Flexprice on every call
litellm.completion(
    model="anthropic/claude-opus-4-8",
    messages=[{"role": "user", "content": "Summarise this ticket"}],
    metadata={"user_api_key_team_id": "team_platform", "user_api_key_user_id": "u_91"},
)
The callback reads LiteLLM’s per-request record, maps it to ai.usage, and ships it — one event per request, attributed to the team (with the user kept as a child for per-user and per-agent visibility).

Option 2 — LiteLLM Proxy (no code)

If you run the LiteLLM Proxy, enable the Flexprice callback in config.yaml. Every request routed through the proxy — from any app, agent, or MCP server — is metered and attributed to the virtual key’s team/user, with nothing to deploy in your services.
litellm_settings:
  callbacks: ["flexprice"]

environment_variables:
  FLEXPRICE_API_KEY: "<FLEXPRICE_API_KEY>"
  # map a LiteLLM virtual-key dimension to the billing entity
  FLEXPRICE_ENTITY_FROM: "metadata.user_api_key_team_id"
When usage must stay inside your network, point LiteLLM’s logging webhook at your in-infra Flexprice Collector instead — the collector applies the same transform before anything leaves your VPC.

Option 3 — Custom callback (full control)

When you want to shape the event yourself — add tags, override the entity, or filter calls — implement a LiteLLM CustomLogger and send the event with the Flexprice SDK. This is exactly what Option 1 does under the hood, exposed for you to customise.
from litellm.integrations.custom_logger import CustomLogger
import litellm
from flexprice import Flexprice

fp = Flexprice(api_key="<FLEXPRICE_API_KEY>")

class FlexpriceLogger(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # LiteLLM's normalised, per-request usage + cost record
        slp = kwargs["standard_logging_object"]
        meta = slp.get("metadata", {})

        fp.events.ingest(
            event_name="ai.usage",
            external_customer_id=meta.get("user_api_key_team_id", "unknown"),
            source="litellm",
            properties={
                "provider": slp["custom_llm_provider"],
                "model": slp["model"],
                "operation": "chat",
                # Flexprice expects every properties value as a string
                "input_tokens": str(slp["prompt_tokens"]),
                "output_tokens": str(slp["completion_tokens"]),
                "reported_cost": str(slp["response_cost"]),   # USD, computed by LiteLLM
                "request_id": slp["id"],
                "raw_user": meta.get("user_api_key_user_id"),
                "raw_team": meta.get("user_api_key_team_id"),
            },
        )

litellm.callbacks = [FlexpriceLogger()]

The event Flexprice receives

Whichever option you pick, the same canonical event lands — already mapped, ready to be priced and attributed:
{
  "event_name": "ai.usage",
  "external_customer_id": "team_platform",
  "source": "litellm",
  "properties": {
    "provider": "anthropic",
    "model": "claude-opus-4-8",
    "operation": "chat",
    "input_tokens": "1840",
    "output_tokens": "320",
    "reported_cost": "0.041",
    "request_id": "req_abc123",
    "raw_user": "u_91",
    "raw_team": "team_platform"
  }
}
If you’d rather not use an SDK at all, the same event can be posted directly to the events API:
curl -X POST https://api.cloud.flexprice.io/v1/events \
  -H 'x-api-key: <FLEXPRICE_API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "event_name": "ai.usage",
    "external_customer_id": "team_platform",
    "source": "litellm",
    "properties": {
      "provider": "anthropic", "model": "claude-opus-4-8",
      "input_tokens": "1840", "output_tokens": "320",
      "reported_cost": "0.041", "request_id": "req_abc123"
    }
  }'

Closing the loop — enforcement

Because Flexprice now holds the source of truth for spend, a budget breach can act back on LiteLLM (the dashed return path in the push-flow diagram). On a warning/critical alert, Flexprice calls LiteLLM’s management API with your master key to throttle the offending key or team:
# soft: zero out the remaining budget on a key
curl -X POST https://<litellm-proxy>/key/update \
  -H 'Authorization: Bearer <LITELLM_MASTER_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{ "key": "<virtual-key>", "max_budget": 0 }'

# hard: block the key outright
curl -X POST https://<litellm-proxy>/key/block \
  -H 'Authorization: Bearer <LITELLM_MASTER_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{ "key": "<virtual-key>" }'
Start with soft alerts (webhooks/notifications) and graduate to hard enforcement once you trust the numbers.
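As an illustration of how alert states could map onto the two LiteLLM calls above, here is a small sketch that only builds the request description and sends nothing. The mapping of warning to the soft action and critical to the hard action is an assumption made for this example, and `enforcement_request` is a hypothetical helper; only the `/key/update` and `/key/block` paths and their payloads come from the calls shown above.

```python
# Assumed mapping for illustration: alert state -> (LiteLLM path, payload builder).
LITELLM_ACTIONS = {
    "warning": ("/key/update", lambda key: {"key": key, "max_budget": 0}),
    "critical": ("/key/block", lambda key: {"key": key}),
}


def enforcement_request(alert_state, virtual_key):
    """Describe the LiteLLM management call for an alert, or None if no action."""
    action = LITELLM_ACTIONS.get(alert_state)
    if action is None:
        # e.g. "info": notify only, leave the key untouched
        return None
    path, build_payload = action
    return {"method": "POST", "path": path, "json": build_payload(virtual_key)}


soft = enforcement_request("warning", "<virtual-key>")
hard = enforcement_request("critical", "<virtual-key>")
nothing = enforcement_request("info", "<virtual-key>")
```

Keeping the mapping in a single table makes it straightforward to start with every state mapped to no action and switch individual states on once you trust the numbers.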

How cost is attached

Most modern gateways compute USD cost themselves, so Flexprice uses that by default and re-prices only when it needs to.
  • Trust the source cost (default) — when reported_cost is present (LiteLLM, OpenRouter, Portkey, Langfuse, Helicone, Cloudflare, Vercel), Flexprice stores it as-is. This correctly captures negotiated and BYOK rates.
  • Re-price from the catalog — when a source provides tokens only (Bedrock logs, Databricks token usage) or you want markup/margin or a single normalised price across gateways, Flexprice prices from its model pricing repository.
The public model pricing repository is refreshed daily, so popular provider/model rates are available out of the box. Override any rate per provider, model, or customer for committed pricing, and apply a markup factor when reselling.
How cost is attached — if the ai.usage event already carries reported_cost, Flexprice uses the source cost; otherwise (or when markup or normalised pricing is needed) it prices from the model pricing repository and applies overrides and markup, producing the charged cost on usage.
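The decision described above can be sketched as a single function. This is an illustration under stated assumptions, not Flexprice's pricing engine: `charged_cost` and `CATALOG` are hypothetical names, the catalog is assumed to hold USD rates per one million tokens, and the rates shown are placeholders rather than real prices.

```python
from decimal import Decimal

# Placeholder rates (USD per 1M tokens), keyed by (provider, model). Not real prices.
CATALOG = {
    ("anthropic", "claude-opus-4-8"): {
        "input": Decimal("3.00"),
        "output": Decimal("15.00"),
    },
}


def charged_cost(properties, catalog=CATALOG, reprice=False, markup=Decimal("1")):
    """Use the source's reported_cost unless it is missing or re-pricing is requested."""
    reported = properties.get("reported_cost")
    if reported is not None and not reprice:
        return Decimal(reported)  # trust the source cost (default)

    # Tokens-only source, or markup / normalised pricing requested: use the catalog.
    rates = catalog[(properties["provider"], properties["model"])]
    base = (
        Decimal(properties.get("input_tokens", "0")) * rates["input"]
        + Decimal(properties.get("output_tokens", "0")) * rates["output"]
    ) / Decimal(1_000_000)
    return base * markup


gateway_event = {
    "provider": "anthropic", "model": "claude-opus-4-8",
    "input_tokens": "1840", "output_tokens": "320", "reported_cost": "0.041",
}
tokens_only = {k: v for k, v in gateway_event.items() if k != "reported_cost"}

trusted = charged_cost(gateway_event)
repriced = charged_cost(tokens_only)
resold = charged_cost(gateway_event, reprice=True, markup=Decimal("1.2"))
```

Per-provider, per-model, or per-customer overrides would slot in as a lookup that runs before the catalog lookup; they are left out here to keep the sketch short.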

Auto-provisioning metering

Enabling a connector and choosing a template creates the metering graph for you — no manual meters, features, or prices.
  • cost_tracking — meters for input/output/cached/reasoning tokens and request count, features per dimension, and catalog pricing at cost. For internal showback.
  • team_budget — the above plus a wallet, a recurring monthly credit grant, and info/warning/critical alerts wired to gateway enforcement. For team and agent budgets.
  • resale_markup — catalog re-pricing with a configurable margin, plus margin analytics. For AI features you bill customers for.
Provider and model breakdowns come from filters and group_by on the provider and model properties, so there is no meter-per-model explosion — a handful of generic meters cover every model.
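To see why one generic meter is enough, consider this minimal sketch of a sum aggregation with filters and group_by over event properties. `sum_by` is a hypothetical function written for this example; it imitates the idea, not Flexprice's query API, and the "example-model" name is a placeholder.

```python
from collections import defaultdict


def sum_by(events, meter_field, group_by=("provider", "model"), filters=None):
    """Sum one numeric property across events, grouped by other properties."""
    totals = defaultdict(int)
    for event in events:
        props = event["properties"]
        # A filter is a required exact match on a property value.
        if filters and any(props.get(k) != v for k, v in filters.items()):
            continue
        key = tuple(props.get(field) for field in group_by)
        # Property values arrive as strings and are interpreted at aggregation time.
        totals[key] += int(props.get(meter_field, "0"))
    return dict(totals)


events = [
    {"properties": {"provider": "anthropic", "model": "claude-opus-4-8", "input_tokens": "1840"}},
    {"properties": {"provider": "anthropic", "model": "claude-opus-4-8", "input_tokens": "160"}},
    {"properties": {"provider": "openai", "model": "example-model", "input_tokens": "500"}},
]

per_model = sum_by(events, "input_tokens")
anthropic_only = sum_by(events, "input_tokens", filters={"provider": "anthropic"})
```

A new model shows up as a new group key in the result, with no new meter to define.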

Identity and hierarchy resolution

Each connector maps source identifiers to a Flexprice billing entity and, optionally, builds the hierarchy:
  • A primary field (for example raw_team) resolves to the customer being metered.
  • Additional fields (raw_user, agent_id) create child entities under it, using Customer Hierarchy for individual visibility with consolidated rollups and shared wallets.
  • Unrecognised identifiers can auto-create entities or map to an existing external_id.
This is what lets one wallet span an entire team while you still see each user’s and agent’s usage separately — and what powers per-agent ROI.
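The resolution rules above can be sketched as follows. `resolve_entities` and the shape of the returned child records are assumptions made for illustration; the actual connector configuration and Customer Hierarchy objects may look different.

```python
def resolve_entities(properties, primary="raw_team", children=("raw_user", "agent_id")):
    """Map source identifiers to a parent billing entity and its child entities."""
    parent = properties.get(primary)
    if parent is None:
        # No primary identifier: nothing to attribute the event to.
        return None, []
    kids = [
        {"external_id": properties[field], "kind": field, "parent": parent}
        for field in children
        if properties.get(field)
    ]
    return parent, kids


parent, kids = resolve_entities({
    "raw_team": "team_platform",
    "raw_user": "u_91",
    "agent_id": "agent_support_bot",
})
```

Here the team is the metered customer that owns the wallet, while the user and the agent become children under it, which is what keeps their usage visible separately.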

Source coverage

Phase one deliberately spans every connector type so the model is proven against the hard cases:
| Source | Type | Cost basis | Notes |
| --- | --- | --- | --- |
| LiteLLM | Push (or hosted webhook) | Source cost | Richest identity (key/user/team/org/tags); real-time. |
| Langfuse | Managed pull (observations API) | Source cost | No usage webhook upstream, so polled; near-real-time. |
| AWS Bedrock | Managed pull (S3 logs + CUR) | Catalog re-price | Per-request token logs; reconcile $ against Cost & Usage Report. |
| Databricks | Managed pull (system tables, SQL) | Catalog re-price | DBU/token usage joined to list prices; hourly grain. |
| Salesforce Agentforce | Aggregate | Catalog (rate card) | Native units (credits/conversations); fidelity: aggregate. |
Coverage expands continuously — Helicone, OpenRouter, Portkey, Snowflake Cortex, TrueFoundry, Cloudflare, Vercel, and SAP Joule follow the same connector model. Each new source is a connector definition plus pricing entries, not a change to your setup.
Aggregate sources (such as Agentforce and SAP Joule) report coarse, native-unit totals rather than per-request tokens. Flexprice labels these with fidelity: aggregate so dashboards and alerts reflect the real granularity — treat per-agent ROI from these sources as approximate.

AI Cost Tracking overview

The problem, the solution, and the high-level architecture.

Flexprice Collector

The Bento-based collector used for push and in-infra sources.

Event Ingestion

The underlying event pipeline AI usage rides on.

Alerts and Notifications

Spend thresholds, states, and webhook delivery.