Problem statement
AI spend is sprawling and hard to attribute. A single company typically consumes models through many disconnected paths at once:
- Gateways — LiteLLM, OpenRouter, Portkey
- Observability layers — Langfuse, Helicone
- Clouds and platforms — AWS Bedrock, Databricks, Snowflake Cortex
- Packaged SaaS AI — Salesforce Agentforce, SAP Joule
With usage scattered across these paths, three questions become hard to answer:
- Internal cost — what is each team, user, application, or agent spending on AI, and is it within budget?
- Customer cost — for the AI features you resell, how much does serving each customer actually cost?
- Margin — what is your AI margin on revenue once provider cost is subtracted from what you charge?
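As a worked illustration of the margin question, the sketch below computes gross margin for one customer from what they were charged and what the provider charged. The helper name `ai_margin` and the figures are illustrative assumptions, not Flexprice output.

```python
from decimal import Decimal


def ai_margin(revenue: Decimal, provider_cost: Decimal) -> Decimal:
    """Return gross margin as a fraction of revenue (hypothetical helper)."""
    if revenue <= 0:
        raise ValueError("revenue must be positive to express margin as a fraction")
    return (revenue - provider_cost) / revenue


# Illustrative numbers only: a customer billed 500.00 whose usage cost 320.00.
margin = ai_margin(Decimal("500.00"), Decimal("320.00"))
print(f"{margin:.2%}")
```

Using `Decimal` rather than floats avoids rounding drift when many small per-request costs are summed.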
Solution
Flexprice acts as a single point of AI usage ingestion. Once usage flows in, everything else is built on top of the same metered data:
- Feature usage at the customer level — every request is attributed to a billing entity (customer, team, user, or agent), so usage rolls up cleanly for showback, entitlements, and billing.
- Total AI cost and token-level visibility — break spend down by provider, model, and token type (input, output, cached, reasoning) for any entity and any time window.
- Start from base model pricing — Flexprice maintains a public model pricing repository, refreshed daily, so cost is computed out of the box for popular providers and models with no manual price setup.
- Or customise costing for negotiated pricing — override the base catalog with your committed or BYOK rates per provider, model, or customer, and apply markup when you resell.
- Set spend alerts and limits, and optionally fund wallets with preloaded credits — configure info→warning→critical thresholds on balances and usage, and give teams or customers a prepaid credit balance to draw down. See Alerts and Notifications.
- Model your full org hierarchy — represent organizations, workspaces, teams, and users with both individual and consolidated limits and rollups, sharing wallets where you need to. See Customer Hierarchy.
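To make the costing steps above concrete, here is a minimal sketch of token-level pricing: start from a base price table, let negotiated rates override it, then apply markup for resale. The provider and model names, the prices, and the `request_cost` function are all hypothetical; this is not the Flexprice pricing repository or its API.

```python
from decimal import Decimal

# Hypothetical per-million-token prices in USD; real catalog values will differ.
BASE_PRICES = {
    ("example-provider", "example-model"): {
        "input": Decimal("3.00"),
        "output": Decimal("15.00"),
        "cached": Decimal("0.30"),
        "reasoning": Decimal("15.00"),
    }
}

PER_MILLION = Decimal(1_000_000)


def request_cost(provider, model, tokens, overrides=None, markup=Decimal("0")):
    """Price one request: overrides win over base prices, then markup applies."""
    prices = dict(BASE_PRICES[(provider, model)])
    prices.update((overrides or {}).get((provider, model), {}))
    cost = sum(
        (Decimal(count) * prices[kind] / PER_MILLION for kind, count in tokens.items()),
        Decimal("0"),
    )
    return cost * (Decimal("1") + markup)


tokens = {"input": 1000, "output": 500, "cached": 2000}

# Base catalog pricing, no markup.
base = request_cost("example-provider", "example-model", tokens)

# Negotiated input rate plus a 20% resale markup (both assumed values).
negotiated = {("example-provider", "example-model"): {"input": Decimal("2.00")}}
resold = request_cost(
    "example-provider", "example-model", tokens,
    overrides=negotiated, markup=Decimal("0.20"),
)
```

Keeping overrides as a sparse layer on top of the base table means only the negotiated token types need to be listed; everything else falls through to the catalog price.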
AI Cost Tracking builds directly on Flexprice’s core metering primitives. Usage arrives through the same event ingestion pipeline, is measured with features and aggregations, and is governed with wallets and alerts — nothing new to learn if you already use Flexprice.
Architecture
At a high level, third-party sources send usage into Flexprice through one of two ingestion paths, and Flexprice turns that usage into cost, governance, and analytics.
- Flexprice Collector — a lightweight, Bento-based collector you can run in your own infrastructure. Best for real-time push sources (such as LiteLLM webhooks) and for data that must stay inside your network.
- Managed pull — Flexprice runs scheduled, credential-based pulls for sources reachable by API, SQL, or object storage (such as Langfuse, Databricks, Snowflake, and Bedrock logs). You paste credentials; there is nothing to deploy.
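Whichever path is used, each source record has to be mapped onto one common event shape before it can be metered. The sketch below shows that normalisation step under stated assumptions: the `UsageEvent` fields and the input record layout are invented for illustration and are not the actual Flexprice event schema, which is described in Ingesting AI Usage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical normalised event; not the actual Flexprice schema."""
    entity_id: str
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    timestamp: datetime


def normalise(record: dict) -> UsageEvent:
    """Map one made-up source record onto the common event shape."""
    usage = record.get("usage", {})
    return UsageEvent(
        entity_id=record["customer"],
        provider=record["provider"],
        model=record["model"],
        # Missing token counts default to zero instead of failing the record.
        input_tokens=int(usage.get("prompt_tokens", 0)),
        output_tokens=int(usage.get("completion_tokens", 0)),
        # Store timestamps in UTC so events from different sources line up.
        timestamp=datetime.fromtimestamp(record["ts"], tz=timezone.utc),
    )


event = normalise({
    "customer": "cust_123",
    "provider": "example-provider",
    "model": "example-model",
    "usage": {"prompt_tokens": 1200, "completion_tokens": 300},
    "ts": 1700000000,
})
```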
Communication back to your systems
Cost tracking is not only inbound. When a budget threshold is crossed, Flexprice can push signals back to your systems so limits are actually enforced:
- Webhooks — every alert state change (info→warning→critical) is delivered as a webhook, so your automation can react however you choose. See Alerts and Notifications.
- Native enforcement at the gateway — for gateways that expose programmatic key controls, Flexprice can act on a breach directly:
  - LiteLLM — using your LiteLLM master key, Flexprice can call the proxy’s management API to lower or zero a key/team budget, or block a key, for a true hard cutoff. (LiteLLM also emits its own budget alerts to Slack and webhooks independently.)
  - OpenRouter — Flexprice can use the Provisioning API to set a per-key credit limit or disable a key.
  - Portkey — budgets are enforced at the gateway but configured in the Portkey dashboard, so enforcement there is alert-driven rather than API-driven.
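On the receiving side, your automation decides what each alert state should trigger. The following is a minimal sketch of that decision step, assuming a webhook payload with `state` and `entity_id` fields; those field names, the action labels, and the `plan_action` helper are hypothetical and should be replaced with the payload documented in Alerts and Notifications.

```python
# Hypothetical mapping from alert state to the reaction your system takes.
ACTIONS = {
    "info": "log",         # record the event, no user-facing change
    "warning": "notify",   # tell the budget owner they are close to the limit
    "critical": "block",   # stop further spend for this entity
}


def plan_action(payload: dict) -> str:
    """Return '<action>:<entity_id>' for one alert state change."""
    state = payload["state"]
    if state not in ACTIONS:
        raise ValueError(f"unknown alert state: {state!r}")
    return f"{ACTIONS[state]}:{payload['entity_id']}"


decision = plan_action({"state": "critical", "entity_id": "team_42"})
```

Separating the decision from the side effect keeps the handler easy to test; the actual block or notification call can then be wired to whichever gateway or channel you use.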
Ingesting AI Usage
The standardised event, connector types, and per-source mechanics.
Alerts and Notifications
Configure spend thresholds and webhook delivery.
Customer Hierarchy
Model orgs, teams, and users with shared and consolidated limits.
Wallets and Credits
Fund teams or customers with prepaid credit balances.

