AI API costs are unlike any other category of software spend. They accrue per token, per request, per model, across multiple providers – often simultaneously – and they scale directly with product usage in ways that are difficult to predict in advance. A single feature launch, a prompt engineering change, or an unexpected traffic spike can move the monthly bill significantly in either direction. Without purpose-built visibility, engineering teams are flying blind between billing cycles, and finance teams are reconciling costs that bear no obvious relationship to the infrastructure resources they are used to managing.
Opactiv brings the same discipline that FinOps practitioners apply to compute and storage infrastructure to AI API spend. It connects directly to the billing and usage APIs of leading AI providers, normalizes cost and token data into a consistent model, and applies continuous optimization intelligence – all within the same organizational framework used for the rest of your technology spend.
Opactiv connects directly to the usage and cost APIs of OpenAI, Anthropic, and xAI, pulling authoritative data rather than relying on exported invoices or manual reconciliation.
Opactiv integrates with the OpenAI organization-level usage and cost APIs, collecting data at the finest granularity OpenAI makes available. For every model, every project, every day:
- Input tokens - prompt tokens sent to the model
- Output tokens - completion tokens generated by the model
- Cached tokens - prompt cache hits, which carry a discounted rate and indicate prompt caching efficiency
- Request count - number of API calls per model per project per day
- Daily cost - USD cost as reported by OpenAI, attributed proportionally to models within each project by request volume
Data is tracked at the intersection of model name, project ID, and date, producing a resource record per unique combination. An organization running GPT-4o and GPT-4o-mini across three projects generates six distinct tracked cost streams (two models × three projects), each with its own daily records and each independently visible and filterable.
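As an illustration of the proportional attribution described above, the following is a minimal sketch rather than Opactiv's actual implementation. The `UsageRow` shape, the `attribute_costs` function, and the input dictionary layout are all hypothetical names chosen for the example. It assumes the provider reports one cost figure per project per day and splits it across models by their share of that day's requests.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsageRow:
    """One usage row per model, project, and day (hypothetical shape)."""
    model: str
    project_id: str
    day: date
    requests: int


def attribute_costs(rows, project_daily_cost):
    """Split each project's daily USD cost across models by request share.

    project_daily_cost maps (project_id, day) -> USD cost for that project and day.
    Returns a dict keyed by (model, project_id, day).
    """
    # Total requests per project per day, used as the denominator of each share.
    totals = defaultdict(int)
    for r in rows:
        totals[(r.project_id, r.day)] += r.requests

    attributed = {}
    for r in rows:
        key = (r.project_id, r.day)
        total = totals[key]
        share = r.requests / total if total else 0.0
        attributed[(r.model, r.project_id, r.day)] = project_daily_cost.get(key, 0.0) * share
    return attributed
```

With 300 requests on one model and 100 on another in the same project and day, a daily cost of 8.00 USD would be split 6.00 and 2.00 under this scheme.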
Opactiv integrates with the Anthropic organization usage report API, capturing the same token-level granularity per model per workspace per day:
- Input tokens - prompt tokens
- Output tokens - completion tokens
- Cache read tokens - tokens served from Anthropic’s prompt cache
- Cache creation tokens - tokens written to the prompt cache on this request
- Request count - API calls per model per workspace per day
- Daily cost - broken down by cost type
Anthropic’s agentic features – web search tool use, code execution – generate costs that are not attributable to a specific model or token count. Opactiv captures these separately as distinct cost streams per workspace, so teams using Claude’s tool-use capabilities have full visibility into what those agentic operations cost, not just the token charges.
Opactiv integrates with the xAI billing API, collecting daily costs per model from the usage endpoint and token counts from invoice line items. Input tokens (prompt text) and output tokens (completion text) are tracked where available. All xAI costs are attributed at the model-and-day level.
AI API costs can be examined across several dimensions simultaneously – by model, by project or workspace, by provider, by date range, or any combination.
The default view surfaces costs and token usage per model, ranked by spend. This immediately answers the questions that matter most: which models are driving the bill, how their costs trend day by day, and what token volumes underlie each cost figure.
OpenAI organizes usage by project; Anthropic organizes by workspace. Opactiv respects these native groupings, enabling cost allocation by the internal business units or product teams that correspond to those projects and workspaces. A single AI API account shared across multiple product areas can be fully charged back – costs flow to the right pool and the right owner in the organizational hierarchy.
Organizations running models from multiple providers simultaneously can view total AI API spend across all providers in aggregate, or break it down by provider to understand the relative contribution of each relationship to the overall bill.
All views are queryable across any date range up to a year, with daily granularity. Cost trends, token volume trends, and cache hit rates are all time-series data, not point-in-time snapshots – making it possible to see when spend accelerated, when a new model was introduced, or when a workload was retired.
Cloud infrastructure cost is driven by resource hours. AI API cost is driven by tokens. Opactiv tracks the token-level metrics that determine what you pay and why:
- Input tokens and output tokens are tracked separately because providers price them differently, with output tokens typically costing more per token than input tokens. Understanding the input/output ratio for a given workload is the first step to understanding whether the cost per request is expected or anomalous.
- Cached tokens – available on both OpenAI and Anthropic – are tracked with a cache hit rate calculation: the proportion of input tokens served from the prompt cache. Prompt caching can reduce costs substantially for workloads with long repeated system prompts. The cache hit rate metric tells teams whether their caching strategy is working as intended or whether there is unrealized saving potential.
- Cost per thousand tokens is computed from actuals – observed cost divided by observed token volume – providing a real effective rate that accounts for pricing tier differences, batching discounts, and any negotiated terms, rather than relying on published list prices.
- Request count is tracked per model per day, enabling cost-per-request analysis alongside cost-per-token analysis. For use cases where request overhead (rather than token volume) drives cost, this metric is the relevant one.
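The metrics in this list reduce to a few ratios. The sketch below is illustrative only; the function name `token_metrics` and its parameters are hypothetical. It assumes cached tokens are reported as a subset of input tokens, which may not match how every provider reports them, so the inputs may need normalizing first.

```python
def token_metrics(input_tokens, output_tokens, cached_tokens, requests, cost):
    """Derive effective-rate metrics from observed usage for one model and period.

    Assumption: cached_tokens is a subset of input_tokens.
    cost is the observed cost in a single currency for the same period.
    """
    total_tokens = input_tokens + output_tokens
    return {
        # Proportion of input tokens served from the prompt cache.
        "cache_hit_rate": cached_tokens / input_tokens if input_tokens else 0.0,
        # Effective rate from actuals, not from list prices.
        "cost_per_1k_tokens": cost / total_tokens * 1000 if total_tokens else 0.0,
        "cost_per_request": cost / requests if requests else 0.0,
        "output_to_input_ratio": output_tokens / input_tokens if input_tokens else 0.0,
    }
```

Guarding each division keeps days with no traffic from raising an error; such days simply report zero for the affected metric.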
AI API providers bill in US dollars. For organizations whose accounting currency is euros, pounds sterling, Australian dollars, or any other currency, the published USD figures do not match the figures that appear in financial systems.
Opactiv handles currency conversion automatically. At the time each day’s cost data is imported, the USD amount is converted to the organization’s accounting currency using the European Central Bank exchange rate for that date. The conversion is fully auditable: each cost record retains the original source cost, the source currency, the exchange rate applied, and the date of that rate.
Rates are fetched from the ECB daily feed and cached, so historical imports carry the accurate rate for each specific day rather than a single retrospective conversion. When exchange rates are unavailable – due to a transient outage of the rate source – the system falls back to the most recent cached rate for continuity, and surfaces a clear error if no rate is available at all rather than silently applying an incorrect conversion.
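The conversion and fallback behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's code: `convert`, `ConvertedCost`, and `RateUnavailableError` are hypothetical names, the rate cache is modeled as a plain dictionary of accounting-currency units per USD keyed by date, and "most recent cached rate" is interpreted as the latest cached date before the requested day.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


class RateUnavailableError(Exception):
    """Raised when no cached rate exists on or before the requested date."""


@dataclass(frozen=True)
class ConvertedCost:
    """Converted cost plus the audit fields retained with each record."""
    source_cost: Decimal
    source_currency: str
    rate: Decimal
    rate_date: date
    cost: Decimal


def convert(usd_cost, day, cached_rates):
    """Convert a USD cost using the rate for `day`, else the latest earlier rate.

    cached_rates maps date -> Decimal (accounting-currency units per 1 USD).
    """
    if day in cached_rates:
        rate_date = day
    else:
        earlier = [d for d in cached_rates if d < day]
        if not earlier:
            # Fail loudly rather than apply an arbitrary or default rate.
            raise RateUnavailableError(f"no exchange rate on or before {day}")
        rate_date = max(earlier)
    rate = cached_rates[rate_date]
    return ConvertedCost(usd_cost, "USD", rate, rate_date, usd_cost * rate)
```

Using `Decimal` rather than floats avoids binary rounding drift in monetary amounts, and storing `rate_date` separately from the cost date makes any fallback visible in the audit trail.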
The result is that AI API costs appear in dashboards, pool hierarchies, budget comparisons, and exports denominated in the same currency as the rest of the organization’s technology spend – ready for direct comparison and consolidated financial reporting without manual conversion.
Opactiv’s recommendation engine analyzes AI API spend continuously and surfaces four categories of actionable optimization opportunity.
OpenAI and Anthropic both offer asynchronous batch processing endpoints priced at approximately 50% less than synchronous real-time API calls. The trade-off is latency: batch requests are fulfilled within a completion window (typically up to 24 hours) rather than immediately. For workloads where near-instant response is not required – bulk document processing, overnight analysis pipelines, classification jobs, evaluation runs – this discount is directly capturable.
Opactiv identifies models that support batch processing and are currently being called synchronously, with sufficient consistent usage to confirm the workload is not inherently latency-sensitive. For each candidate, it calculates the projected monthly saving as 50% of the rolling average monthly cost.
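The projected saving is simple arithmetic on the rolling average. In the sketch below, the function name, the `min_active_days` threshold used to stand in for "sufficient consistent usage", and the 30-day month are assumptions made for illustration; the source does not specify them.

```python
def batch_saving(daily_costs, min_active_days=20, discount=0.5, days_per_month=30):
    """Project the monthly saving from moving a synchronous workload to batch.

    daily_costs: daily costs over a rolling window, one entry per day.
    Returns None when usage is too sparse to treat as a consistent workload.
    min_active_days is a hypothetical threshold, not a documented value.
    """
    active_days = sum(1 for c in daily_costs if c > 0)
    if not daily_costs or active_days < min_active_days:
        return None
    # Rolling average daily cost scaled to a month, then the batch discount.
    average_monthly_cost = sum(daily_costs) / len(daily_costs) * days_per_month
    return average_monthly_cost * discount
```

A workload costing 10 USD every day for 30 days averages 300 USD per month, giving a projected saving of 150 USD.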
The gap in capability between successive model generations has narrowed considerably while the gap in price has remained large. GPT-4o costs significantly more per token than GPT-4o-mini. Claude 3 Opus costs significantly more than Claude 3.5 Haiku. For many workloads, the larger model delivers no measurable improvement in output quality – it is simply the default choice that was made when the integration was built.
Opactiv maintains a pricing catalogue covering OpenAI, Anthropic, AWS Bedrock, and Google Vertex AI models, and maps each to its recommended lower-cost alternative. For resources with enough token history – at least seven days of data within the past thirty – it calculates the projected monthly saving as the difference in per-token cost applied to observed token volumes, broken down by input and output.
The saving calculation uses actual observed token volumes rather than a rough percentage estimate, so the figure reflects the real usage pattern of the specific workload rather than a generic benchmark.
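A minimal sketch of this calculation follows. The function name, the price dictionaries (USD per million tokens), and the choice to scale observed days up to a 30-day month are assumptions for illustration. The prices in any call are placeholders supplied by the caller, not entries from Opactiv's catalogue.

```python
def downgrade_saving(daily_usage, current, alternative, min_days=7, days_per_month=30):
    """Project the monthly saving from switching to a lower-cost model.

    daily_usage: list of (input_tokens, output_tokens), one tuple per day
                 with data in the past thirty days.
    current / alternative: {"input": price, "output": price} in USD per 1M tokens.
    Returns None when there is not enough history to project from.
    """
    if len(daily_usage) < min_days:
        return None
    # Scale observed volumes to a full month (assumed projection method).
    scale = days_per_month / len(daily_usage)
    monthly_input = sum(i for i, _ in daily_usage) * scale
    monthly_output = sum(o for _, o in daily_usage) * scale
    input_saving = monthly_input / 1_000_000 * (current["input"] - alternative["input"])
    output_saving = monthly_output / 1_000_000 * (current["output"] - alternative["output"])
    return {"input": input_saving, "output": output_saving, "total": input_saving + output_saving}
```

Keeping the input and output components separate matters because a workload dominated by output tokens can show a very different saving from one dominated by long prompts, even at the same total volume.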
Organizations whose teams have adopted multiple AI provider routes independently – direct OpenAI access plus Azure OpenAI, or direct Anthropic access alongside AWS Bedrock – may be paying different effective rates for the same model depending on which route each team chose. When the same underlying model is being accessed through both a more expensive and a cheaper route, consolidating traffic onto the cheaper route captures the difference.
Opactiv maintains a cross-provider equivalence map covering models available through multiple providers: GPT-4o through OpenAI and Azure OpenAI, Claude 3.5 Sonnet through Anthropic and AWS Bedrock, Claude 3 Opus through the same pair, and several others. Where a model is being accessed through two or more providers with meaningful traffic on each, Opactiv calculates the saving from consolidating all traffic onto the cheaper option, using the actual token volumes observed on each provider.
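The consolidation saving can be sketched as the current combined cost minus the cost of running all observed traffic through the cheapest route. The names below, the per-million-token price fields, and the `min_tokens` threshold standing in for "meaningful traffic" are hypothetical and chosen only for this example.

```python
def route_cost(route, input_tokens, output_tokens):
    """Cost of a token volume at one route's prices (USD per 1M tokens)."""
    return (input_tokens * route["input_price"] + output_tokens * route["output_price"]) / 1_000_000


def consolidation_saving(routes, min_tokens=1_000_000):
    """Saving from moving one model's traffic onto its cheapest provider route.

    routes maps provider name -> dict with input_tokens, output_tokens,
    input_price, output_price. min_tokens is an assumed traffic threshold.
    Returns None unless at least two routes carry meaningful traffic.
    """
    active = {
        provider: r for provider, r in routes.items()
        if r["input_tokens"] + r["output_tokens"] >= min_tokens
    }
    if len(active) < 2:
        return None
    total_in = sum(r["input_tokens"] for r in active.values())
    total_out = sum(r["output_tokens"] for r in active.values())
    current = sum(route_cost(r, r["input_tokens"], r["output_tokens"]) for r in active.values())
    # Cheapest route for the combined observed volume, not just by list price.
    target = min(active, key=lambda p: route_cost(active[p], total_in, total_out))
    consolidated = route_cost(active[target], total_in, total_out)
    return {"target": target, "saving": current - consolidated}
```

Choosing the target by the cost of the combined volume handles the case where one route is cheaper on input tokens and another on output tokens.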
An API key that is no longer actively used but remains valid is both a cost risk and a security risk. If the key is exposed – in a repository, a log file, or a compromised environment – it can generate charges at whatever rate the historical workload used it. Even a modest historical usage rate represents a meaningful exposure if reactivated unexpectedly.
Opactiv flags models associated with API keys that show a significant drop in daily cost compared to their historical baseline – specifically, keys that averaged meaningful spend in the 31–60 day window but have dropped to near-zero in the past 30 days. The projected exposure is calculated as what the historical usage rate would cost monthly if it were unexpectedly resumed. The recommended action is key rotation or revocation rather than a cost optimization in the traditional sense.
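The detection rule compares two adjacent 30-day windows. The sketch below is one possible reading of it; the `min_baseline` and `near_zero` thresholds are hypothetical, since the source does not define "meaningful spend" or "near-zero" numerically.

```python
def unused_key_exposure(daily_costs, min_baseline=1.0, near_zero=0.05, days_per_month=30):
    """Flag a dormant key and estimate monthly exposure if usage resumed.

    daily_costs: at least 60 daily costs, oldest first, ending yesterday.
    min_baseline: assumed minimum average daily spend in the 31-60 day window.
    near_zero: assumed ceiling for recent spend, as a fraction of the baseline.
    Returns the projected monthly exposure, or None if the key is not flagged.
    """
    if len(daily_costs) < 60:
        return None
    baseline = daily_costs[-60:-30]   # days 31-60 before today
    recent = daily_costs[-30:]        # the past 30 days
    baseline_avg = sum(baseline) / len(baseline)
    recent_avg = sum(recent) / len(recent)
    if baseline_avg >= min_baseline and recent_avg <= near_zero * baseline_avg:
        # Exposure: what the historical rate would cost over a month.
        return baseline_avg * days_per_month
    return None
```

A key that averaged 5 USD per day in the baseline window and then went silent would be flagged with a projected exposure of 150 USD per month.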
Agentic AI workloads – where models call external tools, run code, or perform web searches as part of completing a task – generate costs that extend beyond token charges. A single agent invocation may trigger web search queries, code execution, or other billed operations that do not appear in token counts.
Opactiv captures Anthropic tool-use costs separately from token costs, categorized by tool type: web search, code execution, and others as they are introduced. These agentic cost streams appear as distinct line items in the AI usage dashboard, so teams building with Claude’s tool-use features can see what their agentic operations are actually costing – not just the conversation tokens that surround them.
This distinction matters as agentic workloads scale. Token costs grow with the volume of text sent and generated, so they rise with conversation length; tool-use costs grow with the number and type of tool invocations. Understanding both components separately is essential to forecasting and optimizing the cost of agentic applications.
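A simple way to keep the two components apart when forecasting is to model them as separate terms. The sketch below is illustrative only: the function name is hypothetical, token prices are per million tokens, and the per-invocation tool prices are placeholders supplied by the caller rather than any provider's published rates.

```python
def agentic_cost(input_tokens, output_tokens, token_prices, tool_calls, tool_prices):
    """Estimate agentic workload cost as token charges plus tool-use charges.

    token_prices: {"input": price, "output": price} per 1M tokens (placeholder values).
    tool_calls:   {tool_name: invocation_count}
    tool_prices:  {tool_name: price_per_invocation} (placeholder values).
    """
    token_cost = (
        input_tokens * token_prices["input"] + output_tokens * token_prices["output"]
    ) / 1_000_000
    # One line item per tool type, mirroring the separate cost streams.
    tool_cost = {tool: count * tool_prices[tool] for tool, count in tool_calls.items()}
    return {
        "token_cost": token_cost,
        "tool_cost": tool_cost,
        "total": token_cost + sum(tool_cost.values()),
    }
```

Because the two terms have different drivers, a forecast can vary expected tool invocations per task independently of expected token volume.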
AI API spend does not exist in isolation. It is part of the total technology cost that organizations need to understand, allocate, and govern.
Opactiv integrates AI API cost data fully into the same organizational framework as cloud infrastructure spend. AI API resources are assigned to pools in the budget hierarchy, attributed to employee owners, governed by the same budget limits and anomaly detection policies, and included in the same cost analytics that cover compute, storage, and other spend categories.
A team that owns an AI-powered feature can see its total cost – the cloud infrastructure it runs on alongside the AI API calls it makes – in a single pool view. A finance stakeholder reviewing total technology spend sees AI API costs alongside cloud costs in the same currency, broken down by the same organizational dimensions.
Budget limits, constraint violations, and anomaly detection apply to AI API resources exactly as they apply to cloud resources. An AI workload that begins generating unexpectedly high costs triggers the same alerting mechanisms as a compute instance that has overrun its daily expense limit.
AI API usage patterns change faster than most other categories of technology spend. New models are released frequently. Features are built, tested, and sometimes abandoned on short cycles. Prompts are tuned in ways that shift token volumes significantly. Cost structures change as providers update their pricing.
Opactiv’s daily granularity and rolling analytics windows are designed for this pace. Yesterday’s costs are visible today. A token volume increase from a prompt change is visible within the next daily import. The pricing catalogue that underpins recommendations is maintained to reflect current model availability and pricing.
For engineering and finance teams managing AI API spend, Opactiv provides the visibility, accountability, and optimization intelligence needed to keep AI costs proportionate to the value they deliver.



