The Metrics Dashboard gives you a complete picture of your AI Gateway’s health and efficiency. Pinpoint performance bottlenecks, track spending in real time, and see exactly how your application is being used — all from a single, interactive interface. The dashboard is organized into tabs, each providing a specific perspective on your data. Every tab supports time-range selection, filters (by model, user, virtual account, team, and more), and a Refresh button for live monitoring.

Overview

The Overview tab is the landing page of the Metrics Dashboard. It provides a single-screen summary of gateway activity across both LLM and MCP traffic.
[Screenshot: Overview tab of the Metrics Dashboard showing total cost, LLM calls, MCP calls, error breakdowns, guardrails summary, and top usage leaderboards]
  • Total Cost — aggregate spend across all models, with period-over-period comparison.
  • Total LLM Calls — count of all LLM API requests, with period-over-period change.
  • Total MCP Calls — count of all MCP requests, with period-over-period change.
A breakdown panel splits total request volume by API endpoint type. Each row shows the endpoint pattern, a friendly label, and its share of total traffic.
  • /mcp-server — MCP Gateway
  • /chat/completions — Chat Completion
  • /agent/responses — Agent Response
  • /messages — Anthropic Messages
  • /proxy — Proxy
  • /responses — OpenAI Responses
  • /embeddings — Embedding
  • /completions — Completion
  • /v2/rerank — Rerank
  • Error Breakdown (Model) — displays LLM provider errors grouped by HTTP status code or virtual account.
  • Error Breakdown (MCP) — the same structure for MCP traffic, helping identify reliability issues with specific MCP servers.
A guardrails summary shows the total number of guardrail evaluations and a ranked list of which guardrail groups trigger most often, with a breakdown of outcomes (blocked, flagged, mutated).
Ranked leaderboards for quick identification of the biggest consumers and most active components:
  • Top Models — most-used models by request count
  • Top Model Providers — most-used providers (e.g. Google Vertex, OpenAI, AWS Bedrock)
  • Top Users by Model — users making the most LLM requests
  • Top Virtual Accounts by Model — virtual accounts with the most LLM traffic
  • Top Users by MCP — users making the most MCP requests
  • Top Virtual Accounts by MCP — virtual accounts with the most MCP traffic
  • Top MCP Servers — MCP servers receiving the most requests
  • Top Tools — individual MCP tools called most frequently

Model Metrics

Deep visibility into LLM model performance, cost, and usage. Compare models, debug user issues, and track team spending.
[Screenshot: Model Metrics tab showing requests per second, failure rates, latency percentiles, cost of inference, and token usage charts]
Pivot all charts using the View by selector:
  • Models — groups by model name (default); use to compare performance across different LLM models.
  • Virtual Models — groups by virtual model / model alias; use to evaluate model routing configurations.
  • Users — groups by username of the caller; use to debug user-specific issues or track per-user consumption.
  • Virtual Accounts — groups by virtual account; use to monitor usage by application or API key.
  • Teams — groups by team name; use to track costs per team for chargebacks or budget management.
  • Metadata — groups by custom metadata keys sent in request headers; use to create custom views (e.g. by tenant, environment, or feature).
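The Metadata pivot only works if requests carry metadata the gateway can group on. A minimal sketch of attaching JSON-encoded metadata as a request header; the header name (`X-Metadata`), URL, and key names here are placeholders, since the real header format depends on your gateway configuration:

```python
import json

# Hypothetical example: "X-Metadata" and the URL are placeholders; check
# your gateway's request-header documentation for the real names.
GATEWAY_URL = "https://gateway.example.com/chat/completions"

def build_headers(api_key: str, metadata: dict) -> dict:
    """Attach custom metadata keys so charts can be grouped on them."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Metadata": json.dumps(metadata),  # placeholder header name
    }

headers = build_headers("sk-...", {"tenant": "acme", "environment": "prod"})
```

Any keys sent this way (here, `tenant` and `environment`) would then appear as grouping options under the Metadata view.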
  • Total Input Tokens — total tokens sent to models.
  • Total Output Tokens — total tokens generated by models.
  • Total Count of Requests — number of LLM API calls.
  • Total Cost of Tokens — aggregate cost in USD.
  • Requests Per Second — throughput over time, broken down by the selected dimension.
  • Request Failure Rate — percentage of requests that failed over time.
  • Request Failures Breakdown — stacked bar chart showing failure distribution by error type.
  • Request Failure Rate By Error Type — failure rate broken down by HTTP status code (4xx, 5xx).
All latency charts support P50, P75, P90, and P99 percentile selectors.
  • Request Latency — end-to-end time from when the gateway receives a request until the complete response is returned.
  • Time To First Token (TTFT) — time until the first token is received. Critical for streaming use cases.
  • Inter Token Latency (ITL) — average time between consecutive tokens in a streaming response.
  • Time Per Output Token (TPOT) — average time to generate each output token.
  • Cost of Inference — cost over time, broken down by the selected dimension.
  • Input Tokens — input token volume over time.
  • Output Tokens — output token volume over time.
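The streaming latency metrics above are related in a simple way. A toy sketch, computed from a request start time and per-token arrival timestamps; note that conventions vary (TPOT here divides total time by token count, while some definitions exclude the TTFT interval):

```python
# Toy sketch relating TTFT, ITL, and TPOT for a streamed response.
# All times are in seconds; the convention chosen here is an assumption.
def streaming_latencies(request_start: float, token_times: list[float]):
    ttft = token_times[0] - request_start                 # Time To First Token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps)                           # Inter Token Latency
    tpot = (token_times[-1] - request_start) / len(token_times)  # Time Per Output Token
    return ttft, itl, tpot

ttft, itl, tpot = streaming_latencies(0.0, [0.4, 0.5, 0.6, 0.7])
# ttft ≈ 0.4 s, itl ≈ 0.1 s, tpot ≈ 0.175 s
```

This is why TTFT matters for streaming UX while TPOT dominates total generation time for long outputs.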

MCP Metrics

Tracks all Model Context Protocol (MCP) traffic flowing through the AI Gateway — server performance, tool-level metrics, and failure analysis.
The default MCP Servers view provides a server-centric overview of your MCP infrastructure.
[Screenshot: MCP Metrics tab showing MCP server request rates, latency, failure rates, method call breakdowns, and error breakdown]
Available charts:
  • Requests Per Second — throughput per MCP server over time.
  • Request Latency — latency with P50/P75/P90/P99 percentile selectors.
  • Request Failure Rate By Error Type — failures broken down by HTTP status code and RPC errors.
  • Request Failure Rate — overall failure rate per server over time.
  • Request Failures Breakdown — failures by server and error type.
  • MCP Method Calls Breakdown — which MCP methods (tools/list, tools/call, resources/list, etc.) are called most frequently.
  • Error Breakdown — errors grouped by error type and virtual account.
The Tools view drills down to the individual tool level across all MCP servers.
[Screenshot: MCP Metrics tab showing tool-level request rates, latency, failure rates, and request count breakdowns]
Available charts:
  • Requests Per Second — throughput per tool over time.
  • Request Latency — latency with P50/P75/P90/P99 selectors.
  • Request Failure Rate By Error Type — failures by error type for each tool.
  • Request Failure Rate — overall failure rate per tool.
  • Request Latency Summary — horizontal bar chart comparing latency distributions across tools.
  • Requests Count — tools ranked by total request count.
  • Request Failures Breakdown — failures broken down by tool and error type.
Pivot the charts using the View by selector:
  • MCP Servers — groups by MCP server name (default); use to identify underperforming or overloaded servers.
  • Tools — groups by individual tool name; use to find slow or error-prone tools.
  • Users — groups by username of the caller; use to track per-user MCP consumption.
  • Virtual Accounts — groups by virtual account; use to monitor MCP usage by application or API key.
  • Teams — groups by team name; use to understand MCP usage patterns across teams.

Guardrail Metrics

Shows how your content safety and compliance guardrails are performing — evaluations, blocked and mutated requests, and latency impact.
[Screenshot: Guardrail Metrics tab showing evaluated requests, blocked and mutated rates, guardrail results for input and output, and latency]
  • Total Requests — number of requests evaluated by guardrails.
  • Total Mutated Requests — requests where a guardrail modified the content.
  • Total Flagged Requests — requests that were blocked by a guardrail.
  • Evaluated Requests per Second — rate of guardrail evaluations over time.
  • Requests per Second by Result — evaluations split by outcome: allowed, blocked, mutated, audit_mode_blocked.
  • Requests Blocked Rate by Guardrail — which guardrails are blocking the most traffic.
  • Requests Mutated Rate by Guardrail — which guardrails are mutating content most often.
  • Guardrail Results for Model Input — per-guardrail outcomes for prompts/user messages.
  • Guardrail Results for Model Output — per-guardrail outcomes for completions/responses.
  • Guardrail Latency Rate — latency overhead with P50/P75/P90/P99 percentiles.
  • Request Latency Distribution — latency comparison across guardrail groups.
Pivot the charts using the View by selector:
  • Guardrails — groups by guardrail group name (default); use to compare effectiveness across guardrail policies.
  • Users — groups by username of the caller; use to see which users are triggering guardrails most.
  • Virtual Accounts — groups by virtual account; use to track guardrail activity by application.
  • Teams — groups by team name; use to understand guardrail impact per team.
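As a rough illustration, the per-guardrail blocked rate reduces to counting outcomes per guardrail group. A toy sketch using the outcome names the dashboard reports; the record shape, sample data, and the choice to count audit_mode_blocked as a block are assumptions:

```python
from collections import Counter

# Toy records: (guardrail_group, outcome) pairs; shape is hypothetical.
evaluations = [
    ("pii-filter", "blocked"), ("pii-filter", "allowed"),
    ("pii-filter", "mutated"), ("toxicity", "blocked"),
    ("toxicity", "allowed"), ("toxicity", "allowed"),
]

def blocked_rate_by_guardrail(records):
    total, blocked = Counter(), Counter()
    for group, outcome in records:
        total[group] += 1
        # Counting audit_mode_blocked toward the rate is a judgment call.
        if outcome in ("blocked", "audit_mode_blocked"):
            blocked[group] += 1
    return {g: blocked[g] / total[g] for g in total}

rates = blocked_rate_by_guardrail(evaluations)  # each group ≈ 0.33 here
```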

Routing Metrics

Visibility into routing rules, rate limits, and budget limits — how they distribute traffic and enforce policies.
[Screenshot: Routing Metrics tab showing routing rule usage, rate limit checks and exceeded rates, and budget limit checks and exceeded rates]
  • Total Loadbalances — number of times load balancing was applied.
  • Model Calls Blocked By Rate Limit — requests rejected for exceeding a rate limit.
  • Model Calls Blocked By Budget Limit — requests rejected for exceeding a budget constraint.
  • Routing Rule Usage Rate — how often each routing rule is triggered over time.
  • Routing Failure Rate — routing-level failures where no valid target was found.
  • Routing Rule Target Model Breakdown — which target models each routing rule resolves to.
  • Rate Limit Checks Rate — rate of rate-limit evaluations over time.
  • Rate Limit Exceeded Rate — how often rate limits are being hit.
  • Rate Limit Result Breakdown — allowed vs. blocked counts per rate-limit rule.
  • Budget Limit Checks Rate — rate of budget-limit evaluations over time.
  • Budget Limit Exceeded Rate — how often budget limits are being hit.
  • Budget Limit Result Breakdown — allowed vs. blocked counts per budget rule.
Pivot the charts using the View by selector:
  • Configs — groups by configuration / rule name (default); use to see which configs, rate limits, and budget limits are being triggered.
  • Users — groups by username of the caller; use to identify users hitting rate or budget limits.
  • Virtual Accounts — groups by virtual account; use to monitor policy impact by application.
  • Teams — groups by team name; use to understand policy impact per team.

Cache Metrics

Measures the performance of your semantic cache — hit rates, cost savings, latency overhead, and errors.
[Screenshot: Cache Metrics tab showing total requests, cache hit percentage, cost savings, cache errors, and latency added by cache lookups]
  • Total Requests — number of requests that went through the cache lookup.
  • Total Cost Saved — dollar amount saved by serving responses from cache.
  • Cache Hit % — percentage of requests served from cache.
  • Total Requests — cache request volume over time.
  • Cache Hit Percentage — hit vs. miss rate over time. A high hit rate means the cache is working well; a low or declining hit rate may indicate queries are too diverse for the current cache configuration.
  • Cost Savings — dollar savings from cache hits over time.
  • Cache Errors — errors encountered during cache operations. Ideally this should show “No data”.
  • Latency Added Average — average latency overhead introduced by cache lookups.
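Conceptually, the two headline numbers (Cache Hit % and Total Cost Saved) are simple aggregations over per-request cache outcomes. A toy sketch, where the record shape and sample costs are hypothetical:

```python
# Each record is (was_hit, model_cost_usd); on a hit the model call is
# skipped, so its full cost is counted as saved.
cache_log = [
    (True, 0.002), (False, 0.002), (True, 0.004),
    (False, 0.001), (True, 0.002),
]

hits = sum(1 for hit, _ in cache_log if hit)
hit_pct = 100.0 * hits / len(cache_log)
cost_saved = sum(cost for hit, cost in cache_log if hit)
print(f"Cache Hit %: {hit_pct:.1f}, Total Cost Saved: ${cost_saved:.4f}")
# Cache Hit %: 60.0, Total Cost Saved: $0.0080
```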
Pivot the charts using the View by selector:
  • Cache — groups by cache type (default); use for overall cache performance.
  • Virtual Accounts — groups by virtual account; use to compare cache effectiveness across applications.

Filtering and Drill-Down

The dashboard includes filters that allow you to narrow down your analysis to specific models, users, virtual accounts, teams, MCP servers, tools, or custom metadata fields. Filters persist across tabs, making it easy to investigate a specific user or model across all dimensions.
[Screenshot: Model Metrics tab with filters applied, showing filtered results for a specific user and model]

Exporting Data

You can download aggregated metrics data in CSV format by clicking the export icon on supported tabs. Choose which dimensions to group the data by and optionally include custom metadata keys. You can also fetch the data via API for programmatic access.
[Screenshot: Export aggregated data dialog showing grouping options and download button]
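A sketch of programmatic access under stated assumptions: the endpoint path, query parameter names, and auth scheme below are placeholders (consult your gateway's API reference for the real ones), and the CSV is sample data shaped like a grouped export:

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical endpoint and parameters; substitute your gateway's real ones.
BASE = "https://gateway.example.com/api/metrics/export"
params = {"from": "2024-06-01", "to": "2024-06-30", "group_by": "model,team"}
export_url = f"{BASE}?{urlencode(params)}"
# e.g. fetched with an authenticated GET against export_url

# Parsing the aggregated CSV such an export would return (sample data):
sample_csv = "model,team,total_cost\ngpt-4o,search,12.50\nclaude-3-5-sonnet,support,8.75\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
total = sum(float(r["total_cost"]) for r in rows)
print(total)  # 21.25
```

Grouping by the same dimensions as the dashboard's View by selectors keeps API results comparable with what you see on screen.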