A virtual model is a named entry in TrueFoundry AI Gateway that your application calls like any other model (for example my-group/production-chat). Behind that name, you configure one routing strategy and one or more real target models (for example azure/gpt-4o and openai/gpt-4o). The gateway handles load balancing, health-aware routing, retries, and fallbacks automatically so you do not hard-code provider details in every service.

Why route across multiple targets?

Production LLM traffic benefits when the gateway can choose or fail over among more than one backend. Common drivers:
  • Provider outages: Model providers experience outages and downtime. For example, OpenAI's and Anthropic's status pages from February to May 2025 show multiple incidents, service disruptions, and degraded-performance events. To avoid application downtime when a model goes down, many organizations configure multiple providers with load balancing so traffic routes to a healthy model whenever one fails.
  • Latency variance: Latency varies by time, region, model, and provider; latency measurements for a few models over the course of a month show significant fluctuations between providers. Dynamic routing lets the gateway send each request to the model with the lowest latency at that point in time.
  • Rate limits: Many LLM providers enforce strict rate limits on API usage, such as Azure OpenAI's per-model TPM (tokens per minute) and RPM (requests per minute) quotas. When these limits are exceeded, requests begin to fail; routing to other models keeps your application running.
  • Safe rollouts: Testing new models or updates in production carries significant risk. Weighted routing can send a small percentage of traffic to a new model so you can monitor its performance before shifting all traffic to it.

Why virtual models?

  • Stable API surface — Your apps pass one model identifier; you change targets, weights, or providers in the gateway without redeploying clients.
  • Resilience — Retries and fallback status codes route around rate limits and transient errors across targets.
  • Governance — Virtual model provider groups support collaborator roles so teams can use or manage routing separately.

Routing strategies

When you create a virtual model, you choose one of three routing strategies. Each strategy uses the same list of targets; the difference is how the gateway picks among healthy targets for each request.
| Strategy | How it works | Best for |
| --- | --- | --- |
| Weight-based | Distributes traffic by assigned weights (e.g. 80/20). Also supports sticky routing to pin sessions to a target. | Canary rollouts, fixed capacity splits, A/B allocation |
| Priority-based | Routes to the highest-priority healthy target (0 = highest). Falls back to the next on failure. Supports SLA cutoff. | Primary + backup topologies, cost optimization |
| Latency-based | Automatically routes to the target with the lowest recent latency. No weights needed. | Performance chasing across regions or providers |

Weight-based routing

You assign a weight to each target. The gateway distributes incoming requests in proportion to those weights. For example, 90% to azure/gpt-4o and 10% to openai/gpt-4o.
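A minimal weight-based configuration, sketched from the fields used in the sticky-routing example later on this page (provider and model names are placeholders; verify the exact schema against the gateway's reference):

```yaml
routing_config:
  type: weight-based-routing
  load_balance_targets:
    - target: azure/gpt-4o      # receives ~90% of traffic
      weight: 90
      fallback_candidate: true
    - target: openai/gpt-4o     # receives ~10% of traffic
      weight: 10
      fallback_candidate: true
```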

Sticky routing (weight-based only)

Sticky routing pins requests that share the same session key to the same target model for a configurable time window (ttl_seconds). Within that window, every request carrying the same session identifier is routed to the same model. When the window expires, the session is re-evaluated and may land on a different model. Useful for multi-turn conversations, prompt cache efficiency, and consistent user experience.
Add a sticky_routing block inside your weight-based routing configuration. Two fields are required: ttl_seconds and at least one entry in session_identifiers.
routing_config:
  type: weight-based-routing
  sticky_routing:
    ttl_seconds: 3600
    session_identifiers:
      - key: x-user-id
        source: headers
  load_balance_targets:
    - target: provider-a/model-a
      weight: 70
      fallback_candidate: true
    - target: provider-b/model-b
      weight: 30
      fallback_candidate: true
Session identifiers tell the gateway which fields to read from the request to identify a session. All configured identifiers are combined into a single session key — you can mix headers and metadata fields.
  • key — The header or metadata field name to read.
  • source — headers to read from HTTP request headers, or metadata to read from request metadata.
# Pin by a combination of user and conversation
session_identifiers:
  - key: x-user-id
    source: headers
  - key: x-conversation-id
    source: headers

# Pin using request metadata
session_identifiers:
  - key: tenant-id
    source: metadata
  - key: user-id
    source: metadata
If a configured identifier is missing from the request, it contributes an empty string to the session key. All requests missing that field will be treated as the same session. Make sure your clients always send the identifier fields you configure.
TTL window: ttl_seconds defines how long a session stays pinned. For chatbots, 3600 (1 hour) is a common starting point. For longer workflows, consider 86400 (24 hours).
Fallback during a sticky session: If the pinned model fails, the remaining healthy targets are tried in sequence (this fallback order is not weight-based). Only targets with fallback_candidate: true are eligible. After a successful fallback, subsequent requests for that session in the same TTL window are routed to the working target — not back to the one that failed.

Priority-based routing

Each target has a priority number. The gateway routes to the highest priority target (0 is highest) that is healthy. If that target fails or is unavailable, the gateway falls back to the next.
Priority-based routing supports SLA cutoff to automatically mark models as unhealthy when they breach performance thresholds:
  • Configure a Time Per Output Token (TPOT) threshold per target using sla_cutoff.time_per_output_token_ms
  • The gateway monitors average TPOT over a 3-minute rolling window (up to 10 samples, minimum 3 required)
  • If TPOT exceeds the threshold, the target is marked unhealthy and moved to the end of the list
  • Recovery is automatic when metrics improve or older data ages out
SLA cutoff is only available for priority-based routing, not weight-based or latency-based.
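Putting the pieces above together, a priority-based configuration with an SLA cutoff might look like this sketch. The priority and sla_cutoff field shapes follow the descriptions in this section; treat the exact structure as an assumption and check it against the gateway's schema:

```yaml
routing_config:
  type: priority-based-routing
  load_balance_targets:
    - target: azure/gpt-4o            # primary (0 = highest priority)
      priority: 0
      sla_cutoff:
        time_per_output_token_ms: 200 # mark unhealthy if avg TPOT over the rolling window exceeds 200 ms
    - target: openai/gpt-4o           # backup, used when the primary fails or is unhealthy
      priority: 1
```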

Latency-based routing

You do not set weights. The gateway chooses the target with the lowest recent latency automatically.
  1. The time per output token (inter-token latency) is used as the latency metric.
  2. Only requests in the last 20 minutes are considered. If more than 100 requests exist in that window, the last 100 are used. If fewer than 3 requests exist, the model is treated as the fastest so that more traffic is routed to it to gather data.
  3. Models are considered equally fast if their latency is within 1.2× of the fastest — this avoids rapid switching due to minor differences.
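A latency-based configuration only lists targets; the gateway measures inter-token latency itself and picks the fastest. A minimal sketch (the exact type string is an assumption based on the naming pattern of the other strategies):

```yaml
routing_config:
  type: latency-based-routing
  load_balance_targets:
    - target: azure/gpt-4o    # no weights or priorities; selection is by measured TPOT
    - target: openai/gpt-4o
```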

Per-target configuration

Regardless of which routing strategy you choose, each target in the virtual model supports several options that control what happens when a request is routed to that target.

Retries and fallbacks

Each target can define how the gateway should handle failures before giving up or moving to another target:
  • Retry configuration — Number of attempts, delay between retries, and which status codes trigger a retry on the same target. Defaults: 2 attempts, 100 ms delay, retry on 429, 500, 502, 503.
  • Fallback status codes — Which status codes cause the gateway to stop retrying this target and try a different target instead. Default: 401, 403, 404, 429, 500, 502, 503.
  • Fallback candidate — Whether this target is eligible to receive traffic when another target fails. Default: true.
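As a sketch, the defaults above could be written out explicitly on a target like this. The retry_config and fallback_status_codes field names are illustrative assumptions based on the options described; verify them against the gateway's configuration reference:

```yaml
load_balance_targets:
  - target: provider-a/model-a
    weight: 100
    retry_config:
      attempts: 2                     # retry the same target up to 2 times
      delay_ms: 100                   # wait 100 ms between attempts
      retry_status_codes: [429, 500, 502, 503]
    fallback_status_codes: [401, 403, 404, 429, 500, 502, 503]
    fallback_candidate: true          # eligible to receive traffic when another target fails
```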

Header overrides

You can inject or remove HTTP headers on a per-target basis, applied just before the request is sent to that model. This is useful when a specific target requires headers that the others don’t — for example, a region identifier, a deployment ID, or an API version header expected by one provider but not the rest.
Add a headers_override block to any target in your load_balance_targets list:
load_balance_targets:
  - target: provider-a/model-a
    weight: 80
    headers_override:
      set:
        x-region: us-east-1
      remove:
        - x-internal-debug
  - target: provider-b/model-b
    weight: 20
    headers_override:
      set:
        x-region: eu-west-1
  • set — Key-value pairs of headers to add or overwrite on the outgoing request.
  • remove — List of header keys to strip from the outgoing request.
Header keys are case-insensitive: X-Custom-Auth and x-custom-auth refer to the same header. The gateway normalises all keys to lowercase before applying overrides. Header overrides are applied last, after parameter overrides, so they reflect the final outgoing headers sent to the provider.

Model-specific prompt overrides

When a virtual model sends traffic to targets from different model families, you may need different prompt versions per provider. Configure prompt_version_fqn in override parameters on each target. When a request is routed to a target, the gateway uses that target’s prompt version for hydration.
This is useful when:
  • Different models require different prompt formats or structures
  • You want to optimize prompts for specific model capabilities
  • You need to maintain model-specific prompt versions behind one virtual model name
The prompt_version_fqn override does not work with agents (when using MCP/tools); it is supported only for standard chat completion requests.
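As a sketch, a per-target prompt version might be attached through each target's override parameters. The override_params field name and the FQN format shown here are illustrative assumptions, not the confirmed schema:

```yaml
load_balance_targets:
  - target: provider-a/model-a
    weight: 50
    override_params:
      prompt_version_fqn: prompt:my-team/chat-prompt-a:3   # hypothetical FQN format
  - target: provider-b/model-b
    weight: 50
    override_params:
      prompt_version_fqn: prompt:my-team/chat-prompt-b:1
```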

Unhealthy target detection

All the routing strategies described above only consider healthy targets when deciding where to send a request. The gateway continuously monitors every target and automatically marks targets as unhealthy when they start failing or breaching performance thresholds. This section explains how that health tracking works. When a target is marked unhealthy, healthy targets are always tried first. Unhealthy targets are moved to the end of the list and only used as a last resort if all healthy targets fail. Recovery is automatic once errors age out of the evaluation window.
The gateway tracks error responses for each target and marks a target unhealthy when failures cross a threshold in a recent time window.
  • Error responses considered: 5xx, 429, 401, and 403
  • Default failure threshold: 2 or more failures
  • Default evaluation window: last 2 minutes (rolling window)
  • Recovery: automatic, once failures age out of the window
For priority-based routing, you can also configure a latency threshold per target using sla_cutoff.time_per_output_token_ms. If the average TPOT over a 3-minute rolling window exceeds the configured threshold (with at least 3 samples), the target is marked unhealthy. See SLA cutoff above.

FAQ

Can I change a virtual model's routing configuration while traffic is flowing?
Yes. Updates apply to new requests immediately; in-flight requests keep their current routing.
How do I know which target model actually served a request?
The gateway returns the actual model used in the x-tfy-resolved-model response header. This may differ from the virtual model you requested due to load balancing or fallbacks. You can also view per-target traffic, success rates, and latency in the AI Gateway dashboard.
Can a single virtual model serve multiple model types?
Yes, if you enable multiple model types on the virtual model. Every target must support the operation you call.
What happens when all targets fail?
After retries and fallbacks are exhausted, the request fails with an error. Add enough fallback candidates for critical paths.
Can a virtual model use another virtual model as a target?
No. Allowing virtual models as targets could lead to recursive or deeply nested routing chains that are hard to reason about and debug. To keep routing predictable, targets must be real catalog models. If you need to split traffic further, create separate virtual models and have your client choose between them.
Is sticky routing available for priority-based or latency-based routing?
No — but it's typically not needed. With priority-based routing, requests already go to the same highest-priority healthy target consistently. With latency-based routing, the gateway sticks with the fastest target and only switches when another target becomes measurably faster (within a 1.2× threshold), so oscillation between models is rare. Sticky routing exists for weight-based routing because that strategy deliberately spreads traffic across targets, and you may want to override that for session consistency.
Does a sticky session stay pinned to the same model indefinitely?
No — only within the configured ttl_seconds window. Once the window expires, the session is re-evaluated and may land on a different model. Size ttl_seconds to match your typical session duration.
Does sticky routing work consistently across multiple gateway pods?
Yes. Every gateway pod uses the same configuration and the same time-based window, so all pods independently arrive at the same assignment for the same session. When a fallback occurs mid-window, the update is propagated across all pods automatically.
Do header overrides on one target affect other targets?
No. Header overrides are strictly per-target — they only apply when a request is dispatched to that specific target. Other targets in the same virtual model are not affected.

Next steps

Ready to set up a virtual model? See Create a virtual model for the step-by-step walkthrough and common configuration patterns.