Prompt Caching

Prompt caching lets a request resume from a previously processed prefix of your prompt instead of reprocessing that prefix every time. This reduces processing time and cost for repetitive tasks and for prompts that share consistent elements.

Currently, only Anthropic models support this caching feature. See the Anthropic documentation for more details.

Minimum Cacheable Length

Model	Minimum Token Length
Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.7, Claude Sonnet 3.5, Claude Opus 3	1024 tokens
Claude Haiku 3.5, Claude Haiku 3	2048 tokens
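
As a rough illustration of how the table above might be applied, the sketch below checks whether a prompt is long enough to be cached. The dictionary keys are illustrative labels rather than official model identifiers, `is_cacheable` is a hypothetical helper, and the token count is assumed to come from whatever tokenizer or usage report you already have.

```python
# Minimum cacheable prompt lengths, copied from the table above.
# The keys are illustrative labels, NOT official model identifiers.
MIN_CACHEABLE_TOKENS = {
    "claude-opus-4": 1024,
    "claude-sonnet-4": 1024,
    "claude-sonnet-3.7": 1024,
    "claude-sonnet-3.5": 1024,
    "claude-opus-3": 1024,
    "claude-haiku-3.5": 2048,
    "claude-haiku-3": 2048,
}


def is_cacheable(model_label: str, prompt_tokens: int) -> bool:
    """Return True if a prompt of this length meets the model's minimum.

    Hypothetical helper: the caller supplies the token count.
    """
    if model_label not in MIN_CACHEABLE_TOKENS:
        raise KeyError(f"No minimum length recorded for {model_label!r}")
    return prompt_tokens >= MIN_CACHEABLE_TOKENS[model_label]
```

For example, a 1500-token prompt would pass the check for a model with a 1024-token minimum but not for one with a 2048-token minimum.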

Usage

This feature is only available through direct REST API calls. The OpenAI SDK doesn’t recognize the cache_control field.

Add the cache_control parameter to any message content you want to cache:

import requests

# Replace {GATEWAY_BASE_URL} and TFY_API_KEY with your gateway URL and API key.
URL = "{GATEWAY_BASE_URL}/chat/completions"
API_KEY = "TFY_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-TFY-LOGGING-CONFIG": '{"enabled": true}'
}

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<TEXT_TO_CACHE>",
                    "cache_control": {
                        "type": "ephemeral"
                    }
                }
            ]
        }
    ],
    "model": "MODEL_NAME",
    "stream": True
}

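# stream=True makes requests yield the streamed events as they arrive
# instead of buffering the whole response.
response = requests.post(URL, headers=headers, json=payload, stream=True)
response.raise_for_status()

for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))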
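
When several requests share the same long context, it can help to build the payload with a small helper. The following is a minimal sketch under that assumption: `cached_text_block` and `build_payload` are hypothetical names, and placing an uncached question after the cached block in the same message is an illustrative layout rather than a documented requirement.

```python
def cached_text_block(text: str) -> dict:
    """Build a text content block marked for caching (hypothetical helper)."""
    return {
        "type": "text",
        "text": text,
        "cache_control": {"type": "ephemeral"},
    }


def build_payload(model: str, shared_context: str, question: str) -> dict:
    """Build a chat payload whose shared context carries cache_control.

    Only the shared context is marked; the question changes per request
    and is sent as a plain text block.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    cached_text_block(shared_context),
                    {"type": "text", "text": question},
                ],
            }
        ],
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, in the same way as the payload above.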

Monitoring Cache Performance

Monitor cache performance with the following fields, found in the usage object of the response (or in the message_start event when streaming):

cache_creation_input_tokens: Tokens written to the cache when creating a new entry
cache_read_input_tokens: Tokens retrieved from the cache for this request
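
The two fields above can be read with a few lines of code. This is a minimal sketch that assumes you have already extracted the usage object as a Python dictionary; `summarize_cache_usage` is a hypothetical helper, and a missing field is treated as zero.

```python
def summarize_cache_usage(usage: dict) -> dict:
    """Summarize cache activity from a usage dictionary (hypothetical helper).

    Missing or null fields are treated as 0 tokens.
    """
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    return {
        "cache_write_tokens": written,
        "cache_read_tokens": read,
        # Any tokens read from the cache mean this request hit an entry.
        "cache_hit": read > 0,
    }
```

Typically, the first request with a given prefix reports tokens under cache_creation_input_tokens, and later requests that reuse the prefix report them under cache_read_input_tokens.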
