Log & Monitor Custom Metrics

In this section, we will cover how to log your custom application metrics and then use TrueFoundry to monitor them. Your custom application metrics can be something like the number of times an API was called, the time taken for a function to execute, or something completely custom. TrueFoundry by default uses Prometheus to scrap all the metrics exposed at /metricsendpoint of your server.

Logging metrics

Let us understand with an example of a FastAPI inference service. You can find the complete code at Github Repository

import os
import joblib
import pandas as pd
from fastapi import FastAPI

# Loading the model from local
model = joblib.load("iris_classifier.joblib")

app = FastAPI(docs_url="/", root_path=os.getenv("TFY_SERVICE_ROOT_PATH", "/"))


@app.post("/predict")
def predict(
    sepal_length: float, sepal_width: float, petal_length: float, petal_width: float
):
    data = dict(
        sepal_length=sepal_length,
        sepal_width=sepal_width,
        petal_length=petal_length,
        petal_width=petal_width,
    )
    prediction = int(model.predict(pd.DataFrame([data]))[0])
    return {"prediction": prediction}

This is a simple inference service exposing an /predict endpoint. Now, we would like to track how many times this API has been called and how much time it took for each of these predictions, in other words, what is the latency of this API? Let us start by installing the Python library for Prometheus

pip install prometheus-client

Update the code to log metrics

import os

import joblib
import pandas as pd
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

# Loading the model from local
model = joblib.load("iris_classifier.joblib")

# Define a Histogram metric to track latency of request as percentile.
REQUEST_TIME = Histogram("request_latency_seconds", "Time spent processing request")

# Define a Counter metric to track number of requests.
REQUEST_COUNT = Counter("request_count", "Number of inference request")

app = FastAPI(docs_url="/", root_path=os.getenv("TFY_SERVICE_ROOT_PATH", "/"))


@app.post("/predict")
def predict(
    sepal_length: float, sepal_width: float, petal_length: float, petal_width: float
):
    # Increase the request count by 1
    REQUEST_COUNT.inc()
     # Time the predict function, and observe the duration in seconds.
    with REQUEST_TIME.time():
        data = dict(
            sepal_length=sepal_length,
            sepal_width=sepal_width,
            petal_length=petal_length,
            petal_width=petal_width,
        )
        prediction = int(model.predict(pd.DataFrame([data]))[0])
        return {"prediction": prediction}


app.mount("/metrics", make_asgi_app())

Run the FastAPI server with the following command uvicorn app:app --port 8000 --host 0.0.0.0 Now, you can check the exposed metrics at http://localhost:8000/metrics

Congratulations! You have logged the metrics successfully. Now, go ahead and deploy this service. To learn more about different kinds of metrics details, please refer to prometheus-client documentation.

Monitor exposed metrics

Add the following kustomize patch to your service while deploying. This will add necessary annotations to your service Pods for Prometheus to scrape metrics. Please fill the placeholders with the correct service-name and service-port-number.

service = Service(
    ...
    kustomize=Kustomize(
        patch={
            "patchesStrategicMerge": [
"""\
kind: Deployment
apiVersion: apps/v1
metadata:
  name: <service-name>
spec:
  template:
    metadata:
      annotations:
        prometheus.io/port: "<service-port-number>"	
        prometheus.io/scrape: "true"
"""
            ]
        }
    ),
  ...
)

You can now export the application Grafana dashboard if not done already.

Note: Please ensure that Prometheus and Grafana is installed in your cluster.

Open your Grafana dashboard for the service.
Now, to monitor your custom metrics add a new Visualization in your service dashboard.
For counter metric, use the query request_count_total{container=~"iris-inference",namespace=~"demo-ws"} where container is service_name and namespace is workspace_name.
For histogram metric, use multiple Query section with different percentile like round(histogram_quantile(0.99, sum(rate(request_latency_seconds_bucket{namespace=~"demo-ws", container=~"iris-inference"}[$__rate_interval])) by (le)), 0.001) represents p99, similarly round(histogram_quantile(0.90, sum(rate(request_latency_seconds_bucket{namespace=~"demo-ws", container=~"iris-inference"}[$__rate_interval])) by (le)), 0.001) represents p90 where container is service_name and namespace is workspace_name
Save the changes and you are ready with your dashboard.

Getting Started

Train and Deploy Models

Service Deployment

Job Deployment

LLM Deployment

LLM Finetuning

MCP Server Deployment

Workflow Deployment

Async Service Deployment

Volumes

ML Repository

Advanced Features

Logging metrics

Monitor exposed metrics

Getting Started

Train and Deploy Models

Service Deployment

Job Deployment

LLM Deployment

LLM Finetuning

MCP Server Deployment

Workflow Deployment

Async Service Deployment

Volumes

ML Repository

Advanced Features

Documentation Index

​Logging metrics

​Monitor exposed metrics

Logging metrics

Monitor exposed metrics