Introduction

TrueFoundry Jobs let you run task-oriented workloads: a job runs for as long as it takes to complete its task, then terminates and releases its resources. Jobs are particularly well suited to the following scenarios:

Model Training: Train machine learning models on large datasets; resources are freed once training is complete.
Maintenance and Cleanup: Schedule routine maintenance tasks such as data backups, model retraining, and report generation.
Batch Inference: Run large-scale batch inference, such as processing large volumes of data with trained models, taking advantage of a Job's ability to handle parallel workloads efficiently.
Spark Jobs: Run Apache Spark jobs on your Kubernetes cluster, with dynamic scaling.
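
Whatever the scenario, a job is a process that does its work once and then exits. The following is a minimal sketch of such an entrypoint in Python. The `run_task` function is a hypothetical stand-in for the real work (training, cleanup, batch inference), and nothing here is part of a TrueFoundry API. It assumes the usual convention that an exit code of zero means the run succeeded and a non-zero code means it failed.

```python
import sys


def run_task(items):
    """Hypothetical unit of work: here, just sum a list of numbers."""
    return sum(items)


def main():
    """Run the task once and return a process exit code."""
    try:
        result = run_task([1, 2, 3])
        print(f"task finished, result={result}")
        return 0  # zero conventionally signals success
    except Exception as exc:
        print(f"task failed: {exc}", file=sys.stderr)
        return 1  # non-zero conventionally signals failure


exit_code = main()
```

In a real script you would typically end with `sys.exit(main())` under an `if __name__ == "__main__":` guard, so the exit code reaches whatever launched the process.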

TrueFoundry makes it easy to configure various aspects of your job deployment.

Dockerize Code

Deploy from GitHub, your local machine, or a prebuilt image.

Customize Resources

Set CPU, GPU, and memory resources, and choose between spot and on-demand instances.

Environment Variables And Secrets

Set environment variables and secrets for your job.

Schedule Job

Schedule your job to run at a specific time.

Trigger your job

Trigger your job manually.

Parameterize Job

Parameterize your job so that argument values are easy to change from run to run.
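
On the script side, parameterization usually means reading values from command-line arguments instead of hard-coding them. The sketch below uses Python's standard `argparse` module. The parameter names (`--epochs`, `--learning-rate`, `--dataset`) and their defaults are hypothetical, and how parameters are declared on the platform is not shown here.

```python
import argparse


def parse_args(argv=None):
    """Parse job parameters; defaults apply when a run does not override them."""
    parser = argparse.ArgumentParser(description="Hypothetical training job")
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--dataset", default="train.csv")
    return parser.parse_args(argv)


def describe(args):
    """Summarize the effective parameter values for this run."""
    return f"epochs={args.epochs} lr={args.learning_rate} dataset={args.dataset}"


# One run with defaults, one run overriding two of the values.
default_run = parse_args([])
custom_run = parse_args(["--epochs", "20", "--learning-rate", "0.001"])
print(describe(default_run))
print(describe(custom_run))
```

Keeping defaults in the script means a run still works when no values are supplied, while individual runs can override only the arguments they need to change.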

Retries and Timeout

Set retries and a timeout for your job in case it gets stuck or fails.
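
To make the two settings concrete, here is a local illustration in Python of what retries and a timeout mean for a command. It is only a sketch: `run_with_retries` and its parameters are hypothetical names, the timeout is assumed to apply to each attempt separately, and it does not describe how TrueFoundry implements either setting.

```python
import subprocess
import sys


def run_with_retries(command, retries, timeout_seconds):
    """Run a command, retrying on failure or timeout. Returns True on success."""
    attempts = retries + 1  # the first attempt plus the allowed retries
    for attempt in range(1, attempts + 1):
        try:
            completed = subprocess.run(command, timeout=timeout_seconds)
            if completed.returncode == 0:
                return True
            print(f"attempt {attempt} failed with exit code {completed.returncode}")
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before raising this.
            print(f"attempt {attempt} timed out after {timeout_seconds}s")
    return False


# A command that finishes quickly succeeds on the first attempt.
ok = run_with_retries([sys.executable, "-c", "print('done')"],
                      retries=2, timeout_seconds=30)

# A command that hangs exceeds the timeout on every attempt and gives up.
stuck = run_with_retries([sys.executable, "-c", "import time; time.sleep(5)"],
                         retries=1, timeout_seconds=0.5)
```

The timeout guards against a run that never finishes, while retries guard against transient failures; a run that fails deterministically will simply use up all of its attempts.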

Concurrency

Set a concurrency limit for your job to specify how many instances of the job can run at once.

Access Cloud Services

Access S3, GCS, Azure Container, and other managed cloud services.

Mounting Volumes

Mount volumes to cache data.

Deploy Programmatically

Deploy using Python or the CLI.

Set Up CI/CD

Set up deployments with your favorite CI/CD tool.

View Metrics

View the most important metrics for your job run.

View Logs

View logs for a job run or for individual pods.

Set Up Alerts

Set up alerts and notifications for your job.

Clone, Update, and Rollback

Clone a job, update its version, roll back to a previous version, and promote to production.
