LitServe is a lightweight and fast inference server for machine learning models. It can be a good alternative to plain FastAPI if you need dynamic batching. Built on top of FastAPI, it also provides higher-level abstractions for authentication, middleware, an OpenAI-compatible spec, streaming, and more. In this example, we will deploy a simple Whisper model (speech to text) using faster-whisper and LitServe. You can find the code for this example here. The key files are:
- whisper_server.py: Contains the WhisperLitAPI that implements the LitAPI interface.
- requirements.txt: Contains the dependencies.
How to write the inference function in LitServe
The whisper_server.py file contains the WhisperLitAPI class, which implements the LitAPI interface.
To write the inference function, subclass the LitAPI class and implement the setup, decode_request, predict and encode_response methods:
- setup: Loads the model.
- decode_request: Decodes and transforms the request body into the input format expected by the model.
- predict: Processes the output of decode_request and runs model inference.
- encode_response: Formats the response. Can perform any postprocessing on the response.
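The four methods can be sketched as follows. This is a minimal illustration, not the actual whisper_server.py: it uses a hypothetical stub model so it runs without litserve or faster-whisper installed, and the request field name audio_base64, the StubWhisperModel and StubSegment classes, and the handle helper are all assumptions made for this example.

```python
import base64
from dataclasses import dataclass


@dataclass
class StubSegment:
    """Stands in for one transcribed segment (hypothetical type)."""
    text: str


class StubWhisperModel:
    """Hypothetical stand-in for the real speech-to-text model."""

    def transcribe(self, audio_bytes: bytes):
        # A real model would decode the audio; the stub only reports its size.
        yield StubSegment(text=f"<{len(audio_bytes)} bytes of audio>")


class WhisperLitAPI:  # the real class would subclass litserve.LitAPI
    def setup(self, device):
        # Called once at startup: load the model here.
        self.model = StubWhisperModel()

    def decode_request(self, request):
        # Assumed request shape: {"audio_base64": "<base64 string>"}.
        return base64.b64decode(request["audio_base64"])

    def predict(self, audio_bytes):
        # Receives the output of decode_request and runs inference.
        segments = self.model.transcribe(audio_bytes)
        return " ".join(segment.text.strip() for segment in segments)

    def encode_response(self, output):
        # Formats the response; any postprocessing could go here.
        return {"text": output}


def handle(api, request):
    """Chains the hooks in the order a server would call them."""
    return api.encode_response(api.predict(api.decode_request(request)))


api = WhisperLitAPI()
api.setup(device="cpu")
payload = {"audio_base64": base64.b64encode(b"\x00" * 16).decode("ascii")}
result = handle(api, payload)
print(result)
```

With the real libraries installed, the class would instead subclass litserve.LitAPI, load a faster-whisper WhisperModel in setup, and be served through litserve.LitServer; the stub only mirrors the order in which the hooks run.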
Running the server locally
- Install the dependencies
- Run the server
- Test the server
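As one way to test a locally running server from Python, the sketch below builds a JSON POST request with the standard library. It assumes LitServe's default port (8000) and default /predict route, and a hypothetical audio_base64 request field; the field name must match whatever decode_request in your server actually expects.

```python
import base64
import json
import urllib.request

# Assumptions: LitServe's default port (8000) and default /predict route.
SERVER_URL = "http://localhost:8000/predict"


def build_request(audio_bytes: bytes, url: str = SERVER_URL) -> urllib.request.Request:
    """Builds (but does not send) a JSON POST carrying base64-encoded audio."""
    body = json.dumps(
        # "audio_base64" is a hypothetical field name chosen for this sketch.
        {"audio_base64": base64.b64encode(audio_bytes).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(b"\x00" * 16)
print(request.get_method(), request.full_url)

# With the server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it makes the payload easy to inspect before the server is up.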
Deploying to TrueFoundry
Since the models are being pulled from HuggingFace Hub, we can directly deploy the code to TrueFoundry and use the Artifacts Download feature to automatically download the model.
Push the code to a Git repository or directly deploy from local machine
Once you have tested your code locally, we highly recommend pushing the code to a Git repository. This lets you version control the code and also makes the deployment process much easier. However, if you don’t have access to a Git repository, or your Git repositories are not integrated with TrueFoundry, you can deploy directly from your local machine.
You can follow the guide here to deploy your code.
Configure PythonBuild
Download Model from HuggingFace Hub in the deployment configuration
Add the model ID and revision from HuggingFace Hub in the Artifacts Download section.