Deployments are currently in beta. APIs and behavior may change as the feature evolves.
Define a deployment in a single .py file (your code, your Docker image, your GPU requirements, and your autoscaling settings), and the SDK handles packaging, uploading, provisioning workers, and routing function calls automatically.
Why Deployments?
Deployments abstract away the infrastructure of GPU serverless computing. Instead of configuring endpoints, workergroups, and PyWorkers separately, you write Python functions and call them as if they were local. Under the hood, the SDK:
- Packages and uploads your code to the cloud
- Creates a managed Serverless endpoint and workergroup
- Provisions GPU workers with your specified image and requirements
- Installs packages, loads secrets, and runs startup scripts
- Routes function calls to ready workers and returns results
Core Concepts
@remote Functions
The @remote decorator marks an async Python function for remote execution. When you call it from your local machine, the SDK serializes the arguments, routes the call to a GPU worker, executes the function, and returns the result, all through a single await call.
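The call flow described above (serialize, route, execute, return) can be illustrated with a toy decorator. This is only a sketch of the pattern, not the SDK's @remote: the names remote and _fake_worker are hypothetical, pickle is an assumed serialization format, and the "worker" runs in the same process instead of on a GPU machine.

```python
import asyncio
import functools
import pickle


def remote(func):
    """Toy stand-in for a remote-execution decorator (hypothetical, not the SDK's @remote)."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        # 1. Serialize the call arguments, as a client would before sending them.
        payload = pickle.dumps((args, kwargs))
        # 2. "Route" the payload to a worker; here the worker is simulated in-process.
        result_bytes = await _fake_worker(func, payload)
        # 3. Deserialize the worker's reply and return it to the caller.
        return pickle.loads(result_bytes)

    return wrapper


async def _fake_worker(func, payload):
    """Simulated worker: decode the arguments, run the function, encode the result."""
    args, kwargs = pickle.loads(payload)
    result = await func(*args, **kwargs)
    return pickle.dumps(result)


@remote
async def add(a, b):
    # In a real deployment this body would run on a GPU worker.
    return a + b


async def main():
    # From the caller's side, the whole round trip is a single await.
    return await add(2, 3)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of the sketch is the shape of the call site: the caller awaits an ordinary-looking async function, and everything between serialization and the returned result stays hidden inside the decorator.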
@context Classes
Context classes load heavy resources (models, engines, connections) once at worker startup and make them available to all remote function calls. This avoids reloading a model on every request.
Image Configuration
The Image object configures the Docker image, pip/apt packages, environment variables, GPU requirements, and startup scripts for your workers.
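To make the grouping of settings concrete, here is a minimal sketch of the kind of data such an object carries. The class ImageSpecSketch and all of its field names are hypothetical and chosen only for illustration; they are not the SDK's Image class or its parameters, so consult the Configuring Deployments page for the real interface.

```python
from dataclasses import dataclass, field


@dataclass
class ImageSpecSketch:
    """Hypothetical container for worker settings (not the SDK's Image class)."""

    docker_image: str                                        # base Docker image for workers
    pip_packages: list[str] = field(default_factory=list)    # Python packages to install
    apt_packages: list[str] = field(default_factory=list)    # system packages to install
    env: dict[str, str] = field(default_factory=dict)        # environment variables
    min_gpu_memory_gb: int = 0                                # assumed form of a GPU requirement
    startup_script: str = ""                                  # shell commands run at worker start


# Example values are placeholders, not recommendations.
spec = ImageSpecSketch(
    docker_image="python:3.11-slim",
    pip_packages=["numpy"],
    env={"LOG_LEVEL": "info"},
    min_gpu_memory_gb=24,
)
```

Keeping these settings in one object means the worker environment is declared next to the code that depends on it, which matches the single-file approach described earlier.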
Benchmarks
Each deployment defines a benchmark that runs when workers start up. The benchmark measures worker performance, which the autoscaler uses to determine capacity and make scaling decisions.
Minimal Example
Here is a complete deployment in a single file:
Next Steps
Architecture
Understand how deploy mode, serve mode, and update tiers work
Configuring Deployments
Image, packages, GPU requirements, autoscaling, and environment setup
@remote Functions
Define, call, and benchmark remote GPU functions
@context Classes
Load models and resources once at worker startup