Deployments are currently in beta. APIs and behavior may change as the feature evolves.
Vast Deployments let you run Python functions on remote GPUs with a single decorator. You define everything in one .py file (your code, your Docker image, your GPU requirements, and your autoscaling settings), and the SDK handles packaging, uploading, provisioning workers, and routing function calls automatically.

Why Deployments?

Deployments abstract away the infrastructure behind serverless GPU computing. Instead of configuring endpoints, workergroups, and PyWorkers separately, you write Python functions and call them as if they were local. Under the hood, the SDK:
  • Packages and uploads your code to the cloud
  • Creates a managed Serverless endpoint and workergroup
  • Provisions GPU workers with your specified image and requirements
  • Installs packages, loads secrets, and runs startup scripts
  • Routes function calls to ready workers and returns results

Core Concepts

@remote Functions

The @remote decorator marks an async Python function for remote execution. When you call it from your local machine, the SDK serializes the arguments, routes the call to a GPU worker, executes the function, and returns the result — all through a single await call.
from vastai import Deployment

app = Deployment(name="my-app")

@app.remote(benchmark_dataset=[{"x": 2}])
async def square(x):
    return x * x

@context Classes

Context classes load heavy resources (models, engines, connections) once at worker startup and make them available to all remote function calls. This avoids reloading a model on every request.
@app.context()
class MyModel:
    async def __aenter__(self):
        self.model = load_model()  # your own (potentially slow) loading routine
        return self
    async def __aexit__(self, *exc):
        pass  # release resources here if needed
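A context class is an ordinary async context manager: `__aenter__` runs once when the worker starts, and the instance then persists across requests. The lifecycle can be sketched locally (with a hypothetical stand-in `load_model` and no `@app.context()` decorator, so it runs anywhere):

```python
import asyncio

load_count = 0

def load_model():
    # Stand-in for an expensive model load.
    global load_count
    load_count += 1
    return "model-weights"

class MyModel:
    async def __aenter__(self):
        self.model = load_model()  # runs once, at startup
        return self

    async def __aexit__(self, *exc):
        pass  # release resources here if needed

async def main():
    async with MyModel() as ctx:
        # Many "requests" reuse the same already-loaded model.
        for _ in range(3):
            assert ctx.model == "model-weights"
    print(load_count)  # the model was loaded exactly once

asyncio.run(main())
```

This is why heavy resources belong in a context class rather than inside a remote function body: the load cost is paid once per worker, not once per call.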

Image Configuration

The Image object configures the Docker image, pip/apt packages, environment variables, GPU requirements, and startup scripts for your workers.
from vastai.data.query import gpu_name, RTX_4090, RTX_5090

image = app.image("vastai/pytorch:@vastai-automatic-tag", 16)
image.pip_install("torch", "transformers")
image.require(gpu_name.in_([RTX_4090, RTX_5090]))

Benchmarks

Each deployment defines a benchmark that runs when workers start up. The benchmark measures worker performance, which the autoscaler uses to determine capacity and make scaling decisions.
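As the `@app.remote` examples above suggest, each entry in `benchmark_dataset` is a dict of keyword arguments for one benchmark call. A minimal local sketch of that calling convention, assuming entries are applied as kwargs (the stand-in `square` below replaces the real decorated function):

```python
import asyncio

# Stand-in for a @app.remote-decorated function; the real decorator
# would route each call to a GPU worker.
async def square(x):
    return x * x

# Each dict supplies the kwargs for one benchmark call at worker startup.
benchmark_dataset = [{"x": 2}, {"x": 64}, {"x": 4096}]

async def run_benchmark():
    # The platform runs calls like these on the worker to measure
    # performance; here we execute them locally to show the shape.
    return [await square(**sample) for sample in benchmark_dataset]

print(asyncio.run(run_benchmark()))  # [4, 4096, 16777216]
```

Including inputs of realistic size in the dataset helps the autoscaler measure performance that reflects production traffic.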

Minimal Example

Here is a complete deployment in a single file:
# deploy.py
from vastai import Deployment
from vastai.data.query import gpu_name, RTX_4090, RTX_5090

app = Deployment(name="square")

@app.remote(benchmark_dataset=[{"x": 2}])
async def square(x):
    return x * x

image = app.image("vastai/base-image:@vastai-automatic-tag", 16)
image.require(gpu_name.in_([RTX_4090, RTX_5090]))
app.configure_autoscaling(min_load=1000)
app.ensure_ready()
And a client that calls it:
# client.py
import asyncio
from deploy import app, square

async def main():
    result = await square(5)  # Executes on a remote GPU, returns 25
    print(result)

asyncio.run(main())
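Because remote calls are ordinary awaitables, they compose with standard asyncio tooling. A sketch of dispatching a batch of calls concurrently with `asyncio.gather`, using a local stand-in for the remote `square`:

```python
import asyncio

# Local stand-in for the remote function; in client code this would be
# the @app.remote-decorated `square` imported from deploy.py.
async def square(x):
    return x * x

async def main():
    # Fan out several calls at once; results come back in input order.
    results = await asyncio.gather(*(square(i) for i in range(1, 6)))
    print(results)  # [1, 4, 9, 16, 25]
    return results

asyncio.run(main())
```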

Next Steps

Architecture

Understand how deploy mode, serve mode, and update tiers work

Configuring Deployments

Image, packages, GPU requirements, autoscaling, and environment setup

@remote Functions

Define, call, and benchmark remote GPU functions

@context Classes

Load models and resources once at worker startup