The @remote decorator is the core of Vast Deployments. It marks an async Python function for remote execution on GPU workers and configures the benchmark used to measure worker performance.
Defining a @remote Function
- Async: Defined with `async def`
- Serializable: All arguments and return values must be serializable (primitives, lists, dicts, bytes, and custom objects with `__dict__`)
- Benchmarked: At least one `@remote` function in a deployment must define a benchmark via `benchmark_dataset` or `benchmark_generator`
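Putting these rules together, a definition might look like the sketch below. The `remote` decorator here is a minimal local stand-in so the snippet is self-contained and runnable; in a real deployment you would import the SDK's decorator instead (the import path is not shown here), and `generate` is a hypothetical example function.

```python
import asyncio

# Minimal stand-in for the SDK's @remote decorator (illustrative only):
# it simply records the benchmark configuration on the function.
def remote(benchmark_dataset=None, benchmark_generator=None, benchmark_runs=10):
    def wrap(fn):
        fn.benchmark_dataset = benchmark_dataset
        fn.benchmark_generator = benchmark_generator
        fn.benchmark_runs = benchmark_runs
        return fn
    return wrap

# An async function with serializable arguments and return value (str in,
# str out) that defines a benchmark via benchmark_dataset. The dict keys
# match the function's parameter name ("prompt").
@remote(
    benchmark_dataset=[{"prompt": "hello"}, {"prompt": "a longer test prompt"}],
    benchmark_runs=10,
)
async def generate(prompt: str) -> str:
    return prompt.upper()

print(asyncio.run(generate("hi")))  # → HI
```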
Decorator Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `benchmark_dataset` | `list[dict] \| None` | `None` | Static list of sample input dicts. Keys must match the function’s parameter names. |
| `benchmark_generator` | `Callable[[], dict] \| None` | `None` | A callable returning a sample input dict. Use for dynamic or randomized test data. |
| `benchmark_runs` | `int` | `10` | Number of iterations to run during the benchmark. |
Calling @remote Functions
From a client script, import the deployment and the function, then `await` the call:
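A minimal sketch of the call site. The names are assumptions: in a real client the import would come from your deployment module (e.g. a `from my_deployment import generate`), and the stand-in `generate` below only mimics the awaitable behavior.

```python
import asyncio

# Hypothetical stand-in for a deployed @remote function; a real client
# would import it from the deployment module instead of defining it.
async def generate(prompt: str) -> str:
    return f"echo: {prompt}"

async def main():
    # The call reads like a normal awaited coroutine; behind it, the SDK
    # serializes the arguments, routes them to a worker, and returns the
    # deserialized result.
    result = await generate("Write a haiku about GPUs")
    print(result)

asyncio.run(main())
```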
When you call a @remote function:
- The SDK waits for the deployment to be ready (workers provisioned and benchmarked)
- Arguments are serialized and routed to the quickest available worker
- The worker executes the function with full GPU access
- The return value is serialized and sent back to the caller
Accessing @context in @remote Functions
Use `app.get_context(ContextClass)` to access resources loaded during worker startup:
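A self-contained sketch of the pattern. `App`, `ModelContext`, and the `get_context` behavior below are stand-ins that mimic the described API (one context instance created per worker, shared across calls), not the SDK itself.

```python
import asyncio

# Stand-in context class: in a real worker this might load model weights
# onto the GPU once at startup.
class ModelContext:
    def __init__(self):
        self.model = lambda prompt: prompt[::-1]  # toy "model"

# Stand-in app object mimicking get_context: returns the singleton
# instance created at worker startup for the given context class.
class App:
    def __init__(self):
        self._contexts = {}

    def get_context(self, cls):
        if cls not in self._contexts:
            self._contexts[cls] = cls()
        return self._contexts[cls]

app = App()

async def generate(prompt: str) -> str:
    ctx = app.get_context(ModelContext)  # same instance on every call
    return ctx.model(prompt)

print(asyncio.run(generate("abc")))  # → cba
```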
Benchmarks
Exactly one @remote function should define a benchmark. Benchmarks run automatically before a worker first enters the “ready” state and are used to measure each worker’s performance for autoscaling.
How Benchmarks Work
- Warmup (enabled by default): A warmup pass runs before timing to settle caches, JIT compilation, and GPU memory allocation
- Timed runs: The function is executed `benchmark_runs` times using inputs from the dataset or generator, with a default concurrency of 10 parallel requests
- Scoring: Results produce a performance score that the autoscaler uses to determine worker capacity
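The three steps above can be sketched as follows. This is an illustration of the described loop (one warmup pass, then `benchmark_runs` timed executions with up to 10 in flight, reduced to a throughput-style score), not the SDK's actual implementation, and the scoring formula is an assumption.

```python
import asyncio
import random
import time

BENCHMARK_RUNS = 10  # the decorator's benchmark_runs
CONCURRENCY = 10     # default benchmark concurrency

dataset = [{"prompt": "hello"}, {"prompt": "world"}]

async def fn(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stands in for real GPU work
    return prompt.upper()

async def benchmark() -> float:
    sem = asyncio.Semaphore(CONCURRENCY)

    async def one_run():
        async with sem:
            await fn(**random.choice(dataset))  # inputs picked from the dataset

    await one_run()  # warmup: settle caches, JIT compilation, allocations

    start = time.perf_counter()
    await asyncio.gather(*(one_run() for _ in range(BENCHMARK_RUNS)))
    elapsed = time.perf_counter() - start
    return BENCHMARK_RUNS / elapsed  # score: requests per second

score = asyncio.run(benchmark())
print(f"performance score: {score:.1f} req/s")
```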
Using benchmark_dataset
Provide a static list of sample inputs. During benchmarking, inputs are randomly selected from this list.
Using benchmark_generator
For dynamic or randomized test data, provide a callable that returns a sample input dict:
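For example, a generator might randomize prompt length. The function name below is arbitrary; per the parameter table above, it would be passed to the decorator as `benchmark_generator=make_sample`.

```python
import random

# A benchmark generator is any zero-argument callable returning one sample
# input dict whose keys match the @remote function's parameter names.
def make_sample() -> dict:
    prompts = ["short", "a medium length prompt", "a much longer test prompt " * 4]
    return {"prompt": random.choice(prompts)}

print(make_sample())
```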
Workload Calculators
By default, each request counts as a fixed amount of workload for autoscaling. For functions where request cost varies significantly (e.g., different prompt lengths for an LLM), you can define a custom workload calculator that computes the load from the request payload. This lets the autoscaler make scaling decisions based on actual request complexity rather than raw request count.
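As an illustration, a calculator for an LLM-style function might scale load with prompt length. The function name and the way such a calculator is registered with the SDK are assumptions; only the idea (payload in, relative load out) comes from the description above.

```python
# Hypothetical workload calculator: given the request payload, return a
# relative load value so a long prompt counts for more than a short one.
def workload_from_payload(payload: dict) -> float:
    # Rough proxy for LLM cost: whitespace-split token count, scaled so
    # ~100 tokens equal one unit of load, with a floor of 1 per request.
    prompt = payload.get("prompt", "")
    return max(1.0, len(prompt.split()) / 100)

print(workload_from_payload({"prompt": "hi"}))           # small request → 1.0
print(workload_from_payload({"prompt": "word " * 500}))  # large request → 5.0
```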
Serialization
The SDK automatically handles serialization and deserialization of function arguments and return values. Supported types:
- Primitives: `int`, `str`, `float`, `bool`, `None`
- Bytes: Encoded as base64
- Collections: `list`, `tuple`, `dict`
- Custom objects: Stored with module path, class name, and `__dict__`
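The scheme above can be sketched as a toy encoder/decoder. This is illustrative only, not the SDK's actual wire format: bytes become base64 strings, custom objects are stored as module path + class name + `__dict__`, and tuples round-trip as lists (JSON has no tuple type).

```python
import base64
import importlib
import json
import sys

class Point:  # example custom object with a plain __dict__
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

def encode(value):
    if isinstance(value, bytes):
        return {"__bytes__": base64.b64encode(value).decode()}
    if isinstance(value, (list, tuple)):
        return [encode(v) for v in value]
    if isinstance(value, dict):
        return {k: encode(v) for k, v in value.items()}
    if hasattr(value, "__dict__"):  # custom object: module path, class, __dict__
        return {"__class__": [type(value).__module__, type(value).__name__],
                "__dict__": encode(vars(value))}
    return value  # primitives pass through unchanged

def decode(value):
    if isinstance(value, dict) and "__bytes__" in value:
        return base64.b64decode(value["__bytes__"])
    if isinstance(value, dict) and "__class__" in value:
        module, name = value["__class__"]
        cls = getattr(sys.modules.get(module) or importlib.import_module(module), name)
        obj = cls.__new__(cls)  # rebuild without calling __init__
        obj.__dict__.update(decode(value["__dict__"]))
        return obj
    if isinstance(value, list):
        return [decode(v) for v in value]
    if isinstance(value, dict):
        return {k: decode(v) for k, v in value.items()}
    return value

payload = json.dumps(encode({"pt": Point(1, 2), "blob": b"\x00\x01"}))
restored = decode(json.loads(payload))
print(restored["pt"].x, restored["pt"].y, restored["blob"])
```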