The Deployment Object
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str \| None | None | Deployment name. Auto-set from the module name of the first @remote function if omitted. |
| tag | str | "default" | Version tag. Changing this triggers a full redeploy (Tier 4) with a new endpoint. |
| version_label | str \| None | None | Optional semantic version label for tracking. |
| api_key | str | env var | Your Vast API key. Reads from $VAST_API_KEY if not provided. |
| ttl | float \| None | None | Seconds of inactivity before auto-teardown. None means the deployment lives indefinitely. |
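As a sketch, a deployment object might be constructed like this. The Deployment class name comes from this section's title, but the exact import path is an assumption and may differ in your SDK version:

```python
# Import path is an assumption -- check your SDK version.
from vastai.serverless import Deployment

# api_key is omitted here, so the SDK reads $VAST_API_KEY.
app = Deployment(
    name="image-classifier",  # optional; auto-set from the first @remote module
    tag="default",            # changing this later forces a full (Tier 4) redeploy
    version_label="v1.0.0",   # optional tracking label
    ttl=3600.0,               # tear down after one hour of inactivity
)
```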
Image Configuration
app.image() returns an Image object for configuring the Docker image, packages, and hardware requirements. All methods return self for chaining.
| Parameter | Type | Default | Description |
|---|---|---|---|
| from_image | str | required | Docker image URI (e.g. "vastai/pytorch:@vastai-automatic-tag") |
| storage | float | 50 | Storage allocation in GB |
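Because every method returns self, image configuration is usually written as a single chain. A sketch, assuming app is your deployment object and that from_image and storage are passed to app.image() directly (the package name and env value are illustrative):

```python
image = (
    app.image(from_image="vastai/pytorch:@vastai-automatic-tag", storage=100)
    .pip_install("transformers")       # installed on worker startup
    .env(HF_HOME="/workspace/hf")      # set on every worker
)
```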
Installing Packages
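A sketch of installing packages at worker startup, assuming image is the object returned by app.image() (package names and versions are illustrative):

```python
image.apt_get("ffmpeg", "libgl1")                   # system packages via apt
image.pip_install("torch", "transformers==4.44.0")  # Python packages via pip
```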
Environment Variables
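Environment variables are passed as keyword arguments. A sketch, assuming image is the object returned by app.image() (names and values are illustrative):

```python
# Each keyword becomes an environment variable on the worker.
image.env(HF_HOME="/workspace/hf", LOG_LEVEL="debug")
```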
Startup Scripts and Commands
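A sketch of startup hooks, assuming image is the object returned by app.image() and that run_cmd takes the command and its arguments as separate strings (both are assumptions; the commands themselves are illustrative):

```python
# Shell script executed once on worker startup.
image.run_script("""
mkdir -p /workspace/models
echo "worker ready"
""")

# Single command executed on startup.
image.run_cmd("nvidia-smi", "-L")
```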
Copying Local Files
Files added with .copy() are bundled into the deployment tarball and placed at the specified destination path on workers.
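A sketch, assuming image is the object returned by app.image() (the paths are illustrative):

```python
# Bundled into the deployment tarball; appears at /app/config on each worker.
image.copy("./config", "/app/config")
```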
Python Environment
By default, the SDK manages its own virtual environment on workers. You can override this with venv() or use_system_python().

Publishing Additional Ports
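A sketch, assuming image is the object returned by app.image() and that type_ is a protocol string such as "tcp" (the accepted values are an assumption):

```python
# Expose an extra TCP port on each worker alongside the SDK's own endpoint.
image.publish_port(8080, "tcp")
```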
Image Methods Reference
| Method | Description |
|---|---|
| pip_install(*packages) | Install pip packages on worker startup |
| apt_get(*packages) | Install apt packages on worker startup |
| env(**kwargs) | Set environment variables |
| run_script(script_str) | Run a shell script on startup |
| run_cmd(*args) | Run a command on startup |
| copy(src, dst) | Copy local files into the deployment bundle |
| venv(path) | Use an existing venv at the given path |
| use_system_python() | Use the image’s system Python instead of a venv |
| publish_port(number, type_) | Publish additional ports on the worker |
| require(*queries) | Set GPU/hardware search requirements |
require(*queries) | Set GPU/hardware search requirements |
GPU and Hardware Requirements
Hardware requirements are specified using the query builder from vastai.data.query. Pass Query objects to image.require():
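A sketch, assuming the column objects and GPU name constants are importable directly from vastai.data.query, that image is the object returned by app.image(), and that multiple queries passed to require() combine as joint constraints:

```python
from vastai.data.query import gpu_name, gpu_ram, dph_total, RTX_4090

image.require(
    gpu_name == RTX_4090,  # exact GPU model
    gpu_ram >= 24,         # at least 24 GB of VRAM
    dph_total < 1.0,       # under $1.00 per hour total
)
```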
Query Operators
| Operator | Example | Description |
|---|---|---|
| == | gpu_name == RTX_4090 | Equals |
| != | gpu_name != RTX_3090 | Not equals |
| < | dph_total < 2.0 | Less than |
| <= | gpu_ram <= 24 | Less than or equal |
| > | inet_down > 500 | Greater than |
| >= | cpu_cores >= 16 | Greater than or equal |
| .in_() | gpu_name.in_([RTX_4090, H100_SXM]) | Value in list |
| .notin_() | gpu_name.notin_([RTX_3060]) | Value not in list |
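For example, .in_() lets a single constraint accept several GPU models. A sketch, assuming gpu_name and the GPU constants are importable directly from vastai.data.query and that image is the object returned by app.image():

```python
from vastai.data.query import gpu_name, RTX_4090, H100_SXM

# Accept offers for either GPU model.
image.require(gpu_name.in_([RTX_4090, H100_SXM]))
```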
Queryable Columns
GPU:gpu_name, gpu_ram, gpu_total_ram, gpu_max_power, gpu_max_temp, gpu_arch, gpu_mem_bw, gpu_lanes, gpu_frac, gpu_display_active, num_gpus, compute_cap, cuda_max_good, bw_nvlink, total_flops
CPU: cpu_name, cpu_cores, cpu_cores_effective, cpu_ghz, cpu_ram, cpu_arch
Storage & Disk: disk_space, disk_bw, disk_name, allocated_storage
Network: inet_up, inet_down, inet_up_cost, inet_down_cost, direct_port_count, pcie_bw, pci_gen
Pricing: dph_base, dph_total, storage_cost, storage_total_cost, vram_costperhour, min_bid, credit_discount_max, flops_per_dphtotal, dlperf_per_dphtotal
Machine & Host: host_id, machine_id, hostname, public_ipaddr, reliability, expected_reliability, os_version, driver_vers, mobo_name, has_avx, static_ip, external, verification, hosting_type, vms_enabled, resource_type, cluster_id
Virtual Columns (convenience aliases resolved by the API): geolocation, datacenter, duration, verified, allocated_storage, target_reliability
GPU Name Constants
Import GPU name constants from vastai.data.query. A selection of commonly used ones:
NVIDIA Data Center: A100_PCIE, A100_SXM4, H100_PCIE, H100_SXM, H100_NVL, H200, H200_NVL, B200, GH200_SXM, L4, L40, L40S, A10, A30, A40, Tesla_T4, Tesla_V100
NVIDIA Consumer: RTX_5090, RTX_5080, RTX_5070_Ti, RTX_5070, RTX_4090, RTX_4080S, RTX_4080, RTX_4070_Ti, RTX_4070S, RTX_3090, RTX_3090_Ti, RTX_3080_Ti, RTX_3080
NVIDIA Professional: RTX_A6000, RTX_6000Ada, RTX_5880Ada, RTX_5000Ada, RTX_PRO_6000
AMD: InstinctMI250X, InstinctMI210, InstinctMI100, RX_7900_XTX, PRO_W7900, PRO_W7800
Autoscaling Configuration
These parameters control how your deployment scales workers up and down in response to load. For a detailed explanation of how each parameter affects scaling behavior, see Serverless Parameters. You can call configure_autoscaling() multiple times; later calls update (not replace) previously set values.
| Parameter | Type | Description |
|---|---|---|
| cold_workers | int | Number of idle workers to keep ready |
| max_workers | int | Maximum concurrent workers |
| min_load | int | Minimum load threshold to trigger scaling |
| min_cold_load | int | Load threshold for maintaining cold workers |
| target_util | float | Target utilization ratio (0.0 to 1.0) |
| cold_mult | int | Cold worker multiplier |
| max_queue_time | float | Maximum seconds a request can wait in queue |
| target_queue_time | float | Target queue wait time in seconds |
| inactivity_timeout | int | Seconds of inactivity before scaling down |
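A sketch, assuming configure_autoscaling() is a method on the deployment object and accepts these parameters as keyword arguments (the values shown are illustrative):

```python
app.configure_autoscaling(
    cold_workers=1,       # keep one idle worker warm
    max_workers=10,
    target_util=0.8,
    max_queue_time=30.0,
)

# A later call updates only the named values; earlier settings persist.
app.configure_autoscaling(max_workers=20)
```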
Deploying with ensure_ready()
After defining your remote functions, image configuration, and autoscaling settings, call ensure_ready() to deploy. The call:
- Packages your deployment code and configuration into a tarball
- Computes a content hash to determine if anything has changed
- Registers the deployment with the Vast API
- Uploads the tarball to cloud storage (if the code has changed)
- Triggers the appropriate update tier if workers are already running
Call ensure_ready() before invoking any @remote functions.
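The end of a deployment script might therefore look like this, assuming app is the deployment object (this section does not show how @remote functions are invoked, so no invocation is included):

```python
# Packages, hashes, registers, uploads, and triggers update tiers as needed.
# Safe to call repeatedly: if the content hash is unchanged, the upload is skipped.
app.ensure_ready()

# Only after ensure_ready() returns is it safe to invoke @remote functions.
```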