The vastai run benchmarks CLI command rents one instance of each GPU class you give it, runs the template's built-in benchmark workload, and reports performance per dollar. Benchmarks for multiple GPU classes run in parallel, and each rental tears itself down when its run finishes.
Each run rents real instances and consumes account credit; cost scales with the number of GPU classes and the per-rental timeout.

When to Use It

  • Picking which GPU class to rent for an on-demand instance, or which classes to allow in a Serverless Workergroup’s search_params.
  • Comparing two templates (for example, vLLM vs. TGI for the same model) on the same hardware; see the sketch after this list.
  • Validating that a template fits and runs correctly on a GPU before committing to longer rentals or production traffic.
  • Producing a perf/dollar number for capacity planning or budgeting.
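For the template-comparison case, run the same GPU list once per template and compare the two summaries. A minimal sketch, reusing the example hashes from this page as stand-ins for your own vLLM and TGI templates:
vastai run benchmarks --template_hash 393fa8572e6c73c927c8275fe4dffd53 --gpus RTX_4090
vastai run benchmarks --template_hash 40ef49becc953aa910ee05bd4653b9b3 --gpus RTX_4090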

Prerequisites

You need the vastai CLI installed and authenticated with your API key, plus enough account credit to cover the rentals (each run rents real instances, as noted above).

Basic Usage

If you don’t pass --gpus, the CLI sweeps a built-in default set (RTX 5090, RTX 4090, RTX 3090, RTX A6000), which is a reasonable starting point if you’re not sure which classes to compare:
vastai run benchmarks --template_hash 79ebdd2ebfb9d42cedf7a221c42d37a5
Otherwise, pass an explicit list of GPUs:
vastai run benchmarks \
  --template_hash 393fa8572e6c73c927c8275fe4dffd53 \
  --gpus RTX_4090,RTX_3090
Use the Nx prefix to request a multi-GPU configuration for any comma-separated entry in the list:
vastai run benchmarks \
  --template_hash 40ef49becc953aa910ee05bd4653b9b3 \
  --gpus "2x RTX_4090, 2x RTX_3090"

How It Works

For each GPU class in your list, the CLI:
  • pre-flights the marketplace against the template's extra_filters, skipping GPUs with no matching offers and reporting which filter excluded them;
  • creates a scratch endpoint and a one-worker Workergroup;
  • polls until the worker reaches status=idle with a positive measured_perf;
  • records the result and tears the rental down.
The rental's actual dph_total is fetched at idle, so perf/dollar reflects the real run rather than a marketplace estimate. The GPU count per rental comes from the Nx token prefix if set, otherwise from --num_gpus, otherwise it is auto-sized from the template's gpu_total_ram filter.
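--num_gpus sits in the middle of that precedence chain. As a sketch (template hash reused from the examples above), the following pins every rental to two GPUs unless an Nx prefix overrides it:
vastai run benchmarks \
  --template_hash 393fa8572e6c73c927c8275fe4dffd53 \
  --gpus RTX_4090,RTX_3090 \
  --num_gpus 2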

Reading the Output

While the run is in progress, the CLI renders a live table:
GPU: The GPU class being benchmarked.
Status: Current worker state: queued, provisioning, waiting_for_worker, loading, idle, done, failed, timeout, no_worker, skipped, aborted, or error.
Endpoint: The ephemeral endpoint ID created for this run.
Worker: The worker (instance) ID, once one has been recruited.
Elapsed: Time since the worker started running; freezes on a terminal status.
Perf: The template's measured_perf (workload units per second; tokens/sec for typical LLMs, requests/sec when the template has no custom workload calculator). Useful for ranking GPUs on the same template, not for cross-template comparison.
$/hr: The rented worker's dph_total, the hourly rate the contract is billed at.
Perf/$/hr: Cost-efficiency score: measured performance divided by hourly price. Higher is better.
After the run, the CLI prints a summary sorted by Perf/$/hr, so the most cost-efficient GPU for your workload is at the top. With --raw, the same data is emitted as JSON for scripting:
[
  {
    "gpu_name": "RTX 4090",
    "rental_dph": 0.428,
    "measured_perf": 142.3,
    "status": "ok",
    "perf_per_dollar": 332.5
  }
]
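Note that perf_per_dollar is measured_perf divided by rental_dph (142.3 / 0.428 ≈ 332.5 above). As a scripting sketch, assuming jq is installed and reusing an example hash from this page, this prints the name of the most cost-efficient GPU that completed:
vastai run benchmarks \
  --template_hash 393fa8572e6c73c927c8275fe4dffd53 \
  --gpus RTX_4090,RTX_3090 \
  --raw -y \
  | jq -r 'map(select(.status == "ok")) | sort_by(-.perf_per_dollar) | .[0].gpu_name'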

Cost and Timeout

Each GPU rental runs for up to --timeout seconds (default 3600). Before starting, the CLI prints what it is about to do (the number of configurations, the GPU mix, and the per-rental timeout) and asks for confirmation; pass -y to skip the prompt in scripts. Real runs almost always finish well before the timeout because the runner exits as soon as it reads a valid measured_perf, but you can tighten the ceiling for cheaper runs:
vastai run benchmarks \
  --template_hash 393fa8572e6c73c927c8275fe4dffd53 \
  --timeout 600 -y
The CLI installs a SIGINT handler, so if you Ctrl+C mid-run it finishes deleting in-flight endpoints and Workergroups before exiting. Do not kill the process a second time during cleanup; otherwise you may need to remove the leftovers manually with vastai delete endpoint and vastai delete workergroup.
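A sketch of that manual cleanup, assuming each delete command takes the corresponding ID as an argument (the endpoint ID appears in the live table's Endpoint column; both IDs below are placeholders):
vastai delete workergroup <WORKERGROUP_ID>
vastai delete endpoint <ENDPOINT_ID>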

Common Outcomes

For each status: what it means, and what to do.

ok: Benchmark completed and reported measured_perf. Use the result.
skipped: No marketplace offer matched after applying the template's filters. The CLI prints which filter blocked the GPU; loosen extra_filters on the template, or pick a different GPU.
no_worker: The autoscaler did not rent any instance within 120 seconds. Often a scoring or template+GPU mismatch that the pre-flight missed; try a different GPU or relax filters.
failed: Workers reached a terminal state (stopped, destroying, unavail) without ever becoming idle. Inspect worker logs in the dashboard; common causes are a model download failure or an OOM during load.
timeout: The worker was still loading or running when --timeout elapsed. Increase --timeout, or check whether the host is unusually slow (Docker pull stalls are the typical culprit).

Full Flag Reference

See the vastai run benchmarks reference for every flag and its default.
If you’re operating a Serverless Workergroup, the autoscaler already runs this same benchmark on every worker it recruits and uses the results to drive its own GPU choices. See Automated Performance Testing for how that works.