Vast.ai is a marketplace that connects hosts (people and datacenters with GPUs to rent out) with renters (people who need GPUs to run workloads). This page defines the terms you will see throughout the rest of the documentation.
Documentation Index
Fetch the complete documentation index at: https://docs.vast.ai/llms.txt
Use this file to discover all available pages before exploring further.
Marketplace
Host
A host is anyone who lists GPU hardware on Vast. Hosts range from individuals with a single gaming PC to Tier‑4 datacenters. Each host sets their own prices, reliability expectations, and verification level. The full host-side documentation lives in the Host tab.
Renter
A renter is anyone who rents GPU capacity from the marketplace. Most of this documentation is written for renters.
Machine
A machine is a single physical host system registered on Vast. One machine can publish one or more offers corresponding to different slices of its GPUs, storage, and bandwidth.
Offer
An offer is a specific configuration a host is willing to rent out, shown as a row in the search results at cloud.vast.ai/create. Each offer includes:
- GPU model, count, and total GPU RAM
- CPU, system RAM, disk space, and bandwidth
- Price, max rental duration, and location
- A DLPerf score and a reliability score
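These fields lend themselves to programmatic comparison. A minimal sketch, assuming a hypothetical list of offer records (the field names here are illustrative, not the actual Vast.ai API schema):

```python
# Hypothetical offer records; field names are illustrative,
# not the real Vast.ai API schema.
offers = [
    {"gpu": "RTX 4090", "price_per_hr": 0.80, "dlperf": 60.0, "reliability": 0.98},
    {"gpu": "RTX 3090", "price_per_hr": 0.50, "dlperf": 35.0, "reliability": 0.95},
    {"gpu": "A100",     "price_per_hr": 1.10, "dlperf": 55.0, "reliability": 0.99},
]

def price_per_dlperf(offer):
    # Lower is better: dollars per hour per unit of DLPerf.
    return offer["price_per_hr"] / offer["dlperf"]

# Keep reasonably reliable machines, then rank by cost-effectiveness.
candidates = [o for o in offers if o["reliability"] >= 0.95]
best = min(candidates, key=price_per_dlperf)
print(best["gpu"])  # → RTX 4090
```

Normalizing price by DLPerf is exactly the apples-to-apples comparison the score is meant to enable.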
DLPerf score
A Vast-defined benchmark score that approximates real-world deep learning throughput for a given GPU + host combination. Use it to compare offers with different GPUs apples-to-apples instead of relying on raw specs.
Reliability score
A measure of a machine’s historical uptime and health. New machines start at 60% and climb as they demonstrate availability. The longer your rental, the more reliability matters.
Renting
Instance
An instance is what you get when you accept an offer: a running, isolated environment on the host’s machine with exclusive access to the GPUs you rented. Instances are almost always Docker containers; a small subset are virtual machines. You connect to instances over SSH, Jupyter, or HTTP. Instances bill by the second while they run, plus storage for as long as they exist.
Template
A template is a reusable launch configuration. In the simplest terms, it is a wrapper around docker run: it specifies the Docker image, environment variables, exposed ports, on-start commands, disk size defaults, and any provisioning script. You launch an instance from a template.
Vast ships recommended templates (built on vastai/base-image and vastai/pytorch) that include the Instance Portal, Caddy-based TLS, and authentication. You can also create your own. See Templates.
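To make the "wrapper around docker run" framing concrete, here is a rough sketch of how template fields map onto docker run flags, using a made-up template dict (this is not Vast's actual template format):

```python
import shlex

# Made-up template structure for illustration; Vast's real template
# format differs.
template = {
    "image": "vastai/pytorch:latest",
    "env": {"JUPYTER_DIR": "/workspace"},
    "ports": [8080, 22],
    "on_start": "pip install -r requirements.txt",
}

def to_docker_run(t):
    # Build the docker run command that a template conceptually wraps.
    parts = ["docker", "run", "-d"]
    for key, value in t["env"].items():
        parts += ["-e", f"{key}={value}"]   # environment variables
    for port in t["ports"]:
        parts += ["-p", f"{port}:{port}"]   # exposed ports
    parts.append(t["image"])                # Docker image
    parts += ["bash", "-c", t["on_start"]]  # on-start command
    return shlex.join(parts)

print(to_docker_run(template))
```

Disk size defaults and provisioning scripts have no direct docker run flag; Vast applies those at instance-creation time.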
Rental contract
A rental contract is the agreement between you and the host for one instance. Each contract has a maximum duration (shown on the offer card) and an instance type that determines priority and pricing:
- On-demand: fixed price, high priority, guaranteed until the max duration expires.
- Reserved: on-demand with pre-paid discounts for longer commitments.
- Interruptible: bidding-based, lowest cost, may be paused when outbid or when on-demand demand spikes.
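Billing follows the instance definition above: compute is charged per second while the instance runs, storage for as long as it exists. A back-of-the-envelope estimator (the hourly rates and durations below are hypothetical):

```python
def estimate_cost(compute_per_hr, storage_per_hr, run_seconds, exist_seconds):
    """Estimate a rental's cost: compute billed per second of runtime,
    storage billed per second of existence (rates given per hour)."""
    compute = compute_per_hr * run_seconds / 3600
    storage = storage_per_hr * exist_seconds / 3600
    return round(compute + storage, 4)

# e.g. 6h of runtime on a $0.50/hr offer, with the stopped instance
# kept around for 24h total at $0.01/hr of storage:
print(estimate_cost(0.50, 0.01, run_seconds=6 * 3600, exist_seconds=24 * 3600))
# → 3.24
```

Note the two clocks: runtime drives compute charges, existence drives storage charges, so a stopped instance still accrues storage cost.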
Serverless
Serverless is Vast’s managed layer on top of instances. Instead of renting one instance and pointing a client at it, you define an endpoint that autoscales a pool of workers for you.
Endpoint
An endpoint is the top-level construct in Serverless. It is the stable, named entry point that your client code calls. Endpoints own the scaling policy: max_workers, min_workers, target_util, queue-time targets, and so on. You typically create one endpoint per use case (e.g. text-generation-prod).
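The scaling knobs above drive a decision of roughly this shape. This is a simplified sketch of target-utilization autoscaling, not the actual Serverless Engine algorithm:

```python
import math

def desired_workers(current_load, per_worker_capacity, target_util,
                    min_workers, max_workers):
    # Size the pool so each worker runs near target_util of its capacity,
    # clamped to the endpoint's configured bounds.
    if current_load <= 0:
        return min_workers
    needed = math.ceil(current_load / (per_worker_capacity * target_util))
    return max(min_workers, min(max_workers, needed))

# 120 req/s of load, 10 req/s per worker, aiming for 80% utilization:
print(desired_workers(120, 10, 0.8, min_workers=1, max_workers=50))  # → 15
```

Running below 100% utilization leaves headroom to absorb bursts while extra workers are recruited, which is why a target_util knob exists at all.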
Worker group
A worker group belongs to an endpoint and defines what runs on each worker: a template, hardware filters (e.g. gpu_ram), marketplace search parameters, and launch overrides. Most endpoints have a single worker group; multiple worker groups per endpoint enable mixed-model serving and hardware A/B comparisons.
Worker
A worker is one GPU instance recruited by a worker group to serve traffic for its endpoint. Workers are created, activated, and destroyed automatically by the Serverless Engine based on measured load.
PyWorker
The PyWorker is a small Python web server that runs alongside your model inside each worker. It proxies requests to your inference server, validates auth, and reports load metrics back to the Serverless Engine so it can scale correctly. All Vast-provided serverless templates include a PyWorker; custom templates can ship their own. See the PyWorker overview.
Serverless Engine
The Serverless Engine is the Vast-managed service that routes requests to workers, decides when to recruit or release workers, and continuously evaluates cost-performance tradeoffs using the metrics PyWorkers report.
Where to go next
Quickstart: Rent your first instance
Find & rent: Learn to search offers effectively
Templates: Pre-built and custom launch configs
Serverless: Autoscaled GPU endpoints