Vast.ai is a marketplace that connects hosts (people and datacenters with GPUs to rent out) with renters (people who need GPUs to run workloads). This page defines the terms you will see throughout the rest of the documentation.
Documentation Index
Fetch the complete documentation index at: https://docs.vast.ai/llms.txt
Use this file to discover all available pages before exploring further.
Marketplace
Host
A host is anyone who lists GPU hardware on Vast. Hosts range from individuals with a single gaming PC to Tier‑4 datacenters. Each host sets their own prices, reliability expectations, and verification level. The full host-side documentation lives in the Host tab.
Renter
A renter is anyone who rents GPU capacity from the marketplace. Most of this documentation is written for renters.
Machine
A machine is a single physical host system registered on Vast. One machine can publish one or more offers corresponding to different slices of its GPUs, storage, and bandwidth.
Offer
An offer is a specific configuration a host is willing to rent out, shown as a row in the search results at cloud.vast.ai/create. Each offer includes:
- GPU model, count, and total GPU RAM
- CPU, system RAM, disk space, and bandwidth
- Price, max rental duration, and location
- A DLPerf score and a reliability score
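These fields lend themselves to programmatic comparison. A minimal sketch, assuming a hypothetical list of offer records (the field names here are illustrative, not the actual Vast.ai API schema):

```python
# Hypothetical offer records; field names are illustrative,
# not the real Vast.ai API schema.
offers = [
    {"gpu": "RTX 4090", "price_per_hr": 0.80, "dlperf": 60.0, "reliability": 0.98},
    {"gpu": "RTX 3090", "price_per_hr": 0.50, "dlperf": 35.0, "reliability": 0.95},
    {"gpu": "A100",     "price_per_hr": 1.10, "dlperf": 55.0, "reliability": 0.99},
]

def price_per_dlperf(offer):
    # Lower is better: dollars per hour per unit of DLPerf.
    return offer["price_per_hr"] / offer["dlperf"]

# Keep reasonably reliable machines, then rank by cost-effectiveness.
candidates = [o for o in offers if o["reliability"] >= 0.95]
best = min(candidates, key=price_per_dlperf)
print(best["gpu"])  # → RTX 4090
```

Normalizing price by DLPerf is exactly the apples-to-apples comparison the score is meant to enable.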
DLPerf score
A Vast-defined benchmark score that approximates real-world deep learning throughput for a given GPU + host combination. Use it to compare offers with different GPUs apples-to-apples instead of relying on raw specs.
Reliability score
A measure of a machine’s historical uptime and health. New machines start at 60% and climb as they demonstrate availability. The longer your rental, the more reliability matters.
Renting
Instance
An instance is what you get when you accept an offer: a running, isolated environment on the host’s machine with exclusive access to the GPUs you rented. Instances are almost always Docker containers; a small subset are virtual machines. You connect to instances over SSH, Jupyter, or HTTP. Instances bill by the second while they run, plus storage for as long as they exist.
Template
A template is a reusable launch configuration. In the simplest terms, it is a wrapper around docker run: it specifies the Docker image, environment variables, exposed ports, on-start commands, disk size defaults, and any provisioning script. You launch an instance from a template.
Vast ships recommended templates (built on vastai/base-image and vastai/pytorch) that include the Instance Portal, Caddy-based TLS, and authentication. You can also create your own. See Templates.
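To make the "wrapper around docker run" framing concrete, here is a rough sketch of how template fields map onto docker run flags, using a made-up template dict (this is not Vast's actual template format):

```python
import shlex

# Made-up template structure for illustration; Vast's real template
# format differs.
template = {
    "image": "vastai/pytorch:latest",
    "env": {"JUPYTER_DIR": "/workspace"},
    "ports": [8080, 22],
    "on_start": "pip install -r requirements.txt",
}

def to_docker_run(t):
    # Build the docker run command that a template conceptually wraps.
    parts = ["docker", "run", "-d"]
    for key, value in t["env"].items():
        parts += ["-e", f"{key}={value}"]   # environment variables
    for port in t["ports"]:
        parts += ["-p", f"{port}:{port}"]   # exposed ports
    parts.append(t["image"])                # Docker image
    parts += ["bash", "-c", t["on_start"]]  # on-start command
    return shlex.join(parts)

print(to_docker_run(template))
```

Disk size defaults and provisioning scripts have no direct docker run flag; Vast applies those at instance-creation time.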
Rental contract
A rental contract is the agreement between you and the host for one instance. Each contract has a maximum duration (shown on the offer card) and an instance type that determines priority and pricing:
- On-demand: fixed price, high priority, guaranteed until the max duration expires.
- Reserved: on-demand with pre-paid discounts for longer commitments.
- Interruptible: bidding-based, lowest cost, may be paused when outbid or when on-demand demand spikes.
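Billing follows the instance definition above: compute is charged per second while the instance runs, storage for as long as it exists. A back-of-the-envelope estimator (the hourly rates and durations below are hypothetical):

```python
def estimate_cost(compute_per_hr, storage_per_hr, run_seconds, exist_seconds):
    """Estimate a rental's cost: compute billed per second of runtime,
    storage billed per second of existence (rates given per hour)."""
    compute = compute_per_hr * run_seconds / 3600
    storage = storage_per_hr * exist_seconds / 3600
    return round(compute + storage, 4)

# e.g. 6h of runtime on a $0.50/hr offer, with the stopped instance
# kept around for 24h total at $0.01/hr of storage:
print(estimate_cost(0.50, 0.01, run_seconds=6 * 3600, exist_seconds=24 * 3600))
# → 3.24
```

Note the two clocks: runtime drives compute charges, existence drives storage charges, so a stopped instance still accrues storage cost.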
Serverless
Serverless is Vast’s managed layer on top of instances. Instead of renting one instance and pointing a client at it, you define an endpoint that autoscales a pool of workers for you.
Endpoint
An endpoint is the top-level construct in Serverless. It is the stable, named entry point that your client code calls. Endpoints own the scaling policy: max_workers, min_workers, target_util, queue-time targets, and so on. You typically create one endpoint per use case (e.g. text-generation-prod).
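The scaling knobs above drive a decision of roughly this shape. This is a simplified sketch of target-utilization autoscaling, not the actual Serverless Engine algorithm:

```python
import math

def desired_workers(current_load, per_worker_capacity, target_util,
                    min_workers, max_workers):
    # Size the pool so each worker runs near target_util of its capacity,
    # clamped to the endpoint's configured bounds.
    if current_load <= 0:
        return min_workers
    needed = math.ceil(current_load / (per_worker_capacity * target_util))
    return max(min_workers, min(max_workers, needed))

# 120 req/s of load, 10 req/s per worker, aiming for 80% utilization:
print(desired_workers(120, 10, 0.8, min_workers=1, max_workers=50))  # → 15
```

Running below 100% utilization leaves headroom to absorb bursts while extra workers are recruited, which is why a target_util knob exists at all.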
Worker group
A worker group belongs to an endpoint and defines what runs on each worker: a template, hardware filters (e.g. gpu_ram), marketplace search parameters, and launch overrides. Most endpoints have a single worker group; multiple worker groups per endpoint enable mixed-model serving and hardware A/B comparisons.
Worker
A worker is one GPU instance recruited by a worker group to serve traffic for its endpoint. Workers are created, activated, and destroyed automatically by the Serverless Engine based on measured load.
PyWorker
The PyWorker is a small Python web server that runs alongside your model inside each worker. It proxies requests to your inference server, validates auth, and reports load metrics back to the Serverless Engine so it can scale correctly. All Vast-provided serverless templates include a PyWorker; custom templates can ship their own. See the PyWorker overview.
Serverless Engine
The Serverless Engine is the Vast-managed service that routes requests to workers, decides when to recruit or release workers, and continuously evaluates cost-performance tradeoffs using the metrics PyWorkers report.
Where to go next
Quickstart: Rent your first instance
Find & rent: Learn to search offers effectively
Templates: Pre-built and custom launch configs
Serverless: Autoscaled GPU endpoints