Serverless - Vast.ai Documentation: Affordable GPU Cloud Marketplace

Vast Serverless is an AI infrastructure platform that lets you run compute-intensive workloads without managing GPUs; you pay for execution rather than GPU rental time. It is best suited for bursty workloads such as on-demand inference, batch jobs, and other usage patterns with variable or unpredictable demand. You interact with Vast Serverless through a Python SDK. Beyond the standard benefits of serverless infrastructure, Vast Serverless reduces cost further by benchmarking your workload to find the most cost-efficient GPUs in Vast’s marketplace. This enables better, more cost-effective scaling, but each newly created endpoint requires an evaluation period during which its workload is benchmarked against different GPU classes.

Unique Features

Benchmark-driven scaling: Automatically identifies and recruits the GPUs with the best price-performance for your specific workload.
One endpoint, mixed hardware: Automatically draws on Vast’s wide fleet of GPUs, from consumer-grade to the highest-end models, to serve your needs with minimal overhead.
Fine-grained control and transparency: Precise configurability and observability give you detailed control over your infrastructure.
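To make the idea of price-performance concrete, the following is a minimal sketch of how benchmark results could be compared across GPU classes. It does not use the Vast SDK and does not show how Vast Serverless ranks GPUs internally; the `GpuBenchmark` type, the GPU class labels, and all throughput and price figures are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuBenchmark:
    """One benchmark result for a workload on a GPU class (hypothetical data)."""
    gpu_class: str            # illustrative label, not a real marketplace offer
    requests_per_sec: float   # throughput measured on your workload
    dollars_per_hour: float   # rental price for that GPU class

    @property
    def cost_per_million_requests(self) -> float:
        # Price divided by hourly throughput, scaled to one million requests.
        requests_per_hour = self.requests_per_sec * 3600
        return self.dollars_per_hour / requests_per_hour * 1_000_000


def rank_by_price_performance(results):
    """Return results ordered from cheapest to most expensive per request."""
    return sorted(results, key=lambda r: r.cost_per_million_requests)


# Made-up numbers: the faster GPU is not automatically the cheaper one per request.
benchmarks = [
    GpuBenchmark("datacenter-gpu", requests_per_sec=50.0, dollars_per_hour=1.80),
    GpuBenchmark("consumer-gpu", requests_per_sec=20.0, dollars_per_hour=0.36),
]

ranked = rank_by_price_performance(benchmarks)
best = ranked[0]

for result in ranked:
    print(f"{result.gpu_class}: ${result.cost_per_million_requests:.2f} per 1M requests")
```

With these made-up figures, the slower consumer-grade GPU costs less per request than the faster datacenter GPU. Distinguishing raw speed from cost per unit of work is the purpose of benchmarking a workload against several GPU classes.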

This guide introduces Vast Serverless concepts and best practices for configuring it optimally for your application.
