Vast Serverless is an AI infrastructure platform that lets you run compute-intensive workloads without managing GPUs: you pay for execution rather than GPU rental time. It is best suited to bursty workloads such as on-demand inference, batch jobs, and other usage patterns with variable or unpredictable demand. You interact with Vast Serverless through a Python SDK. In addition to the standard benefits of serverless infrastructure, Vast Serverless reduces cost further by benchmarking your workload to take advantage of the most cost-efficient GPUs in Vast’s marketplace. This enables better, more cost-effective scaling, but it requires an evaluation period for each newly created endpoint, during which the workload is benchmarked against different GPU classes.

Unique Features

  • Benchmark-driven scaling: Automatically identifies and recruits the GPUs with the best price-performance for your specific workload as it scales.
  • One endpoint, mixed hardware: Automatically leverages Vast’s wide fleet of GPUs (from consumer-grade to the highest-end GPUs) to serve your needs, with minimal overhead.
  • Fine-grained control and transparency: Precise configurability and observability give you detailed control over your infrastructure.
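To make the idea of price-performance concrete, here is a minimal sketch of how benchmark results could be compared across GPU classes. It is not the Vast Serverless implementation and does not use the Vast SDK; the `GpuBenchmark` type, the GPU class labels, the prices, and the throughput figures are all hypothetical values chosen for illustration. The sketch assumes that each benchmark yields a throughput measurement and that the best choice is the class with the lowest cost per unit of work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuBenchmark:
    """One hypothetical benchmark result for a GPU class (illustrative only)."""
    gpu_class: str          # made-up label, not a real marketplace identifier
    price_per_hour: float   # USD per hour (made-up figure)
    throughput: float       # work units per second measured for the workload


def cost_per_million_units(b: GpuBenchmark) -> float:
    """Dollars needed to process one million work units on this GPU class."""
    units_per_hour = b.throughput * 3600
    return b.price_per_hour / units_per_hour * 1_000_000


def best_price_performance(benchmarks: list[GpuBenchmark]) -> GpuBenchmark:
    """Pick the GPU class with the lowest cost per unit of work."""
    return min(benchmarks, key=cost_per_million_units)


# Hypothetical results for a single workload on three GPU classes.
benchmarks = [
    GpuBenchmark("consumer-gpu", price_per_hour=0.40, throughput=900.0),
    GpuBenchmark("midrange-gpu", price_per_hour=0.80, throughput=1500.0),
    GpuBenchmark("datacenter-gpu", price_per_hour=2.00, throughput=3000.0),
]

best = best_price_performance(benchmarks)
for b in benchmarks:
    print(f"{b.gpu_class}: ${cost_per_million_units(b):.4f} per million units")
print("best price-performance:", best.gpu_class)
```

With these made-up numbers the fastest GPU is not the cheapest per unit of work, which is why measuring each workload on several GPU classes can pay off even though it adds an evaluation period.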
This guide introduces Vast Serverless concepts and best practices for configuring it optimally for your application.