Templates Reference
Below are template_hash values for various autoscaler-compatible templates. These templates handle all the configuration needed to run different images in API mode, search for suitable machines, and communicate with the autoscaling server. Refer to the quickstart for a guide on using these templates.
Customizing Your Own Templates: For all templates, some variables such as HF_TOKEN must be set. You can create a new private template by modifying the relevant variables in the create template command. Then point your autogroup at it by specifying the new template's template_hash.
If the existing templates don't meet your needs, don't hesitate to contact us, and we’ll assist you in creating an autoscaler-compatible template.
The text-generation-inference image is used for serving LLMs.
- HF_TOKEN: your Hugging Face API token with read permissions, used to download gated models
- /generate: Generates the LLM's response to a given prompt in a single request.
- /generate_stream: Streams the LLM's response token by token.
Both endpoints take the same API payload:
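The request body follows text-generation-inference's standard schema: the prompt goes under `inputs`, and generation options go under `parameters`. A minimal sketch (the prompt text and parameter values here are illustrative):

```python
import json

# Request body accepted by both /generate and /generate_stream.
payload = {
    "inputs": "What is the capital of France?",
    "parameters": {
        "max_new_tokens": 200,  # upper bound on generated tokens; dominates latency
        "temperature": 0.7,     # optional sampling parameter
    },
}

# Serialize for sending as the JSON body of a POST request.
body = json.dumps(payload)
```

For `/generate` the server returns the full response in one JSON object; for `/generate_stream` the same payload yields a token-by-token event stream.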
Note: The max_new_tokens parameter, rather than the prompt size, impacts performance. For example, if an instance is benchmarked to process 100 tokens per second, a request with max_new_tokens = 200 will take approximately 2 seconds to complete.
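The note's arithmetic can be written out directly, a rough estimate that ignores queuing and prompt-processing overhead:

```python
# Rough completion-time estimate: generation time scales with
# max_new_tokens, not with the prompt size.
def estimated_seconds(max_new_tokens: int, tokens_per_second: float) -> float:
    return max_new_tokens / tokens_per_second

# The example from the note: 200 new tokens at 100 tokens/sec.
t = estimated_seconds(200, 100.0)  # ~2 seconds
```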
template_hash: dcd0920ffd9d026b7bb2d42f0d7479ba
ai-dock's ComfyUI template is used as the base image for running the pyworker server for image generation.
- HF_TOKEN: your Hugging Face API token with read permissions, used to download gated models
- COMFY_MODEL: Text2Image model to be used, options: sd3, flux
template_hash: f8aac491656a1d1040e7dfc5f8fcf059
- payload example:
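For the default text-to-image workflow, a request payload might look like the following. This is a sketch only: the field names (`prompt`, `width`, `height`, `steps`) are assumptions for illustration, not the template's documented schema.

```python
import json

# Hypothetical payload for the template's built-in text-to-image workflow.
# All field names below are illustrative assumptions.
payload = {
    "prompt": "a watercolor painting of a lighthouse at dusk",
    "width": 1024,
    "height": 1024,
    "steps": 28,
}

body = json.dumps(payload)
```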
- /custom_workflow: Allows the client to send their own ComfyUI workflow with each API request.
- payload example:
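A custom-workflow request might wrap a complete ComfyUI workflow graph, i.e. the node JSON exported from the ComfyUI editor, in the request body. A sketch, assuming a top-level `custom_workflow` key (the key name and the node contents are illustrative, not the documented schema):

```python
import json

# A fragment of an exported ComfyUI workflow graph: node IDs map to
# node definitions. The single node shown here is illustrative only.
workflow = {
    "3": {
        "class_type": "KSampler",
        "inputs": {"seed": 42, "steps": 28},
    },
}

# Hypothetical request body: the whole graph is sent with each request.
payload = {"custom_workflow": workflow}
body = json.dumps(payload)
```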