The /endptjobs/ and /workergroups/ endpoints call on the webserver to create a new Endpoint and a new Workergroup, respectively.
POST https://console.vast.ai/api/v0/endptjobs/
Inputs
api_key (string): The Vast API key associated with the account that controls the Endpoint. The key can also be placed in the header as an Authorization: Bearer.
endpoint_name (string): The name given to the endpoint that is created.
min_load (integer): A minimum baseline load (measured in tokens/second for LLMs) that the serverless engine will assume your Endpoint needs to handle, regardless of actual measured traffic.
target_util (float): A ratio that determines how much spare capacity (headroom) the serverless engine maintains.
cold_mult (float): A multiplier applied to your target capacity for longer-term planning (1+ hours). This parameter controls how much extra capacity the serverless engine will plan for in the future compared to immediate needs.
cold_workers (integer): The minimum number of workers that must be kept in a “ready quick” state before the serverless engine is allowed to destroy any workers.
max_workers (integer): A hard upper limit on the total number of worker instances (ready, stopped, loading, etc.) that your endpoint can have at any given time.
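As a minimal sketch of how the inputs above fit together, the following Python builds the POST request for /endptjobs/ without sending it. It assumes the parameters travel as a JSON body with a `Content-Type: application/json` header; the helper name `build_endpoint_request` and all values shown are illustrative, not taken from the Vast documentation.

```python
import json
import urllib.request

ENDPTJOBS_URL = "https://console.vast.ai/api/v0/endptjobs/"


def build_endpoint_request(api_key, endpoint_name, min_load, target_util,
                           cold_mult, cold_workers, max_workers):
    """Build, but do not send, the POST request that creates an Endpoint."""
    payload = {
        "api_key": api_key,
        "endpoint_name": endpoint_name,
        "min_load": min_load,
        "target_util": target_util,
        "cold_mult": cold_mult,
        "cold_workers": cold_workers,
        "max_workers": max_workers,
    }
    return urllib.request.Request(
        ENDPTJOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API accepts a JSON-encoded body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative values only; substitute your own key and settings.
request = build_endpoint_request(
    api_key="YOUR_VAST_API_KEY",
    endpoint_name="my-endpoint",
    min_load=10,
    target_util=0.9,
    cold_mult=2.5,
    cold_workers=5,
    max_workers=20,
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Passing `request` to `urllib.request.urlopen` would send it; building and sending are kept separate here so the payload can be inspected first.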
Outputs
On Successful Worker Return
success (bool): True on successful creation of Endpoint, False if otherwise.
result (int): The endpoint_id of the newly created Endpoint.
On Failure to Find Ready Worker
success (bool): True on successful creation of Endpoint, False if otherwise.
error (string): The type of error status.
msg (string): The error message related to the error.
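The two output shapes can be handled with a single check on `success`. The sketch below assumes the response body is JSON carrying the fields listed above; the function name `parse_create_response` and both sample bodies are hypothetical, written by hand for illustration rather than captured from the API.

```python
import json


def parse_create_response(body):
    """Return the new id from `result`, or raise with `error` and `msg`."""
    data = json.loads(body)
    if data.get("success"):
        return data["result"]
    raise RuntimeError(f"{data.get('error')}: {data.get('msg')}")


# Hypothetical response bodies, not real API output.
ok_body = '{"success": true, "result": 1234}'
failed_body = '{"success": false, "error": "example_error", "msg": "example message"}'

endpoint_id = parse_create_response(ok_body)

try:
    parse_create_response(failed_body)
except RuntimeError as exc:
    failure = str(exc)
```

Because the Workergroup endpoint documents the same `success`, `result`, `error`, and `msg` fields, the same helper would apply there, with `result` holding the autogroup_id instead of the endpoint_id.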
Example: Creating an Endpoint with cURL
Curl
Example: Creating an Endpoint with the Vast CLI
Terminal
POST https://console.vast.ai/api/v0/workergroups/
Inputs
Required:
api_key (string): The Vast API key associated with the account that controls the Endpoint. The key can also be placed in the header as an Authorization: Bearer.
endpoint_name (string): The name of the Endpoint that the Workergroup will be created under.
template_hash (string): The hexadecimal string that identifies a particular template.
template_id (integer): The unique id that identifies a template.
If template_hash or template_id is provided, there is no need to specify search_params, as they are automatically inferred from the template.
OR
search_params (string): A query string that specifies the hardware and performance criteria for filtering GPU offers in the vast.ai marketplace.
launch_args (string): A command-line style string containing additional parameters for instance creation that will be parsed and applied when the autoscaler creates new workers. This allows you to customize instance configuration beyond what’s specified in templates.
Optional:
min_load (integer): A minimum baseline load (measured in tokens/second for LLMs) that the serverless engine will assume your Endpoint needs to handle, regardless of actual measured traffic. Default value is 1.0.
target_util (float): A ratio that determines how much spare capacity (headroom) the serverless engine maintains. Default value is 0.9.
cold_mult (float): A multiplier applied to your target capacity for longer-term planning (1+ hours). This parameter controls how much extra capacity the serverless engine will plan for in the future compared to immediate needs. Default value is 3.0.
test_workers (integer): The number of different physical machines that a Workergroup should test during its initial “exploration” phase to gather performance data before transitioning to normal demand-based scaling. Default value is 3.
gpu_ram (integer): The amount of GPU memory (VRAM) in gigabytes that your model or workload requires to run. This parameter tells the serverless engine how much GPU memory your model needs. Default value is 24.
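The template-or-search_params choice is easy to get wrong, so here is a hedged sketch that builds the /workergroups/ request and rejects calls that supply neither. It assumes a JSON body and that any parameter left out falls back to the default stated above; `build_workergroup_request` and the placeholder values are illustrative names, not part of the Vast API.

```python
import json
import urllib.request

WORKERGROUPS_URL = "https://console.vast.ai/api/v0/workergroups/"


def build_workergroup_request(api_key, endpoint_name, template_hash=None,
                              template_id=None, search_params=None,
                              launch_args=None, **tuning):
    """Build, but do not send, the POST request that creates a Workergroup.

    `tuning` carries the parameters that have defaults (min_load,
    target_util, cold_mult, test_workers, gpu_ram).
    """
    if template_hash is None and template_id is None and search_params is None:
        raise ValueError("provide template_hash, template_id, or search_params")

    payload = {"api_key": api_key, "endpoint_name": endpoint_name}
    selectors = {
        "template_hash": template_hash,
        "template_id": template_id,
        "search_params": search_params,
        "launch_args": launch_args,
    }
    # Only send what the caller actually set.
    payload.update({k: v for k, v in selectors.items() if v is not None})
    payload.update(tuning)

    return urllib.request.Request(
        WORKERGROUPS_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API accepts a JSON-encoded body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; a real template hash comes from your own template.
wg_request = build_workergroup_request(
    api_key="YOUR_VAST_API_KEY",
    endpoint_name="my-endpoint",
    template_hash="YOUR_TEMPLATE_HASH",
    test_workers=5,
)
print(json.loads(wg_request.data))
```

Here search_params is omitted on purpose, since the text above says it is inferred from the template.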
Outputs
On Successful Worker Return
success (bool): True on successful creation of Workergroup, False if otherwise.
result (int): The autogroup_id of the newly created Workergroup.
On Failure to Find Ready Worker
success (bool): True on successful creation of Workergroup, False if otherwise.
error (string): The type of error status.
msg (string): The error message related to the error.
Example: Creating a Workergroup with cURL
Curl
Example: Creating a Workergroup with the Vast CLI
Terminal