The /endptjobs/ and /workergroups/ endpoints call the webserver to create a new Endpoint and a new Workergroup, respectively.

POST https://console.vast.ai/api/v0/endptjobs/

Inputs

  • api_key (string): The Vast API key associated with the account that controls the Endpoint. The key can also be supplied in the header as an Authorization: Bearer token.
  • endpoint_name (string): The name given to the Endpoint that is created.
  • min_load (integer): A minimum baseline load (measured in tokens/second for LLMs) that the serverless engine will assume your Endpoint needs to handle, regardless of actual measured traffic.
  • target_util (float): A ratio that determines how much spare capacity (headroom) the serverless engine maintains.
  • cold_mult (float): A multiplier applied to your target capacity for longer-term planning (1+ hours). This parameter controls how much extra capacity the serverless engine will plan for in the future compared to immediate needs (see the sketch after the example request body below).
  • cold_workers (integer): The minimum number of workers that must be kept in a “ready quick” state before the serverless engine is allowed to destroy any workers.
  • max_workers (integer): A hard upper limit on the total number of worker instances (ready, stopped, loading, etc.) that your Endpoint can have at any given time.
JSON
{
    "api_key": "YOUR_VAST_API_KEY",
    "endpoint_name": "YOUR_ENDPOINT_NAME",
    "min_load": 10,
    "target_util": 0.9,
    "cold_mult": 2.0,
    "cold_workers": 5,
    "max_workers": 20
}
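
The exact scaling formula used by the serverless engine is not documented here. The sketch below is only one plausible reading of how these parameters interact, assuming the engine plans capacity as roughly the assumed load divided by target_util, with cold_mult applied to the longer-horizon plan; the arithmetic is illustrative, not the engine's actual implementation.

Python
# Illustrative only -- NOT the serverless engine's real algorithm, just a sketch
# of how min_load, target_util, and cold_mult could combine under the assumption
# that target capacity is approximately load / target_util.

def planned_capacity(measured_load, min_load=10, target_util=0.9, cold_mult=2.0):
    """Return (near-term capacity, longer-term planned capacity) in tokens/second."""
    # The engine never plans below min_load, regardless of measured traffic.
    assumed_load = max(measured_load, min_load)
    # A target_util below 1.0 leaves headroom above the assumed load.
    near_term = assumed_load / target_util
    # cold_mult scales the longer-horizon (1+ hours) plan relative to immediate needs.
    longer_term = near_term * cold_mult
    return near_term, longer_term

print(planned_capacity(measured_load=45))   # (50.0, 100.0)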

Outputs

On Success

  • success (bool): True on successful creation of the Endpoint, False otherwise.
  • result (int): The endpoint_id of the newly created Endpoint.
JSON
{
  "success": true,
  "result": 1234
}

On Failure

  • success (bool): True on successful creation of the Endpoint, False otherwise.
  • error (string): The type of error status.
  • msg (string): The error message related to the error.
JSON
{
  "success": false,
  "error": "auth_error",
  "msg": "Invalid user key"
}

Example: Creating an Endpoint with cURL

Curl
curl --location 'https://console.vast.ai/api/v0/endptjobs/' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer MY_VAST_TOKEN_HERE' \
--data '{
  "min_load": 10,
  "target_util": 0.9,
  "cold_mult": 2.5,
  "cold_workers": 5,
  "max_workers": 20,
  "endpoint_name": "my-endpoint"
}'

Example: Creating an Endpoint with the Vast CLI

Terminal
vastai create endpoint --min_load 10 --target_util 0.9 --cold_mult 2.0 --cold_workers 5 --max_workers 20 --endpoint_name "my-endpoint"
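
The same request can also be made from Python. The sketch below is a minimal example using the requests library and the Bearer-token header shown above; YOUR_VAST_API_KEY and the parameter values are placeholders.

Python
import requests

API_KEY = "YOUR_VAST_API_KEY"   # placeholder: substitute your real API key

payload = {
    "endpoint_name": "my-endpoint",
    "min_load": 10,
    "target_util": 0.9,
    "cold_mult": 2.0,
    "cold_workers": 5,
    "max_workers": 20,
}

resp = requests.post(
    "https://console.vast.ai/api/v0/endptjobs/",
    headers={
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json=payload,
)
data = resp.json()

if data.get("success"):
    endpoint_id = data["result"]   # id of the newly created Endpoint
    print("Created endpoint", endpoint_id)
else:
    print("Creation failed:", data.get("error"), data.get("msg"))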

POST https://console.vast.ai/api/v0/workergroups/

Inputs

Required:
  • api_key (string): The Vast API key associated with the account that controls the Endpoint. The key can also be supplied in the header as an Authorization: Bearer token.
  • endpoint_name (string): The name of the Endpoint that the Workergroup will be created under.
AND one of the following:
  • template_hash (string): The hexadecimal string that identifies a particular template.
OR
  • template_id (integer): The unique id that identifies a template.
OR
  • search_params (string): A query string that specifies the hardware and performance criteria for filtering GPU offers in the vast.ai marketplace (an illustrative search_params-based request body is sketched after the example below).
  • launch_args (string): A command-line-style string containing additional parameters for instance creation that will be parsed and applied when the autoscaler creates new workers. This allows you to customize instance configuration beyond what’s specified in templates.
NOTE: If you supply either the template hash or id, you can skip search_params, as it is automatically inferred from the template.
Optional (Default values will be assigned if not specified):
  • min_load (integer): A minimum baseline load (measured in tokens/second for LLMs) that the serverless engine will assume your Endpoint needs to handle, regardless of actual measured traffic. Default value is 1.0.
  • target_util (float): A ratio that determines how much spare capacity (headroom) the serverless engine maintains. Default value is 0.9.
  • cold_mult (float): A multiplier applied to your target capacity for longer-term planning (1+ hours). This parameter controls how much extra capacity the serverless engine will plan for in the future compared to immediate needs. Default value is 3.0.
  • test_workers (integer): The number of different physical machines that a Workergroup should test during its initial “exploration” phase to gather performance data before transitioning to normal demand-based scaling. Default value is 3.
  • gpu_ram (integer): The amount of GPU memory (VRAM) in gigabytes that your model or workload requires to run. This parameter tells the serverless engine how much GPU memory your model needs. Default value is 24.
JSON
{
    "api_key": "YOUR_VAST_API_KEY",
    "endpoint_name": "YOUR_ENDPOINT_NAME",
    "template_hash": "YOUR_TEMPLATE_HASH",
    "min_load": 1.0,
    "target_util": 0.9,
    "cold_mult": 3.0,
    "test_workers": 3,
    "gpu_ram": 24
}
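
If you do not use a template, the request body carries search_params and launch_args instead of template_hash. The sketch below shows one such payload as a Python dict; the filter and launch-argument strings are purely illustrative assumptions, so consult the marketplace search and instance-creation documentation for the exact syntax your workload needs.

Python
# Alternative request body: search_params + launch_args instead of template_hash.
# The string values below are illustrative placeholders, not a documented syntax.
payload = {
    "api_key": "YOUR_VAST_API_KEY",
    "endpoint_name": "YOUR_ENDPOINT_NAME",
    "search_params": "gpu_ram>=24 num_gpus=1 verified=True",    # illustrative offer filter
    "launch_args": "--image my-org/my-image:latest --disk 32",  # illustrative launch args
    "min_load": 1.0,
    "target_util": 0.9,
    "cold_mult": 3.0,
    "test_workers": 3,
    "gpu_ram": 24,
}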

Outputs

On Success

  • success (bool): True on successful creation of the Workergroup, False otherwise.
  • result (int): The autogroup_id of the newly created Workergroup.
JSON
{
  "success": true,
  "result": 789
}

On Failure

  • success (bool): True on successful creation of the Workergroup, False otherwise.
  • error (string): The type of error status.
  • msg (string): The error message related to the error.
JSON
{
  "success": false,
  "error": "auth_error",
  "msg": "Invalid user key"
}

Example: Creating a Workergroup with cURL

Curl
curl --location 'https://console.vast.ai/api/v0/workergroups/' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer MY_ENDPOINT_KEY' \
--data '{
  "endpoint_name": "MY_ENDPOINT",
  "template_hash": "MY_TEMPLATE_HASH",
  "min_load": 10,
  "target_util": 0.9,
  "cold_mult": 3.0,
  "test_workers": 3,
  "gpu_ram": 24
}'

Example: Creating a Workergroup with the Vast CLI

Terminal
vastai create workergroup --endpoint_name "MY_ENDPOINT" --template_hash "MY_TEMPLATE_HASH" --min_load 10 --target_util 0.9 --cold_mult 3.0 --test_workers 3 --gpu_ram 24
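
As with Endpoint creation, the same Workergroup request can be issued from Python. The sketch below mirrors the cURL example above using the requests library; MY_ENDPOINT, MY_TEMPLATE_HASH, and the API key are placeholders.

Python
import requests

API_KEY = "YOUR_VAST_API_KEY"   # placeholder

payload = {
    "endpoint_name": "MY_ENDPOINT",
    "template_hash": "MY_TEMPLATE_HASH",
    "min_load": 10,
    "target_util": 0.9,
    "cold_mult": 3.0,
    "test_workers": 3,
    "gpu_ram": 24,
}

resp = requests.post(
    "https://console.vast.ai/api/v0/workergroups/",
    headers={
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json=payload,
)
data = resp.json()

if data.get("success"):
    workergroup_id = data["result"]   # autogroup_id of the new Workergroup
    print("Created workergroup", workergroup_id)
else:
    print("Creation failed:", data.get("error"), data.get("msg"))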