Create Endpoints and Workergroups
The `/endptjobs/` and `/workergroups/` routes call on the webserver to create a new Endpoint and Workergroup, respectively.

## Create an Endpoint

POST `https://console.vast.ai/api/v0/endptjobs/`

### Inputs

- `api_key` (string): The Vast API key associated with the account that controls the endpoint. The key can also be placed in the header as an authorization bearer token.
- `endpoint_name` (string): The name given to the endpoint that is created.
- `min_load` (integer): A minimum baseline load (measured in tokens/second for LLMs) that the serverless engine will assume your endpoint needs to handle, regardless of actual measured traffic.
- `target_util` (float): A ratio that determines how much spare capacity (headroom) the serverless engine maintains.
- `cold_mult` (float): A multiplier applied to your target capacity for longer-term planning (1+ hours). This parameter controls how much extra capacity the serverless engine will plan for in the future compared to immediate needs.
- `cold_workers` (integer): The minimum number of workers that must be kept in a "ready quick" state before the serverless engine is allowed to destroy any workers.
- `max_workers` (integer): A hard upper limit on the total number of worker instances (ready, stopped, loading, etc.) that your endpoint can have at any given time.

```json
{
    "api_key": "your_vast_api_key",
    "endpoint_name": "your_endpoint_name",
    "min_load": 10,
    "target_util": 0.9,
    "cold_mult": 2.0,
    "cold_workers": 5,
    "max_workers": 20
}
```

### Outputs

On success:

- `success` (bool): `true` on successful creation of the endpoint, `false` otherwise.
- `result` (int): The endpoint ID of the newly created endpoint.

```json
{
    "success": true,
    "result": 1234
}
```

On failure:

- `success` (bool): `false` on failure.
- `error` (string): The type of error.
- `msg` (string): The error message related to the error.

```json
{
    "success": false,
    "error": "auth_error",
    "msg": "invalid user key"
}
```

### Example: creating an endpoint with cURL

```bash
curl --location 'https://console.vast.ai/api/v0/endptjobs/' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer my_vast_token_here' \
--data '{
    "min_load": 10,
    "target_util": 0.9,
    "cold_mult": 2.5,
    "cold_workers": 5,
    "max_workers": 20,
    "endpoint_name": "my-endpoint"
}'
```

### Example: creating an endpoint with the Vast CLI

```bash
vastai create endpoint --min_load 10 --target_util 0.9 --cold_mult 2.0 --cold_workers 5 --max_workers 20 --endpoint_name "my-endpoint"
```

## Create a Workergroup

POST `https://console.vast.ai/api/v0/workergroups/`

### Inputs

Required:

- `api_key` (string): The Vast API key associated with the account that controls the endpoint. The key can also be placed in the header as an authorization bearer token.
- `endpoint_name` (string): The name of the endpoint that the workergroup will be created under.

And one of the following:

- `template_hash` (string): The hexadecimal string that identifies a particular template.
- `template_id` (integer): The unique ID that identifies a template.

Note: if you use either the template hash or ID, you can skip `search_params`, as it is automatically inferred from the template.

Or:

- `search_params` (string): A query string that specifies the hardware and performance criteria for filtering GPU offers in the Vast.ai marketplace.
- `launch_args` (string): A command-line-style string containing additional parameters for instance creation that will be parsed and applied when the autoscaler creates new workers. This allows you to customize instance configuration beyond what's specified in templates.

Optional (default values will be assigned if not specified):

- `min_load` (integer): A minimum baseline load (measured in tokens/second for LLMs) that the serverless engine will assume your endpoint needs to handle, regardless of actual measured traffic. Default value is 1.0.
- `target_util` (float): A ratio that determines how much spare capacity (headroom) the serverless engine maintains. Default value is 0.9.
- `cold_mult` (float): A multiplier applied to your target capacity for longer-term planning (1+ hours). This parameter controls how much extra capacity the serverless engine will plan for in the future compared to immediate needs. Default value is 3.0.
- `test_workers` (integer): The number of different physical machines that a workergroup should test during its initial "exploration" phase to gather performance data before transitioning to normal demand-based scaling. Default value is 3.
- `gpu_ram` (integer): The amount of GPU memory (VRAM) in gigabytes that your model or workload requires to run. This parameter tells the serverless engine how much GPU memory your model needs. Default value is 24.

```json
{
    "api_key": "your_vast_api_key",
    "endpoint_name": "your_endpoint_name",
    "template_hash": "your_template_hash",
    "min_load": 1.0,
    "target_util": 0.9,
    "cold_mult": 3.0,
    "test_workers": 3,
    "gpu_ram": 24
}
```

### Outputs

On success:

- `success` (bool): `true` on successful creation of the workergroup, `false` otherwise.
- `result` (int): The autogroup ID of the newly created workergroup.

```json
{
    "success": true,
    "result": 789
}
```

On failure:

- `success` (bool): `false` on failure.
- `error` (string): The type of error.
- `msg` (string): The error message related to the error.

```json
{
    "success": false,
    "error": "auth_error",
    "msg": "invalid user key"
}
```

### Example: creating a workergroup with cURL

```bash
curl --location 'https://console.vast.ai/api/v0/workergroups/' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer my_endpoint_key' \
--data '{
    "endpoint_name": "my-endpoint",
    "template_hash": "my_template_hash",
    "min_load": 10,
    "target_util": 0.9,
    "cold_mult": 3.0,
    "test_workers": 3,
    "gpu_ram": 24
}'
```

### Example: creating a workergroup with the Vast CLI

```bash
vastai create workergroup --endpoint_name "my-endpoint" --template_hash "my_template_hash" --min_load 10 --target_util 0.9 --cold_mult 3.0 --test_workers 3 --gpu_ram 24
```
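The same two calls can be made from any language that can send JSON over HTTPS. Below is a minimal Python sketch using only the standard library; the endpoint name, template hash, and bearer token are placeholders you would substitute with your own values.

```python
import json
import urllib.request

API_BASE = "https://console.vast.ai/api/v0"
API_KEY = "your_vast_api_key"  # placeholder: substitute your real key


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST request for the Vast.ai API."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


# Step 1: create the endpoint.
endpoint_req = build_request("/endptjobs/", {
    "endpoint_name": "my-endpoint",
    "min_load": 10,
    "target_util": 0.9,
    "cold_mult": 2.0,
    "cold_workers": 5,
    "max_workers": 20,
})

# Step 2: create a workergroup under that endpoint.
workergroup_req = build_request("/workergroups/", {
    "endpoint_name": "my-endpoint",
    "template_hash": "your_template_hash",  # or pass "template_id" instead
    "min_load": 10,
    "target_util": 0.9,
    "cold_mult": 3.0,
    "test_workers": 3,
    "gpu_ram": 24,
})

# To actually send a request (requires a valid API key), uncomment:
# with urllib.request.urlopen(endpoint_req) as resp:
#     result = json.load(resp)  # e.g. {"success": true, "result": 1234}
```

On success, the first call returns the endpoint ID and the second returns the autogroup ID, as described in the output sections above.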
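To build intuition for how `min_load`, `target_util`, and `cold_mult` interact, the sketch below computes capacity targets under one simple reading of the parameter definitions above: the engine assumes at least `min_load` of traffic, divides by `target_util` to keep headroom, and multiplies by `cold_mult` for the long-horizon plan. This arithmetic is purely illustrative and is not the serverless engine's exact formula.

```python
def planned_capacity(measured_load: float, min_load: float,
                     target_util: float, cold_mult: float) -> tuple[float, float]:
    """Illustrative reading of the scaling parameters (not Vast's exact formula).

    Returns (near_term, long_term) capacity targets in the same units as
    load (tokens/second for LLM endpoints).
    """
    # The engine assumes at least min_load of traffic, whatever is measured.
    baseline = max(measured_load, min_load)
    # target_util < 1.0 keeps headroom: serving 90 tok/s at 0.9 target
    # utilization means planning roughly 100 tok/s of capacity.
    near_term = baseline / target_util
    # cold_mult scales the long-horizon (1+ hours) plan beyond immediate needs.
    long_term = near_term * cold_mult
    return near_term, long_term


# With the example endpoint settings above (min_load=10, target_util=0.9,
# cold_mult=2.0) and 90 tok/s of measured traffic:
near_cap, long_cap = planned_capacity(90, 10, 0.9, 2.0)
print(near_cap, long_cap)  # ~100 tok/s now, ~200 tok/s planned long-term
```

Under this reading, raising `target_util` toward 1.0 shrinks the headroom, while a larger `cold_mult` makes the engine pre-plan more capacity for future demand.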