Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
application/json
Name of the endpoint group
Example:
"vLLM-Qwen3-8B"
ID of existing endpoint group (alternative to endpoint_name)
Example:
123
Hash ID of template to use for worker instances
Example:
"abc123def456"
ID of template (alternative to template_hash)
Example:
456
Search query for finding worker instances (alternative to template)
Example:
"gpu_name=RTX_3090 rentable=true"
Additional launch arguments for worker instances
Example:
"--env VAR=value"
Minimum load threshold for scaling
Example:
1
Target GPU utilization
Example:
0.9
Cold start multiplier
Example:
3
Number of cold workers to maintain
Example:
3
Maximum number of worker instances
Example:
20
Number of test workers
Example:
3
Minimum GPU RAM in GB
Example:
24