Authorizations
The API key must be provided in the Authorization header.
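A minimal sketch of attaching the key to a request with Python's standard library. It assumes a "Bearer" scheme and a JSON body. The URL is a hypothetical placeholder, not the real endpoint. The request is only built here, not sent.

```python
import urllib.request

# Hypothetical placeholder URL; substitute the real endpoint path.
API_URL = "https://api.example.com/endpoint-groups/"

def build_request(api_key: str, body: bytes) -> urllib.request.Request:
    # Assumption: the API key is sent using the "Bearer" scheme.
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("my-api-key", b"{}")
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`.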
Body
Name of the endpoint group
"vLLM-Qwen3-8B"
ID of an existing endpoint group (alternative to endpoint_name)
123
Hash ID of the template to use for worker instances
"abc123def456"
ID of the template (alternative to template_hash)
456
Search query used to find worker instances (alternative to a template)
"gpu_name=RTX_3090 rentable=true"
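Several of the fields above are described as alternatives to one another. The sketch below shows one client-side way to check a request body for those relationships before sending it. The key names endpoint_id, template_id and search_params are hypothetical, because only endpoint_name and template_hash are named in the field list. The rules are also assumptions: exactly one endpoint field, at most one template field, and either a template or a search query.

```python
def validate_selection(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks consistent."""
    problems = []
    # endpoint_name and endpoint_id (hypothetical key) are alternatives: expect exactly one.
    endpoint_keys = [k for k in ("endpoint_name", "endpoint_id") if k in body]
    if len(endpoint_keys) != 1:
        problems.append("provide exactly one of endpoint_name or endpoint_id")
    # template_hash and template_id (hypothetical key) are alternatives: allow at most one.
    if "template_hash" in body and "template_id" in body:
        problems.append("provide only one of template_hash or template_id")
    # A search query (hypothetical key search_params) can stand in for a template.
    if not any(k in body for k in ("template_hash", "template_id", "search_params")):
        problems.append("provide a template (template_hash or template_id) or search_params")
    return problems

ok = validate_selection({"endpoint_name": "vLLM-Qwen3-8B", "template_hash": "abc123def456"})
```

Here `ok` is an empty list, because the body names one endpoint and one template.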
Additional launch arguments for worker instances
"--env VAR=value"
Minimum load threshold for scaling
1
Target GPU utilization, as a fraction between 0 and 1
0.9
Cold start multiplier
3
Number of cold workers to maintain
3
Maximum number of worker instances
20
Number of test workers
3
Minimum GPU RAM in GB
24
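Putting the fields together, the following is a sketch of a complete request body built from the example values listed above and serialized to JSON. Only endpoint_name and template_hash are key names taken from the field list. All other keys (launch_args, min_load, target_util, cold_mult, cold_workers, max_workers, test_workers, gpu_ram) are assumed names. The search query is left out because the list describes it as an alternative to a template.

```python
import json

# Example values copied from the field list above.
# Key names other than endpoint_name and template_hash are assumptions.
payload = {
    "endpoint_name": "vLLM-Qwen3-8B",
    "template_hash": "abc123def456",
    "launch_args": "--env VAR=value",
    "min_load": 1,
    "target_util": 0.9,   # fraction of GPU utilization to aim for
    "cold_mult": 3,
    "cold_workers": 3,
    "max_workers": 20,
    "test_workers": 3,
    "gpu_ram": 24,        # minimum GPU RAM in GB
}

# Encode as UTF-8 bytes, ready to use as an HTTP request body.
body = json.dumps(payload).encode("utf-8")
```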