Templates Reference
Below are template_hash values for various autoscaler-compatible templates. These templates handle all the configuration needed to run different images in API mode, search for suitable machines, and communicate with the autoscaling server. Refer to the quickstart for a guide on using these templates.
Customizing Your Own Templates: For all templates, some variables such as HF_TOKEN must be set. You can create a new private template by modifying the relevant variables in the create template command. Then point your autogroup at it by specifying the new template's template_hash.
If the existing templates don't meet your needs, don't hesitate to contact us, and we’ll assist you in creating an autoscaler-compatible template.
The text-generation-inference image is used for serving LLMs.
- HF_TOKEN: your Hugging Face API token with read permissions, used to download gated models
- /generate: Generates the LLM's response to a given prompt in a single request.
- /generate_stream: Streams the LLM's response token by token.
Both endpoints take the same API payload:
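The request body follows text-generation-inference's standard schema: the prompt goes under `inputs`, and generation options go under `parameters`. A minimal sketch (the prompt text and parameter values here are illustrative):

```python
import json

# Request body accepted by both /generate and /generate_stream.
payload = {
    "inputs": "What is the capital of France?",
    "parameters": {
        "max_new_tokens": 200,  # upper bound on generated tokens; dominates latency
        "temperature": 0.7,     # optional sampling parameter
    },
}

# Serialize for sending as the JSON body of a POST request.
body = json.dumps(payload)
```

For `/generate` the server returns the full response in one JSON object; for `/generate_stream` the same payload yields a token-by-token event stream.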
Note: The max_new_tokens parameter, rather than the prompt size, impacts performance. For example, if an instance is benchmarked to process 100 tokens per second, a request with max_new_tokens = 200 will take approximately 2 seconds to complete.
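The note's arithmetic can be written out directly, a rough estimate that ignores queuing and prompt-processing overhead:

```python
# Rough completion-time estimate: generation time scales with
# max_new_tokens, not with the prompt size.
def estimated_seconds(max_new_tokens: int, tokens_per_second: float) -> float:
    return max_new_tokens / tokens_per_second

# The example from the note: 200 new tokens at 100 tokens/sec.
t = estimated_seconds(200, 100.0)  # ~2 seconds
```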
template_hash: dcd0920ffd9d026b7bb2d42f0d7479ba
ai-dock's ComfyUI template is used as the base image for running the pyworker server for image generation.
- HF_TOKEN: your Hugging Face API token with read permissions, used to download gated models
- COMFY_MODEL: Text2Image model to be used, options: sd3, flux
template_hash: f8aac491656a1d1040e7dfc5f8fcf059
- payload example:
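For the default text-to-image workflow, a request payload might look like the following. This is a sketch only: the field names (`prompt`, `width`, `height`, `steps`) are assumptions for illustration, not the template's documented schema.

```python
import json

# Hypothetical payload for the template's built-in text-to-image workflow.
# All field names below are illustrative assumptions.
payload = {
    "prompt": "a watercolor painting of a lighthouse at dusk",
    "width": 1024,
    "height": 1024,
    "steps": 28,
}

body = json.dumps(payload)
```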
- /custom_workflow: Allows the client to send their own ComfyUI workflow with each API request.
- payload example:
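A custom-workflow request might wrap a complete ComfyUI workflow graph, i.e. the node JSON exported from the ComfyUI editor, in the request body. A sketch, assuming a top-level `custom_workflow` key (the key name and the node contents are illustrative, not the documented schema):

```python
import json

# A fragment of an exported ComfyUI workflow graph: node IDs map to
# node definitions. The single node shown here is illustrative only.
workflow = {
    "3": {
        "class_type": "KSampler",
        "inputs": {"seed": 42, "steps": 28},
    },
}

# Hypothetical request body: the whole graph is sent with each request.
payload = {"custom_workflow": workflow}
body = json.dumps(payload)
```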