Templates Reference

Below are template hash values for various autoscaler-compatible templates. These templates handle all the configuration needed to run different images in API mode, search for suitable machines, and communicate with the autoscaling server. Refer to the quickstart (https://docs.vast.ai/serverless/getting-started) for a guide on using these templates.

Customizing Your Own Templates

For all templates, some variables, such as HF_TOKEN, must be set. You can create a new private template by modifying the relevant variables in the create template command. Then set your autogroup to use that template by specifying the hash of the newly created template. If the existing templates don't meet your needs, contact us and we'll help you create an autoscaler-compatible template.

TGI

The Text Generation Inference (https://github.com/huggingface/text-generation-inference) image is used for serving LLMs.

Required variables

HF_TOKEN: Your Hugging Face API token with read permissions, used to download gated models.
MODEL_ID: ID of the model you want to use for inference. See the supported models list (https://huggingface.co/docs/text-generation-inference/en/supported_models) for available options.

Endpoints

/generate: Generates the LLM's response to a given prompt in a single request.
/generate_stream: Streams the LLM's response token by token.

Both endpoints take the same API payload:

    { "inputs": "$prompt", "parameters": { "max_new_tokens": 250 } }

Note: the max_new_tokens parameter, rather than the prompt size, is what impacts performance. For example, if an instance is benchmarked to process 100 tokens per second, a request with max_new_tokens = 200 will take approximately 2 seconds to complete.

Template hash: dcd0920ffd9d026b7bb2d42f0d7479ba (https://cloud.vast.ai/?template_id=dcd0920ffd9d026b7bb2d42f0d7479ba)

Comfy UI

AI-Dock's ComfyUI (https://github.com/comfyanonymous/comfyui) template is used as the base image for running the PyWorker server for image generation.

Required variables

HF_TOKEN: Your Hugging Face API token with read permissions, used to download gated models.
COMFY_MODEL: Text2Image model to be used. Options: sd3, flux.

Template hash: f8aac491656a1d1040e7dfc5f8fcf059 (https://cloud.vast.ai/?template_id=f8aac491656a1d1040e7dfc5f8fcf059)

Endpoints

/prompt: Uses the default Comfy workflow defined under workers/comfyui/misc/default_workflows (https://github.com/vast-ai/pyworker/blob/main/workers/comfyui/misc/default_workflows).
/custom-workflow: Allows the client to send their own Comfy workflow with each API request.
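
As an illustration of the TGI payload and the max_new_tokens note above, here is a minimal Python sketch. The helper names (build_tgi_payload, estimate_seconds, template_url) are hypothetical and not part of any Vast.ai or Hugging Face library. The time estimate assumes generation speed is the only cost, as in the 100 tokens per second example. The sketch does not send a request, since routing through the autoscaling server is covered by the quickstart rather than by this page.

```python
import json


def build_tgi_payload(prompt: str, max_new_tokens: int = 250) -> dict:
    """Build the request body shared by /generate and /generate_stream."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def estimate_seconds(max_new_tokens: int, tokens_per_second: float) -> float:
    """Rough completion time, assuming output length dominates the cost."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return max_new_tokens / tokens_per_second


def template_url(template_hash: str) -> str:
    """Console link for a template hash, following the URL pattern used on this page."""
    return f"https://cloud.vast.ai/?template_id={template_hash}"


if __name__ == "__main__":
    payload = build_tgi_payload("What is the capital of France?", max_new_tokens=200)
    print(json.dumps(payload, indent=2))
    # Instance benchmarked at 100 tokens/s, request capped at 200 new tokens.
    print(estimate_seconds(200, 100.0))
    print(template_url("dcd0920ffd9d026b7bb2d42f0d7479ba"))
```

Lowering max_new_tokens is the most direct way to shorten a request under this model, because the prompt length does not enter the estimate at all.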