PyWorker
Overview
The Vast PyWorker is a Python web server designed to run alongside a machine learning model instance, providing autoscaler compatibility. It serves as the primary entry point for API requests, forwarding them to the model's API hosted on the same instance. Additionally, it monitors performance metrics and estimates the current workload, reporting these metrics to the autoscaler.

[PyWorker diagram]

In the diagram's example, a user's client is attempting to infer from a machine learning model with Vast's autoscaler setup:

1. The client sends a /route POST request to the autoscaler. This asks the autoscaler for a GPU instance to send the inference request to.
2. The autoscaler selects a ready and available worker instance from the client's endpoint group and replies with a JSON object containing the URL of the selected instance.
3. The client then constructs a new POST request with its payload, authentication data, and the URL of the worker instance.
4. The PyWorker running on that specific instance validates the request and extracts the payload.
5. It then sends the payload to the actual model inference server, which runs on the same instance as the PyWorker.
6. The model generates its output and returns the result to the PyWorker.
7. The PyWorker formats the model's response as needed and sends the response back to the client.
8. Independently and concurrently, the PyWorker periodically sends its operational metrics to the autoscaler, which uses them to make scaling decisions.

All of Vast's autoscaler templates use the Vast PyWorker. If you are using a recommended autoscaler template from Vast, the PyWorker is already integrated with the template and will automatically start up when an autogroup is created. The Vast PyWorker repository (https://github.com/vast-ai/pyworker/) allows you to create a custom PyWorker for your custom template and integrate with Vast's autoscaling server. Even with a custom PyWorker, the PyWorker code runs on your Vast instance, and we automate its installation and activation during instance creation.

Integration with Model Instance

The Vast PyWorker wraps the application-specific backend code of the model instance you are running. The PyWorker calls the appropriate backend function when the PyWorker's corresponding API endpoint is invoked.

For example, if you are running a text generation inference (TGI) server, your PyWorker might receive the following JSON body:

```json
{
  "auth_data": {
    "signature": "a-base64-encoded-signature-string-from-route-endpoint",
    "cost": 256,
    "endpoint": "your-tgi-endpoint-group-name",
    "reqnum": 1234567890,
    "url": "https://worker-ip-address.vast.ai:port"
  },
  "payload": {
    "inputs": "What is the answer to the universe?",
    "parameters": {
      "max_new_tokens": 256,
      "temperature": 0.7,
      "top_p": 0.9,
      "do_sample": true
    }
  }
}
```

When it receives this request, your PyWorker will internally send the following to the TGI model server:

```json
{
  "inputs": "What is the answer to the universe?",
  "parameters": {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": true
  }
}
```

Your PyWorker would similarly receive the output result from the TGI server and forward a formatted version to the client.

To use the PyWorker with a specific backend:

- Use a launch script that starts the PyWorker code.
- Install the required dependencies for the backend code.
- Set up any additional requirements for your backend to run.

Communication with Autoscaler

To integrate with Vast's autoscaling service, each backend must:

- Send a message to the autoscaling server when the backend server is ready (e.g., after model installation).
- Periodically send performance metrics to the autoscaling server to optimize server usage and performance.
- Report any errors to the autoscaling server.
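To make these responsibilities concrete, below is a minimal, hypothetical sketch of a PyWorker-style server in Python. It is not the actual PyWorker implementation: the /generate route, ports, autoscaler URL, and metrics fields are illustrative assumptions; see the Vast PyWorker repository for the real code.

```python
# Hypothetical sketch only: route names, ports, URLs, and metrics fields
# are illustrative assumptions, not the real PyWorker API.
import asyncio

import aiohttp
from aiohttp import web

MODEL_SERVER_URL = "http://127.0.0.1:5001/generate"  # model API on the same instance (assumed port)
AUTOSCALER_METRICS_URL = "https://autoscaler.example/metrics"  # placeholder endpoint


async def handle_generate(request: web.Request) -> web.Response:
    """Validate an incoming request and forward its payload to the model."""
    body = await request.json()
    auth_data = body["auth_data"]  # signature, cost, endpoint, reqnum, url
    payload = body["payload"]      # forwarded to the model server unchanged
    # A real PyWorker verifies auth_data["signature"] before doing any work;
    # that check is omitted here for brevity.
    async with aiohttp.ClientSession() as session:
        async with session.post(MODEL_SERVER_URL, json=payload) as resp:
            result = await resp.json()
    # Format the model's response as needed before returning it to the client.
    return web.json_response(result)


async def report_metrics_forever(interval_s: float = 2.0) -> None:
    """Independently and concurrently report workload to the autoscaler."""
    while True:
        metrics = {"cur_load": 0.0, "error_msg": ""}  # illustrative fields
        try:
            async with aiohttp.ClientSession() as session:
                await session.post(AUTOSCALER_METRICS_URL, json=metrics)
        except aiohttp.ClientError:
            pass  # keep serving even if a metrics report fails
        await asyncio.sleep(interval_s)


async def start_background_tasks(app: web.Application) -> None:
    app["metrics_task"] = asyncio.create_task(report_metrics_forever())


app = web.Application()
app.add_routes([web.post("/generate", handle_generate)])
app.on_startup.append(start_background_tasks)

if __name__ == "__main__":
    web.run_app(app, port=3000)
```

The key design point the sketch reflects is that request forwarding and metrics reporting run concurrently: a slow inference call never blocks the periodic reports the autoscaler relies on for scaling decisions.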
Getting Started

If you want to create your own backend and learn how to integrate with the autoscaling server, please refer to the Extension Guide.

Supported Backends

Vast has pre-created backends for popular models such as text-generation-inference (https://github.com/huggingface/text-generation-inference) and ComfyUI (https://github.com/comfyanonymous/ComfyUI). These backends allow you to use these models in API mode, automatically handling performance and error tracking, making them compatible with Vast's autoscaler with no additional code required.

To get started with Vast-supported backends, see the PyWorker backends guide (https://docs.vast.ai/serverless/backends).

For more detailed information and advanced configuration, please visit the Vast PyWorker repository (https://github.com/vast-ai/pyworker/).
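As a closing illustration of the routing flow described in the overview, here is a hedged client-side sketch. Only the auth_data field names come from the example JSON above; the autoscaler route URL, the fields sent to /route, and the worker's /generate path are assumptions for illustration.

```python
# Hypothetical client-side sketch of the /route -> worker flow described above.
# The route URL, request fields, and /generate path are illustrative assumptions.
import requests

ROUTE_URL = "https://autoscaler.example/route/"  # placeholder /route endpoint

# 1. Ask the autoscaler for a ready worker instance in the endpoint group.
route = requests.post(ROUTE_URL, json={
    "endpoint": "your-tgi-endpoint-group-name",  # endpoint group name
    "api_key": "YOUR_API_KEY",                   # illustrative auth field
    "cost": 256,                                 # estimated workload of this request
}).json()

# 2. POST the inference payload to the selected worker's PyWorker, passing
#    the autoscaler's reply through as auth_data so the worker can verify it.
worker_url = route["url"]
auth_keys = ("signature", "cost", "endpoint", "reqnum", "url")
response = requests.post(f"{worker_url}/generate", json={
    "auth_data": {key: route[key] for key in auth_keys},
    "payload": {
        "inputs": "What is the answer to the universe?",
        "parameters": {
            "max_new_tokens": 256,
            "temperature": 0.7,
            "top_p": 0.9,
            "do_sample": True,
        },
    },
})
print(response.json())
```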