Performance Testing
When the autoscaler recruits a GPU for a worker group, the PyWorker on the GPU instance starts by conducting a performance test to assess the GPU's maximum capabilities.

LLMs

For LLMs, the test measures the maximum tokens per second that can be generated across concurrent batches (a rough illustration of such a measurement appears in the first sketch at the end of this section).

Image Generation

For image generation, the model generates pixels, which do not directly translate to tokens. To convert pixel generation into tokens, the test counts the number of 512x512 pixel grids required to cover the image resolution, treating each grid as equivalent to 175 tokens. This value is added on top of a constant overhead of 85 tokens. The result is then adjusted for the request time based on the number of diffusion steps performed, and normalized so that a system running Flux on a 4090 GPU achieves a standardized performance rating of 200 tokens per second (see the second sketch below).

These performance tests may take several minutes to complete, depending on the machine's specifications. Progress can be monitored through the instance logs. Once the test is completed, the results are saved; if the instance is rebooted, the saved results are loaded and the test does not run again.

For more details on the full implementation, visit the Vast PyWorker repository at https://github.com/vast-ai/pyworker/ and see backend.py in the lib/ folder of the PyWorker.
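The first sketch below illustrates, in broad strokes, what measuring aggregate tokens per second across concurrent batches could look like. It is a minimal illustration and not the PyWorker implementation: the `generate` callable, the fixed batch size, and the single timed batch are assumptions made for clarity; the real test lives in backend.py in the PyWorker repository.

```python
import asyncio
import time

async def measure_llm_throughput(generate, prompts, batch_size=8) -> float:
    """Send a batch of prompts concurrently and report aggregate tokens per second.

    `generate` is a hypothetical async callable that returns the number of tokens
    it produced for one prompt; a fuller test would sweep batch sizes to find the peak.
    """
    start = time.monotonic()
    token_counts = await asyncio.gather(*(generate(p) for p in prompts[:batch_size]))
    elapsed = time.monotonic() - start
    return sum(token_counts) / elapsed

# Example usage with a dummy generator that "produces" 128 tokens in 0.5 s:
# async def fake_generate(prompt):
#     await asyncio.sleep(0.5)
#     return 128
# print(asyncio.run(measure_llm_throughput(fake_generate, ["hi"] * 8)))
```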
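The second sketch shows how the image-to-token conversion described above could be expressed. The 512x512 grid count, the 175 tokens per grid, and the 85-token overhead come from the description above; the `baseline_steps` step adjustment and the `normalization` factor are placeholders for the calibration that makes Flux on a 4090 score 200 tokens per second, and the function names are hypothetical.

```python
import math

# Constants stated above: each 512x512 grid counts as 175 tokens,
# plus a constant overhead of 85 tokens per request.
TOKENS_PER_GRID = 175
OVERHEAD_TOKENS = 85

def image_token_equivalent(width: int, height: int) -> int:
    """Token-equivalent work for generating one image at the given resolution."""
    grids = math.ceil(width / 512) * math.ceil(height / 512)
    return grids * TOKENS_PER_GRID + OVERHEAD_TOKENS

def tokens_per_second(width: int, height: int, steps: int,
                      request_seconds: float,
                      baseline_steps: int = 25,
                      normalization: float = 1.0) -> float:
    """Convert a timed image request into a tokens-per-second rating.

    The step adjustment and the normalization factor are assumptions standing in
    for the real calibration, which scales by diffusion steps and is tuned so that
    Flux on a 4090 rates about 200 tokens per second.
    """
    work = image_token_equivalent(width, height) * (steps / baseline_steps)
    return normalization * work / request_seconds
```

For example, a 1024x1024 image covers four grids, or 4 x 175 + 85 = 785 token-equivalents; if such a request takes about four seconds at the baseline step count, the raw rate is roughly 196 tokens per second before normalization.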
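Finally, a minimal sketch of the save-and-reload behaviour: results are persisted after the first run and reused after a reboot, so the test does not repeat. The file path, the JSON format, and the `run_test` callable are assumptions for illustration only.

```python
import json
from pathlib import Path

# Hypothetical cache location; the real PyWorker chooses its own path and format.
RESULT_FILE = Path("/var/lib/pyworker/perf_test.json")

def load_or_run_perf_test(run_test) -> dict:
    """Return saved performance-test results if present, otherwise run the test once."""
    if RESULT_FILE.exists():
        return json.loads(RESULT_FILE.read_text())
    result = run_test()  # e.g. {"max_tokens_per_second": 200.0}
    RESULT_FILE.parent.mkdir(parents=True, exist_ok=True)
    RESULT_FILE.write_text(json.dumps(result))
    return result
```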