Serverless
Pricing
Vast.ai Serverless offers pay-as-you-go pricing for all workloads at the same rates as Vast.ai's non-serverless GPU instances. This guide explains how pricing works.

GPU Recruitment

As the serverless engine takes requests, it automatically scales its number of workers up or down based on incoming and forecasted demand. When scaling up, the engine searches the Vast.ai marketplace for GPU instances that offer the best performance/price ratio. Once selected, the GPU instance(s) are recruited into the serverless engine, and their cost ($/hr) is added to the running total for all GPU instances running on your serverless engine. As request demand falls off, the engine removes GPU instance(s), and your credit account immediately stops being charged for those instance(s).

Visit the billing help page for details on GPU instance costs.

Suspending an Endpoint

Suspending an endpoint stops it from recruiting any new GPU instances, but it continues to use the instances it currently has. This is a way to cap the number of instances an endpoint can manage, and therefore limit costs.

Stopping an Endpoint

Stopping an endpoint pauses the recruitment of GPU instances and puts the existing instances into the "stopped" state, preventing any work from being sent to the endpoint group. The instances still incur the small storage cost, but active rental and bandwidth costs are not charged to your account.
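To make the cost model concrete, here is a minimal illustrative sketch (not a Vast.ai API call) of how an endpoint's hourly charge adds up under this model: each running worker contributes its full rental rate, while stopped instances contribute only their storage rate. All worker names and rates below are made-up example values.

```python
# Illustrative sketch of the serverless pricing model described above.
# All rates and worker names are hypothetical example values, not real
# marketplace prices or Vast.ai API objects.

from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    rental_rate: float   # $/hr while the instance is actively rented
    storage_rate: float  # $/hr charged for disk even when stopped
    stopped: bool = False

def hourly_cost(workers: list[Worker]) -> float:
    """Running workers accrue their full rental rate; stopped workers
    (e.g. after the endpoint is stopped) accrue only the storage rate."""
    return sum(w.storage_rate if w.stopped else w.rental_rate for w in workers)

workers = [
    Worker("example-worker-1", rental_rate=0.40, storage_rate=0.01),
    Worker("example-worker-2", rental_rate=0.42, storage_rate=0.01),
]

print(f"Endpoint cost while running: ${hourly_cost(workers):.2f}/hr")

# Stopping the endpoint puts its instances into the "stopped" state,
# so only the small storage cost continues to accrue.
for w in workers:
    w.stopped = True
print(f"Endpoint cost while stopped: ${hourly_cost(workers):.2f}/hr")
```

In this example, the endpoint accrues $0.82/hr while both workers are running and $0.02/hr once the endpoint is stopped and only storage is billed.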