Set Up Your Account
- Set up your Vast account and add credit: review the quickstart guide to get familiar with the service if you do not already have an account with credits loaded.
Configure the vLLM Template
vllm serve is launched automatically by the template, using the configuration defined in the VLLM_MODEL and VLLM_ARGS environment variables. Here's how to set it up:
- Visit the templates page and find the recommended vLLM template.
- Click the pencil button to open up the template editor.
- If you would like to run a model other than the default, edit the VLLM_MODEL environment variable. The default value is deepseek-ai/DeepSeek-R1-Distill-Llama-8B, which is a Hugging Face repository.
- You can also set the arguments passed to vllm serve by modifying the VLLM_ARGS environment variable. vLLM is highly configurable, so check the official vLLM documentation, which lists all available startup arguments, before changing anything here. An illustrative configuration is sketched after this list.
- Save the template. You will find the version you have just modified in the 'My Templates' section of the templates page.
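As a sketch only, a template pointed at the default model might set the two variables as below. The flag values are illustrative; --max-model-len and --gpu-memory-utilization are standard vllm serve arguments, but pick flags for your own workload from the vLLM documentation.

```bash
# Example values; substitute any Hugging Face repository and any
# startup arguments listed in the vLLM documentation.
VLLM_MODEL=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
VLLM_ARGS=--max-model-len 8192 --gpu-memory-utilization 0.90
```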
Launch Your Instance
- Select the template you just saved from the ‘My Templates’ section of the templates page.
- Click the Play icon on this template to be taken to view the available offers.
- Use the search filters to select a suitable GPU, ensuring that you have sufficient VRAM to load all of the model's layers onto the GPU.
- From the search menu, ensure you have sufficient disk space for the model you plan to run. The disk slider is located under the template icon in the left-hand column. Large models (e.g., 70B parameters) can require dozens of gigabytes of storage. For the default DeepSeek-R1-Distill-Llama-8B model, make sure to allocate over 17 GB of disk space using the slider (an 8B-parameter model stored in 16-bit weights needs roughly 8 × 2 = 16 GB for the weights alone).
- Click Rent on a suitable instance and wait for it to load.
vLLM API Usage
The vLLM API can be accessed programmatically at the address and port externally mapped to the instance's internal port 8000.
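Assuming the template exposes vLLM's OpenAI-compatible server on internal port 8000, the base URL takes the following form, where INSTANCE_IP and EXTERNAL_PORT come from the IP button on the instance card:

```bash
# Base URL for the OpenAI-compatible API (both values are placeholders)
https://INSTANCE_IP:EXTERNAL_PORT/v1
```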
Authentication Token
- When making requests, you must include an Authorization header whose value is your instance's OPEN_BUTTON_TOKEN.
Sample Curl Command
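As a sketch (the Bearer prefix, prompt, and sampling values are illustrative, and the endpoint assumes vLLM's OpenAI-compatible /v1/completions route), a request looks like this:

```bash
curl -k https://INSTANCE_IP:EXTERNAL_PORT/v1/completions \
  -H "Authorization: Bearer OPEN_BUTTON_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        "prompt": "Explain the attention mechanism in one paragraph.",
        "max_tokens": 256,
        "temperature": 0.7
      }'
```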
- -k: Allows curl to perform insecure SSL connections and transfers as Vast.ai uses a self-signed certificate.
- Replace INSTANCE_IP with the instance's public IP address and EXTERNAL_PORT with the port externally mapped to 8000; both are shown under the IP button on the instance.
- Update the Authorization header value to match your OPEN_BUTTON_TOKEN. You can get that from any of the links in the Instance Portal or from the Open button on the instance card.
- Modify the prompt, model, and other fields (max_tokens, temperature, etc.) as needed.
vLLM with Python
Although the instance starts the vllm serve process to provide an inference API, the template has been configured with Jupyter and SSH access, so you can also interact with vLLM in code from your instance. To do this, import the vllm modules at the top of your Python script.
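As a minimal sketch of vLLM's offline-inference Python API (the model name is the template default; the prompt and sampling values are illustrative):

```python
from vllm import LLM, SamplingParams

# Note: loading a second copy of the model while vllm serve is still running
# may exhaust GPU memory; stop the server first if you hit out-of-memory errors.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, how are you?"], sampling_params)
print(outputs[0].outputs[0].text)
```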