Ollama + Webui
Below is a step-by-step guide on how to configure and run Ollama. Our template will automatically set up Open WebUI as a web-based interface and expose a port for the Ollama API.
R1 (deepseek-r1:70b) is used as the example model in this guide. Ollama offers many R1 models, which the WebUI can download. The larger the model, the more total GPU RAM and disk space you will need to allocate when renting your GPU. The models page has a drop-down menu showing each model name and the total GPU RAM needed to run it. You will also need at least that much disk space on the instance.
- Set up your Vast account and add credit: Review the quickstart guide to get familiar with the service if you do not have an account with credits loaded.
- Select the Ollama template: Click on Templates and select the recommended Ollama template, Open Webui (Ollama). Click on the play icon to select the template. You will then be taken to the search menu to find a GPU.
- Click on the Readme link at any time for a detailed guide on how to use the template.
- Disk Space: From the search menu, ensure you have sufficient disk space for the model(s) you plan to run. The disk slider is located under the template icon in the left-hand column. Large models (e.g., 70B parameters) can require dozens of gigabytes of storage. For DeepSeek R1 70B, allocate 50GB of disk space using the slider.
- VRAM Requirements: Check that your GPU VRAM is sufficient for the model. Larger models require more VRAM. For DeepSeek R1 70B, you will need at least 43GB of VRAM. Find the slider titled GPU Total RAM and slide it to 44GB.
- Example R1 (deepseek-r1:70b): We recommend a 2x RTX 4090 instance with 50GB of disk space. An equivalent command-line search is sketched just below this list.
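If you prefer working from a terminal, the same search can be done with the vastai CLI. This is a minimal sketch, assuming the CLI is installed (pip install vastai) and your API key is configured; the exact query fields shown are assumptions based on the CLI's search syntax:
# Sketch: find 2x RTX 4090 offers with at least 50GB of disk space
vastai search offers 'num_gpus=2 gpu_name=RTX_4090 disk_space>=50'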
- After the instance loads, click the "Open" button.
- This will open the Instance Portal with links to all of the services running on the instance. Click the "Open WebUI" link.
- Create an Admin Account
- Upon first use (or if prompted), create an Admin username and password to secure your instance.
- You can add additional users in the Admin Panel.
- Model Download
- Click on the Admin Panel -> Settings
- Click on the Models tab
- Click the download icon to open Manage Models
- Enter the model name to pull directly from Ollama.com. For our example, that is: deepseek-r1:70b
- Wait for the model to fully download. (You can also pull models through the Ollama API, as shown in the sketch after this list.)
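As an alternative to downloading through the WebUI, you can pull a model with Ollama's /api/pull endpoint. This is a sketch that assumes the INSTANCE_IP, EXTERNAL_PORT, and OPEN_BUTTON_TOKEN placeholders described in the API section further down:
# Pull deepseek-r1:70b via the Ollama API (recent Ollama versions accept "model"; older ones use "name")
curl -k https://INSTANCE_IP:EXTERNAL_PORT/api/pull \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer OPEN_BUTTON_TOKEN" \
  -d '{"model": "deepseek-r1:70b"}'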
- Start a New Chat
- Once the download is complete, return to the WebUI main page and start a new chat session.
- You can now test the model by sending prompts. Enjoy!
Ollama provides a direct API that you can call outside the WebUI. By default, it is available at:
https://INSTANCE_IP:EXTERNAL_PORT
where EXTERNAL_PORT is the external port mapped to the instance's internal port 11434.
- When making requests, you must include an Authorization header whose bearer value is your OPEN_BUTTON_TOKEN.
- This token is stored in the OPEN_BUTTON_TOKEN environment variable on the instance and is included in the links shown in the Instance Portal.
curl -k https://INSTANCE_IP:EXTERNAL_PORT/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer OPEN_BUTTON_TOKEN" \
  -d '{
    "model": "deepseek-r1:70b",
    "prompt": "San Francisco is a",
    "max_tokens": 128,
    "temperature": 0.6
  }'
- -k: Allows curl to perform insecure SSL connections and transfers as Vast.ai uses a self-signed certificate.
- Replace INSTANCE_IP and EXTERNAL_PORT with your instance's IP address and the external port mapped to 11434, both shown under the IP button on the instance.
- Update the Authorization header value to match your OPEN_BUTTON_TOKEN. You can get it from any of the links in the Instance Portal or from the Open button on the instance card.
- Modify the prompt, model, and other fields (max_tokens, temperature, etc.) as needed. A sketch of the same request against Ollama's native endpoint follows below.
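The example above uses Ollama's OpenAI-compatible completions endpoint. Ollama also exposes its own native /api/generate endpoint; here is a minimal sketch using the same placeholders:
# Same prompt against Ollama's native generate endpoint; "stream": false returns one JSON response
curl -k https://INSTANCE_IP:EXTERNAL_PORT/api/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer OPEN_BUTTON_TOKEN" \
  -d '{
    "model": "deepseek-r1:70b",
    "prompt": "San Francisco is a",
    "stream": false
  }'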
Updated 30 Jan 2025