# Running DR-Tulu on Vast.ai: A Complete Guide
DR-Tulu (Deep Research Tulu) is an open-source research agent developed by AI2 (Allen Institute for AI). Unlike standard LLMs that are fine-tuned separately from their tools, DR-Tulu was trained end-to-end with an MCP server providing web search and page reading capabilities. This means the model learned to use these tools as part of its reasoning process, not as an afterthought. It autonomously plans research strategies, searches the web, and synthesizes information from multiple sources into comprehensive, cited answers. This guide covers deploying DR-Tulu-8B on Vast.ai with the complete agent stack required for production use.

## Prerequisites
Before getting started, you’ll need:

- A Vast.ai account with credits (sign up here)
- Vast.ai CLI installed (`pip install vastai`)
- Your Vast.ai API key configured
- API keys for external services (details below)
## Architecture
DR-Tulu requires three components working together:

- vLLM Server: Serves the DR-Tulu-8B model via an OpenAI-compatible API
- MCP Backend: Provides web search and page reading capabilities via the Model Context Protocol
- dr-agent Library: Orchestrates the multi-turn interaction between the model and tools
### External Service Dependencies
DR-Tulu requires API keys for the following services:

| Service | Purpose | Required |
|---|---|---|
| Serper | Web search | Yes |
| Jina | Page content extraction | Yes |
| Semantic Scholar | Academic paper search | Optional |
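
The stack reads these keys from environment variables. The exact variable names are defined by the dr-agent repo, so the names below are assumptions to verify against its README:

```bash
# Assumed variable names -- confirm against the dr-agent README.
export SERPER_API_KEY="your-serper-key"        # web search (required)
export JINA_API_KEY="your-jina-key"            # page content extraction (required)
export S2_API_KEY="your-semantic-scholar-key"  # academic paper search (optional)
```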
## Instance Configuration
### Step 1: Search for Suitable Instances
Use the Vast.ai CLI to find instances that meet the requirements (see the example query after this list):

- Single GPU with at least 24GB VRAM
- 100GB disk space
- Verified hosts only
- Sorted by cost (lowest first)
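
A query along these lines matches the criteria above. The field names and the `dph+` (dollars-per-hour, ascending) sort follow the Vast.ai CLI's query syntax; confirm against `vastai search offers --help` on your version:

```bash
# 1 GPU, >=24 GB VRAM, >=100 GB disk, verified hosts, cheapest first
vastai search offers 'num_gpus=1 gpu_ram>=24 disk_space>=100 verified=true' -o 'dph+'
```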
### Step 2: Deploy the vLLM Server
Once you’ve selected an offer ID from the search results, create the instance with the correct configuration. Pin the image to `vllm/vllm-openai:v0.10.0`: the latest tag (v0.12.0) has a Triton compilation bug that crashes after model load, and older versions don’t support the Qwen3 architecture.
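
A sketch of the create command follows. The model ID placeholder must be replaced with the ID on the Hugging Face model card (linked under Additional Resources), and the flag set should be confirmed against `vastai create instance --help`:

```bash
OFFER_ID=1234567                 # placeholder: offer ID from Step 1
MODEL_ID="<org>/DR-Tulu-8B"      # placeholder: copy from the Hugging Face model card

vastai create instance "$OFFER_ID" \
  --image vllm/vllm-openai:v0.10.0 \
  --disk 100 \
  --env '-p 8000:8000' \
  --args --model "$MODEL_ID" --host 0.0.0.0 --port 8000

# Once the instance is running, find its public IP and mapped port:
vastai show instance <instance_id>

# Smoke-test the OpenAI-compatible endpoint (substitute the IP:port from above):
curl http://<ip>:<port>/v1/models
```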
## Setting Up the MCP Backend
The MCP backend provides the tools DR-Tulu needs to search the web and read page content. In this deployment, the MCP backend runs on your local machine while vLLM runs on Vast.ai; the dr-agent library coordinates between them.
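
The exact launch command is defined by the dr-agent repository; the invocation below is a hypothetical sketch (the entry-point name is an assumption), using the port the chat interface expects later in this guide:

```bash
# Entry point is an assumption -- substitute the actual command from the dr-agent README.
# Requires the API keys from Prerequisites exported in this shell.
python -m dr_agent.mcp_backend --port 8001
```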
## Using DR-Tulu

### Interactive Chat
For interactive use, launch the chat interface with the `--skip-checks` flag, which prevents the launcher from trying to start a local vLLM server (since ours is on Vast.ai). The `--mcp-port 8001` flag must match the port used when starting the MCP backend. Type your questions and the agent will search the web, read pages, and synthesize an answer with citations.
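
Putting this together, a sketch of the launch command; only the two flags above come from this guide, while the module name is an assumption (how to point the launcher at the Vast.ai vLLM endpoint is also described in the dr-agent README):

```bash
# Module name is an assumption -- take the real entry point from the dr-agent README.
# --skip-checks : don't start a local vLLM server (ours is on Vast.ai)
# --mcp-port    : must match the MCP backend started above (8001)
python -m dr_agent.chat \
  --skip-checks \
  --mcp-port 8001
```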
### Batch Evaluation
Run DR-Tulu against the built-in evaluation datasets: `simpleqa`, `healthbench`, `deep_research_bench`, `research_qa`, `genetic_diseases`, `2wiki`, `webwalker`.
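
A sketch of a run; the entry point and flags are assumptions to check against the dr-agent repo, while the dataset names above are the built-in choices:

```bash
# Entry point and flags are assumptions -- check the dr-agent repo for the real command.
python -m dr_agent.eval \
  --dataset simpleqa \
  --output results.jsonl
```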
Each run writes JSON Lines output to `results.jsonl`, one record per query.
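
As an illustration of the record shape only (these field names are assumptions, not dr-agent's actual schema):

```jsonl
{"id": "simpleqa-0001", "question": "...", "answer": "...", "citations": ["https://..."]}
```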
### Python API
Call DR-Tulu directly from your code.
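
The dr-agent library's Python interface is defined in its repository (see Source Code below). As a grounded minimal example, the sketch below reaches the model through vLLM's OpenAI-compatible endpoint instead; note that this is a direct model call and does not run the MCP tool loop:

```python
# Direct call to the vLLM server's OpenAI-compatible endpoint.
# NOTE: this queries the model without the MCP search/read tools;
# the full research loop is what the dr-agent library orchestrates.
from openai import OpenAI

client = OpenAI(
    base_url="http://<vllm-ip>:<port>/v1",  # address from `vastai show instance`
    api_key="EMPTY",  # vLLM accepts any key unless --api-key was set
)

response = client.chat.completions.create(
    model="<org>/DR-Tulu-8B",  # must match the --model passed to vLLM
    messages=[{"role": "user", "content": "What is the Model Context Protocol?"}],
)
print(response.choices[0].message.content)
```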
## Additional Resources

- DR-Tulu Blog Post - Overview and research methodology
- Model Weights - Hugging Face model card
- Source Code - dr-agent library and evaluation code
- Research Paper - Technical details and benchmark results