
Running DR-Tulu on Vast.ai: A Complete Guide

DR-Tulu (Deep Research Tulu) is an open-source research agent developed by AI2 (Allen Institute for AI). Unlike standard LLMs that are fine-tuned separately from their tools, DR-Tulu was trained end-to-end with an MCP server providing web search and page reading capabilities. This means the model learned to use these tools as part of its reasoning process, not as an afterthought. It autonomously plans research strategies, searches the web, and synthesizes information from multiple sources into comprehensive, cited answers. This guide covers deploying DR-Tulu-8B on Vast.ai with the complete agent stack required for production use.

Prerequisites

Before getting started, you’ll need:
  • A Vast.ai account with credits (Sign up here)
  • Vast.ai CLI installed (pip install vastai)
  • Your Vast.ai API key configured
  • API keys for external services (details below)

Architecture

DR-Tulu requires three components working together:
  1. vLLM Server: Serves the DR-Tulu-8B model via an OpenAI-compatible API
  2. MCP Backend: Provides web search and page reading capabilities via the Model Context Protocol
  3. dr-agent Library: Orchestrates the multi-turn interaction between the model and tools
The model generates tool calls in a specific format. The dr-agent library parses these calls, executes them through the MCP backend, and feeds results back to the model. This loop continues until the model produces a final answer.
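In sketch form, the loop looks roughly like this (illustrative Python only, not the actual dr-agent API; call_model, parse_tool_calls, and execute_via_mcp are hypothetical stand-ins you would supply):
from typing import Callable

def run_agent(
    question: str,
    call_model: Callable[[list[dict]], str],        # hypothetical: one completion from vLLM
    parse_tool_calls: Callable[[str], list[dict]],  # hypothetical: extract tool calls from the reply
    execute_via_mcp: Callable[[dict], str],         # hypothetical: run one tool through the MCP backend
    max_tool_calls: int = 10,
) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_tool_calls):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        tool_calls = parse_tool_calls(reply)
        if not tool_calls:
            return reply  # no tool call left: this is the final, cited answer
        for call in tool_calls:
            # Each tool result is fed back so the next model turn can use it.
            messages.append({"role": "tool", "content": execute_via_mcp(call)})
    return reply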

External Service Dependencies

DR-Tulu requires API keys for the following services:
Service             Purpose                    Required
Serper              Web search                 Yes
Jina                Page content extraction    Yes
Semantic Scholar    Academic paper search      Optional
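A missing key only surfaces once the agent tries to search, so a small pre-flight check is handy (a convenience sketch; the variable names match the exports used later in this guide):
import os

# Fail fast if a required key is missing before starting the MCP backend.
missing = [name for name in ("SERPER_API_KEY", "JINA_API_KEY") if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing required API keys: {', '.join(missing)}")
print("Required API keys are set.")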

Instance Configuration

Step 1: Search for Suitable Instances

Use the Vast.ai CLI to find instances that meet the requirements:
vastai search offers "gpu_ram >= 24 num_gpus = 1 disk_space >= 100 verified=true" --order "dph_base"
This searches for:
  • Single GPU with at least 24GB VRAM
  • 100GB disk space
  • Verified hosts only
  • Sorted by cost (lowest first)
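If you would rather pick an offer programmatically, the CLI can emit JSON with --raw, which is easy to filter in Python (a sketch; field names such as id, gpu_name, and dph_total reflect the typical raw output and may differ between CLI versions):
import json
import subprocess

# Run the same search, but ask for raw JSON instead of a table.
query = "gpu_ram >= 24 num_gpus = 1 disk_space >= 100 verified=true"
out = subprocess.run(
    ["vastai", "search", "offers", query, "--order", "dph_base", "--raw"],
    capture_output=True, text=True, check=True,
)
offers = json.loads(out.stdout)

# Print the five cheapest matching offers.
for offer in offers[:5]:
    print(offer["id"], offer.get("gpu_name"), offer.get("dph_total"), "$/hr")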

Step 2: Deploy the vLLM Server

Once you’ve selected an instance ID from the search results, create it with the correct configuration:
vastai create instance <INSTANCE_ID> \
    --image vllm/vllm-openai:v0.10.0 \
    --env '-p 8000:8000 --ipc=host' \
    --disk 100 \
    --args --model rl-research/DR-Tulu-8B --dtype auto --max-model-len 16384 --port 8000
Pin the image to vllm/vllm-openai:v0.10.0: the latest tag (v0.12.0) has a Triton compilation bug that crashes after model load, while older versions don't support the Qwen3 architecture.
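Once the instance is running and the weights have loaded, it is worth confirming the endpoint responds before wiring up the agent. A minimal check with the openai Python client (replace <VAST_IP> and <PORT> with the public IP and the external port Vast.ai maps to container port 8000):
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; it accepts any key unless --api-key was set.
client = OpenAI(base_url="http://<VAST_IP>:<PORT>/v1", api_key="dummy-key")

# Should list rl-research/DR-Tulu-8B once the model has loaded.
print([m.id for m in client.models.list()])

# A trivial completion confirms the server answers end to end.
resp = client.chat.completions.create(
    model="rl-research/DR-Tulu-8B",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=16,
)
print(resp.choices[0].message.content)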

Setting Up the MCP Backend

The MCP backend provides the tools DR-Tulu needs to search the web and read page content. This deployment runs the MCP backend on your local machine while vLLM runs on Vast.ai. The dr-agent library coordinates between them.
# Clone the repository
git clone https://github.com/rlresearch/dr-tulu.git
cd dr-tulu/agent

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies (uv sync handles venv creation automatically)
uv sync

# Configure API keys
export SERPER_API_KEY=your_serper_key
export JINA_API_KEY=your_jina_key

# Start the MCP backend
uv run python -m dr_agent.mcp_backend.main --port 8001
Update the workflow configuration to point to your Vast.ai instance:
# workflows/auto_search_sft.yaml
search_agent_base_url: "http://<VAST_IP>:<PORT>/v1"
search_agent_model_name: "rl-research/DR-Tulu-8B"
search_agent_tokenizer_name: "Qwen/Qwen3-8B"
search_agent_api_key: "dummy-key"
search_agent_max_tokens: 16000
search_agent_temperature: 1.0
search_agent_max_tool_calls: 10
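Before launching anything, a quick reachability check on both halves can save a debugging round: read the base URL from the YAML and confirm that the vLLM /models endpoint and the local MCP port answer (a sketch using PyYAML, run from dr-tulu/agent; assumes the keys sit at the top level of the YAML as shown above):
import socket
import urllib.request
import yaml  # PyYAML

# Read the vLLM endpoint from the workflow config.
with open("workflows/auto_search_sft.yaml") as f:
    cfg = yaml.safe_load(f)
base_url = cfg["search_agent_base_url"].rstrip("/")

# The OpenAI-compatible /models endpoint should respond if vLLM is serving.
with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
    print("vLLM reachable:", resp.status == 200)

# The MCP backend was started on port 8001; check that the port is listening.
with socket.create_connection(("127.0.0.1", 8001), timeout=5):
    print("MCP backend reachable on port 8001")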

Using DR-Tulu

Interactive Chat

For interactive use, launch the chat interface:
cd dr-tulu/agent
uv run python scripts/launch_chat.py \
    -c workflows/auto_search_sft.yaml \
    --skip-checks \
    --mcp-port 8001
The --skip-checks flag prevents the launcher from trying to start a local vLLM server (since it’s on Vast.ai). The --mcp-port 8001 must match the port used when starting the MCP backend. Type your questions and the agent will search the web, read pages, and synthesize an answer with citations.

Batch Evaluation

Run DR-Tulu against built-in evaluation datasets:
cd dr-tulu/agent
uv run python workflows/auto_search_sft.py generate-dataset simpleqa \
    --num-examples 2 \
    --config workflows/auto_search_sft.yaml \
    --output results.jsonl
Available datasets: simpleqa, healthbench, deep_research_bench, research_qa, genetic_diseases, 2wiki, webwalker
Example output (results.jsonl):
{
  "problem": "Who received the IEEE Frank Rosenblatt Award in 2010?",
  "final_response": "Michio Sugeno"
}
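Because the output is standard JSON Lines, inspecting a batch run takes only a few lines (a sketch that reads just the two fields shown above; records may carry additional fields depending on the workflow):
import json

# Load every record from the JSONL results file.
with open("results.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

for rec in records:
    print("Q:", rec["problem"])
    print("A:", rec["final_response"])

print(f"{len(records)} examples evaluated")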

Python API

Call DR-Tulu directly from your code:
import asyncio
import sys

# Make the dr-tulu agent package importable from this script.
sys.path.insert(0, "dr-tulu/agent")

from workflows.auto_search_sft import AutoReasonSearchWorkflow

async def research(question: str) -> dict:
    # Point the workflow at the YAML config edited earlier (vLLM endpoint, model, limits).
    workflow = AutoReasonSearchWorkflow(
        configuration="dr-tulu/agent/workflows/auto_search_sft.yaml"
    )
    result = await workflow(
        problem=question,
        dataset_name="simpleqa",
        verbose=False,
    )
    # full_traces may be absent; fall back to 0 tokens in that case.
    ft = result.get("full_traces")
    return {
        "answer": result["final_response"],
        "tool_calls": result.get("total_tool_calls"),
        "tokens": getattr(ft, "total_tokens", 0) if ft else 0,
    }

result = asyncio.run(research("Who invented the transistor?"))
print("Answer:", result["answer"])
print("Tool calls:", result["tool_calls"])
print("Tokens:", result["tokens"])
Output:
Answer: John Bardeen, Walter H. Brattain, and William B. Shockley
Tool calls: 4
Tokens: 6241

Cleanup

When you’re done, destroy the Vast.ai instance:
vastai destroy instance <INSTANCE_ID>

Conclusion

DR-Tulu represents a shift in how research agents are built. By training the model alongside its MCP tools rather than bolting them on afterward, AI2 created an 8B model with strong research capabilities. The split architecture—vLLM on Vast.ai for inference, MCP locally for tool execution—gives you the GPU power needed for the model while keeping API keys and tool orchestration on your own machine. This guide provides a working proof-of-concept ready for integration into your agentic workflow applications.