

Run autonomous coding agents all night long for under $18 and wake up to a completed project with passing tests.

Overview

Ralph is an agentic loop that implements a project from a PRD. It picks a user story, writes the code, runs tests, and moves to the next story, repeating until everything passes. By running on Vast.ai with an open-source model, you get autonomous development without API costs. In this guide, we’ll start with a simple calculator example to see Ralph in action. Once that works, you can scale up to complex projects that run overnight.
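The loop described above can be sketched in a few lines of Python. This is an illustrative model, not Ralph's actual implementation: the real ralph.sh shells out to an agent for the code changes and lets the agent run the tests, but the control flow is essentially this:

```python
def ralph_loop(stories, run_agent, run_tests, max_iterations=5):
    """Minimal sketch of Ralph's loop: pick the highest-priority
    unfinished story, let the agent implement it, then re-run tests."""
    for _ in range(max_iterations):
        pending = [s for s in stories if not s["passes"]]
        if not pending:
            return True  # every story passes; we're done
        story = min(pending, key=lambda s: s["priority"])
        run_agent(story)               # agent writes/edits code for this story
        story["passes"] = run_tests()  # mark the story done if tests pass
    return all(s["passes"] for s in stories)
```

Each story in prd.json carries a `priority` and a `passes` flag, which is why Ralph can resume an interrupted run: already-passing stories are simply skipped.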

Model: Qwen3-Coder-Next-FP8

Attribute  Value
Model      Qwen/Qwen3-Coder-Next-FP8
Size       80B params (3B active, MoE), ~80GB in FP8
GPUs       4x RTX 4090 (96GB total)
Cost       ~$1.50/hr
Image      lmsysorg/sglang:latest (v0.5.8+)
CUDA       12.9+
Why Qwen3-Coder-Next?
  • Trained specifically for agentic coding tools (aider, Claude Code, Cline, etc.)
  • 256K context length

Prerequisites

  • Vast.ai account with API key (Sign up here)
  • Python 3.10 or later
  • git, jq, curl, openssl

Setup

Bash
# Create a virtual environment
python3 -m venv ralph-env
source ralph-env/bin/activate

# Install Vast CLI and Aider
pip install --upgrade vastai aider-chat pytest
vastai set api-key <your-vast-api-key>

# Clone Ralph
git clone https://github.com/snarktank/ralph.git
cd ralph

Step 1: Deploy Qwen3-Coder-Next on Vast

Find a 4x RTX 4090 instance with CUDA 12.9+:
Bash
vastai search offers 'gpu_name=RTX_4090 num_gpus=4 dph<2.5 reliability>0.98 inet_down>1000 cuda_vers>=12.9 direct_port_count>=1' -o 'dph'
Generate a bearer token for the inference endpoint and deploy (replace <OFFER_ID> with an ID from the first column):
Bash
# Generate a bearer token for your inference endpoint
MODEL_API_KEY=$(openssl rand -hex 16)
echo "$MODEL_API_KEY" > .vast_model_api_key
echo "Endpoint bearer token: $MODEL_API_KEY (saved to .vast_model_api_key)"

# Deploy with SGLang
vastai create instance <OFFER_ID> \
    --image lmsysorg/sglang:latest \
    --env '-p 8000:8000' \
    --disk 200 \
    --onstart-cmd "python3 -m sglang.launch_server \
        --model-path Qwen/Qwen3-Coder-Next-FP8 \
        --host 0.0.0.0 \
        --port 8000 \
        --tp-size 4 \
        --context-length 32768 \
        --mem-fraction-static 0.85 \
        --api-key $MODEL_API_KEY"
This guide uses three different keys:
  • Vast account API key, authenticates the Vast CLI (vastai set api-key)
  • Endpoint bearer token (MODEL_API_KEY), secures your SGLang inference endpoint
  • Client SDK key (OPENAI_API_KEY), set to the same value as the endpoint bearer token so Aider’s OpenAI-compatible client can authenticate
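The relationship between the last two keys is easy to miss: they are the same random value, presented by the client and checked by the server. A small Python equivalent of the token generation and client configuration (the `<IP>:<PORT>` placeholder is filled in after Step 2):

```python
import os
import secrets

# Equivalent of `openssl rand -hex 16`: 16 random bytes as 32 hex characters
model_api_key = secrets.token_hex(16)

# The SGLang server is launched with --api-key $MODEL_API_KEY, and the
# OpenAI-compatible client (Aider) must present the same value:
os.environ["OPENAI_API_KEY"] = model_api_key
os.environ["OPENAI_API_BASE"] = "http://<IP>:<PORT>/v1"
```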

Step 2: Get Your Endpoint

Wait 10-15 minutes for the model weights (~80GB) to download and load. You can monitor progress with vastai logs <INSTANCE_ID>; look for “The server is fired up and ready to roll!” Then get your endpoint:
Bash
vastai show instance <INSTANCE_ID> --raw | jq -r '"\(.public_ipaddr):\(.ports["8000/tcp"][0].HostPort)"'
# Output: <IP>:<PORT>
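If you prefer not to depend on jq, the same extraction works in Python. The field names below are the ones the jq filter above reads from the --raw JSON; this sketch assumes that schema:

```python
import json

def endpoint_from_instance(raw_json: str) -> str:
    """Extract IP:PORT for the mapped 8000/tcp port from
    `vastai show instance <ID> --raw` output."""
    inst = json.loads(raw_json)
    host_port = inst["ports"]["8000/tcp"][0]["HostPort"]
    return f'{inst["public_ipaddr"]}:{host_port}'
```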
Verify it’s ready (SGLang returns HTTP 200 with an empty body; that’s normal):
Bash
curl -w '\nHTTP Status: %{http_code}\n' -H "Authorization: Bearer $MODEL_API_KEY" http://<IP>:<PORT>/health
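Since the download can take a while, it is convenient to poll /health until the server answers rather than re-running curl by hand. A stdlib-only sketch that mirrors the curl command above (same endpoint, same bearer header):

```python
import time
import urllib.error
import urllib.request

def wait_for_health(base_url: str, token: str,
                    timeout_s: int = 1200, interval_s: int = 30) -> bool:
    """Poll SGLang's /health endpoint until it returns HTTP 200 or we give up."""
    req = urllib.request.Request(
        f"{base_url}/health",
        headers={"Authorization": f"Bearer {token}"},
    )
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                if resp.status == 200:  # empty 200 body is normal for SGLang
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; keep waiting
        time.sleep(interval_s)
    return False
```

Call it as `wait_for_health("http://<IP>:<PORT>", model_api_key)` once you have the endpoint from Step 2.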

Step 3: Configure Aider for Vast

Set environment variables to point Aider at your Vast endpoint. OPENAI_API_KEY must be set to the same endpoint bearer token you generated in Step 1:
Bash
export OPENAI_API_BASE="http://<IP>:<PORT>/v1"
export OPENAI_API_KEY="$MODEL_API_KEY"

Step 4: Verify Aider Connectivity

Test that Aider can reach your Vast endpoint:
Bash
aider --model openai/Qwen/Qwen3-Coder-Next-FP8 --no-git --yes-always --no-show-model-warnings --message "Say hello"
You should see Aider respond. If you get connection errors, verify the endpoint URL and that the model finished loading (check vastai logs <INSTANCE_ID>).

Step 5: Add Aider Support to Ralph

Ralph doesn’t include aider as a tool out of the box, so you need to make two edits to ralph.sh.
Edit 1: Add aider to the tool validation. Find the line that validates the --tool argument:
Bash
if [[ "$TOOL" != "amp" && "$TOOL" != "claude" ]]; then
  echo "Error: Invalid tool '$TOOL'. Must be 'amp' or 'claude'."
Add aider as a valid option:
Bash
if [[ "$TOOL" != "amp" && "$TOOL" != "claude" && "$TOOL" != "aider" ]]; then
  echo "Error: Invalid tool '$TOOL'. Must be 'amp', 'claude', or 'aider'."
Edit 2: Add the aider tool block. Find the elif chain that runs each tool (look for the claude block). After the last elif block and before the closing fi, add:
Bash
  elif [[ "$TOOL" == "aider" ]]; then
    # Aider: use --message flag for non-interactive mode
    PROMPT_CONTENT=$(cat "$SCRIPT_DIR/prompt.md")
    OUTPUT=$(aider --model openai/Qwen/Qwen3-Coder-Next-FP8 \
      --yes-always --no-git --no-show-model-warnings --no-browser \
      --file "$SCRIPT_DIR/prd.json" \
      --message "$PROMPT_CONTENT" 2>&1) || true
    echo "$OUTPUT"
This tells aider to use the Vast-hosted Qwen3-Coder-Next model (via the OPENAI_API_BASE env var you set in Step 3), load the PRD file for context, and run non-interactively with the Ralph prompt.

Step 6: Run Ralph

Create a prd.json that defines what you want Ralph to build. Note that testCommand is informational: the agent reads it from the PRD to know how to run tests, but ralph.sh itself doesn’t execute it.
JSON
{
  "project": "Calculator",
  "branchName": "ralph/calculator",
  "description": "Create a Python calculator module with basic arithmetic functions",
  "testCommand": "python -m pytest test_calculator.py -v",
  "userStories": [
    {
      "id": "US-001",
      "title": "Create add function",
      "description": "Create calculator.py with an add function.",
      "acceptanceCriteria": [
        "add(2, 3) returns 5",
        "add(-1, 1) returns 0"
      ],
      "priority": 1,
      "passes": false
    },
    {
      "id": "US-002",
      "title": "Create multiply function",
      "description": "Add a multiply function to calculator.py.",
      "acceptanceCriteria": [
        "multiply(2, 3) returns 6",
        "multiply(-1, 5) returns -5",
        "multiply(0, 100) returns 0"
      ],
      "priority": 2,
      "passes": false
    },
    {
      "id": "US-003",
      "title": "Create divide function",
      "description": "Add a divide function with zero handling.",
      "acceptanceCriteria": [
        "divide(10, 2) returns 5",
        "divide(-6, 3) returns -2",
        "divide(1, 0) raises ZeroDivisionError"
      ],
      "priority": 3,
      "passes": false
    }
  ]
}
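Before kicking off a long run, it can save an overnight's worth of compute to sanity-check the PRD shape. Here is a minimal validator for the fields used in the example above (the required-field lists are taken from that example, not from a published Ralph schema):

```python
import json

REQUIRED_TOP = {"project", "branchName", "description", "testCommand", "userStories"}
REQUIRED_STORY = {"id", "title", "description", "acceptanceCriteria", "priority", "passes"}

def validate_prd(text: str) -> list[str]:
    """Return a list of problems with a prd.json document (empty list = OK)."""
    problems = []
    prd = json.loads(text)
    problems += [f"missing top-level key: {k}" for k in sorted(REQUIRED_TOP - prd.keys())]
    for i, story in enumerate(prd.get("userStories", [])):
        problems += [f"story {i}: missing {k}" for k in sorted(REQUIRED_STORY - story.keys())]
        if not isinstance(story.get("passes"), bool):
            problems.append(f"story {i}: 'passes' must be a boolean")
    return problems
```

Run it with `validate_prd(open("prd.json").read())` and fix anything it reports before starting the loop.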
Run Ralph:
Bash
OPENAI_API_BASE="http://<IP>:<PORT>/v1" \
OPENAI_API_KEY="$MODEL_API_KEY" \
./ralph.sh --tool aider 5
Ralph creates calculator.py and test_calculator.py from scratch, implementing each user story and running tests until they pass. Example output (calculator.py):
Python
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return a / b
Example output (test_calculator.py):
Python
import pytest
from calculator import add, multiply, divide

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0

def test_multiply():
    assert multiply(2, 3) == 6
    assert multiply(-1, 5) == -5
    assert multiply(0, 100) == 0

def test_divide():
    assert divide(10, 2) == 5
    assert divide(-6, 3) == -2
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)

Cleanup

Destroy the instance when done:
Bash
vastai destroy instance <INSTANCE_ID>

Next Steps: Overnight Ralph Loop

Project ideas for overnight runs:
  • Full CLI application with subcommands, config files, and help system
  • REST API with authentication, validation, and multiple resource types
  • Web scraper with multiple site adapters, rate limiting, and data export
  • Complete test suite for an existing codebase (one test file per module)
  • Database migration system with schema versioning and rollback
To run Ralph unattended overnight:
Bash
# Run in background with nohup, increase iterations
nohup bash -c 'OPENAI_API_BASE="http://<IP>:<PORT>/v1" OPENAI_API_KEY="$MODEL_API_KEY" ./ralph.sh --tool aider 500' > ralph.log 2>&1 &

# Check progress
tail -f ralph.log

# Check generated files
ls -la *.py

# Run tests manually
python -m pytest test_*.py -v
Cost estimate: At ~$1.50/hr, an 8-12 hour overnight run costs $12-18.
Tips:
  • Use tmux or screen instead of nohup if you want to reattach later
  • Monitor with vastai show instance <ID> to ensure the instance stays running
  • Check progress.txt for Ralph’s learnings across iterations
  • Commit your prd.json before starting so you can reset if needed
  • Remember to vastai destroy instance <INSTANCE_ID> when the run finishes; instances bill by the hour even when idle
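The cost arithmetic above is simple to adapt to other rates or run lengths (the hourly rate and hours here are just the figures from this guide):

```python
def run_cost(dollars_per_hour: float, hours: float) -> float:
    """Instance cost for a run; idle time bills at the same rate."""
    return round(dollars_per_hour * hours, 2)

# The guide's estimate: 4x RTX 4090 at ~$1.50/hr for 8-12 hours
low, high = run_cost(1.50, 8), run_cost(1.50, 12)
```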

Resources