Run autonomous coding agents all night long for under $18 and wake up to a completed project with passing tests.

Overview

Ralph is an agentic loop that implements a project from a PRD (product requirements document). It picks a user story, writes the code, runs tests, and moves to the next story — repeating until everything passes. By running on Vast.ai with an open-source model, you get autonomous development without API costs. In this guide, we’ll start with a simple calculator example to see Ralph in action. Once that works, you can scale up to complex projects that run overnight.
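Conceptually, the loop looks something like this (a simplified sketch, not the actual ralph.sh; it assumes the prd.json format and aider setup introduced later in this guide):
Bash
# Simplified shape of the Ralph loop: run the agent until every story passes
MAX_ITERATIONS=${1:-5}
for ((i = 1; i <= MAX_ITERATIONS; i++)); do
  # Count user stories not yet marked passing in the PRD
  remaining=$(jq '[.userStories[] | select(.passes == false)] | length' prd.json)
  [[ "$remaining" -eq 0 ]] && echo "All user stories pass." && break
  # One agent run: pick the next story, implement it, run tests, update the PRD
  aider --yes-always --no-git --file prd.json --message "$(cat prompt.md)"
done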

Model: Qwen3-Coder-Next-FP8

Model: Qwen/Qwen3-Coder-Next-FP8
Size: 80B params (3B active, MoE), ~80GB in FP8
GPUs: 4x RTX 4090 (96GB total)
Cost: ~$1.50/hr
Image: lmsysorg/sglang:latest (v0.5.8+)
CUDA: 12.9+
Why Qwen3-Coder-Next?
  • Trained specifically for agentic coding tools (aider, Claude Code, Cline, etc.)
  • 256K context length

Prerequisites

  • Vast.ai account with API key (Sign up here)
  • Python 3.10 or later
  • git, jq, curl, openssl

Setup

Bash
# Create a virtual environment
python3 -m venv ralph-env
source ralph-env/bin/activate

# Install Vast CLI and Aider
pip install --upgrade vastai aider-chat pytest
vastai set api-key <your-vast-api-key>

# Clone Ralph
git clone https://github.com/snarktank/ralph.git
cd ralph

Step 1: Deploy Qwen3-Coder-Next on Vast

Find a 4x RTX 4090 instance with CUDA 12.9+:
Bash
vastai search offers 'gpu_name=RTX_4090 num_gpus=4 dph<2.5 reliability>0.98 inet_down>1000 cuda_vers>=12.9 direct_port_count>=1' -o 'dph'
Generate a bearer token for the inference endpoint and deploy (replace <OFFER_ID> with an ID from the first column):
Bash
# Generate a bearer token for your inference endpoint
MODEL_API_KEY=$(openssl rand -hex 16)
echo "$MODEL_API_KEY" > .vast_model_api_key
echo "Endpoint bearer token: $MODEL_API_KEY (saved to .vast_model_api_key)"

# Deploy with SGLang
vastai create instance <OFFER_ID> \
    --image lmsysorg/sglang:latest \
    --env '-p 8000:8000' \
    --disk 200 \
    --onstart-cmd "python3 -m sglang.launch_server \
        --model-path Qwen/Qwen3-Coder-Next-FP8 \
        --host 0.0.0.0 \
        --port 8000 \
        --tp-size 4 \
        --context-length 32768 \
        --mem-fraction-static 0.85 \
        --api-key $MODEL_API_KEY"
This guide uses three different keys:
  • Vast account API key — authenticates the Vast CLI (vastai set api-key)
  • Endpoint bearer token (MODEL_API_KEY) — secures your SGLang inference endpoint
  • Client SDK key (OPENAI_API_KEY) — set to the same value as the endpoint bearer token so Aider’s OpenAI-compatible client can authenticate
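For reference, here is where each key is set (each command appears in its respective step):
Bash
# Vast account API key: authenticates the CLI
vastai set api-key <your-vast-api-key>

# Endpoint bearer token: passed to SGLang via --api-key at deploy time
MODEL_API_KEY=$(openssl rand -hex 16)

# Client SDK key: the same token, exported for Aider in Step 3
export OPENAI_API_KEY="$MODEL_API_KEY"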

Step 2: Get Your Endpoint

Wait 10-15 minutes for the model weights (~80GB) to download and load. You can monitor progress with vastai logs <INSTANCE_ID> — look for “The server is fired up and ready to roll!” Then get your endpoint:
Bash
vastai show instance <INSTANCE_ID> --raw | jq -r '"\(.public_ipaddr):\(.ports["8000/tcp"][0].HostPort)"'
# Output: <IP>:<PORT>
Verify it’s ready (SGLang returns HTTP 200 with an empty body — that’s normal):
Bash
curl -w '\nHTTP Status: %{http_code}\n' -H "Authorization: Bearer $MODEL_API_KEY" http://<IP>:<PORT>/health
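If you’d rather not poll by hand, a small wait loop does the same check on a timer (a convenience sketch; adjust the interval as needed):
Bash
# Retry /health every 30 seconds until the server returns HTTP 200
until curl -sf -o /dev/null -H "Authorization: Bearer $MODEL_API_KEY" "http://<IP>:<PORT>/health"; do
  echo "Model still loading, retrying in 30s..."
  sleep 30
done
echo "Endpoint is ready."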

Step 3: Configure Aider for Vast

Set environment variables to point Aider at your Vast endpoint. OPENAI_API_KEY must be set to the same endpoint bearer token you generated in Step 1:
Bash
export OPENAI_API_BASE="http://<IP>:<PORT>/v1"
export OPENAI_API_KEY="$MODEL_API_KEY"

Step 4: Verify Aider Connectivity

Test that Aider can reach your Vast endpoint:
Bash
aider --model openai/Qwen/Qwen3-Coder-Next-FP8 --no-git --yes-always --no-show-model-warnings --message "Say hello"
You should see Aider respond. If you get connection errors, verify the endpoint URL and that the model finished loading (check vastai logs <INSTANCE_ID>).
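To rule out Aider itself, you can also query the endpoint directly; SGLang’s OpenAI-compatible API exposes /v1/models, which should list the loaded model:
Bash
curl -H "Authorization: Bearer $MODEL_API_KEY" "http://<IP>:<PORT>/v1/models"
# Expect a JSON response listing Qwen/Qwen3-Coder-Next-FP8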

Step 5: Add Aider Support to Ralph

Ralph doesn’t include aider as a tool out of the box, so you need to make two edits to ralph.sh.
Edit 1: Add aider to the tool validation. Find the line that validates the --tool argument:
Bash
if [[ "$TOOL" != "amp" && "$TOOL" != "claude" ]]; then
  echo "Error: Invalid tool '$TOOL'. Must be 'amp' or 'claude'."
Add aider as a valid option:
Bash
if [[ "$TOOL" != "amp" && "$TOOL" != "claude" && "$TOOL" != "aider" ]]; then
  echo "Error: Invalid tool '$TOOL'. Must be 'amp', 'claude', or 'aider'."
Edit 2: Add the aider tool block. Find the elif chain that runs each tool (look for the claude block). After the last elif block and before the closing fi, add:
Bash
  elif [[ "$TOOL" == "aider" ]]; then
    # Aider: use --message flag for non-interactive mode
    PROMPT_CONTENT=$(cat "$SCRIPT_DIR/prompt.md")
    OUTPUT=$(aider --model openai/Qwen/Qwen3-Coder-Next-FP8 \
      --yes-always --no-git --no-show-model-warnings --no-browser \
      --file "$SCRIPT_DIR/prd.json" \
      --message "$PROMPT_CONTENT" 2>&1) || true
    echo "$OUTPUT"
This tells aider to use the Vast-hosted Qwen3-Coder-Next model (via the OPENAI_API_BASE env var you set in Step 3), load the PRD file for context, and run non-interactively with the Ralph prompt.

Step 6: Run Ralph

Create a prd.json that defines what you want Ralph to build. Note that testCommand is informational — the agent reads it from the PRD to know how to run tests, but ralph.sh itself doesn’t execute it.
JSON
{
  "project": "Calculator",
  "branchName": "ralph/calculator",
  "description": "Create a Python calculator module with basic arithmetic functions",
  "testCommand": "python -m pytest test_calculator.py -v",
  "userStories": [
    {
      "id": "US-001",
      "title": "Create add function",
      "description": "Create calculator.py with an add function.",
      "acceptanceCriteria": [
        "add(2, 3) returns 5",
        "add(-1, 1) returns 0"
      ],
      "priority": 1,
      "passes": false
    },
    {
      "id": "US-002",
      "title": "Create multiply function",
      "description": "Add a multiply function to calculator.py.",
      "acceptanceCriteria": [
        "multiply(2, 3) returns 6",
        "multiply(-1, 5) returns -5",
        "multiply(0, 100) returns 0"
      ],
      "priority": 2,
      "passes": false
    },
    {
      "id": "US-003",
      "title": "Create divide function",
      "description": "Add a divide function with zero handling.",
      "acceptanceCriteria": [
        "divide(10, 2) returns 5",
        "divide(-6, 3) returns -2",
        "divide(1, 0) raises ZeroDivisionError"
      ],
      "priority": 3,
      "passes": false
    }
  ]
}
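Because each story carries a passes flag, you can check progress at any point with jq (a convenience one-liner; it assumes the agent updates prd.json as stories complete):
Bash
# List user stories still marked as failing
jq -r '.userStories[] | select(.passes == false) | "\(.id): \(.title)"' prd.json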
Run Ralph:
Bash
OPENAI_API_BASE="http://<IP>:<PORT>/v1" \
OPENAI_API_KEY="$MODEL_API_KEY" \
./ralph.sh --tool aider 5
Ralph creates calculator.py and test_calculator.py from scratch, implementing each user story and running tests until they pass. Example output (calculator.py):
Python
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return a / b
Example output (test_calculator.py):
Python
import pytest
from calculator import add, multiply, divide

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0

def test_multiply():
    assert multiply(2, 3) == 6
    assert multiply(-1, 5) == -5
    assert multiply(0, 100) == 0

def test_divide():
    assert divide(10, 2) == 5
    assert divide(-6, 3) == -2
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)

Cleanup

Destroy the instance when done:
Bash
vastai destroy instance <INSTANCE_ID>
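Then confirm nothing is still billing:
Bash
# The destroyed instance should no longer appear
vastai show instances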

Next Steps: Overnight Ralph Loop

Project ideas for overnight runs:
  • Full CLI application with subcommands, config files, and help system
  • REST API with authentication, validation, and multiple resource types
  • Web scraper with multiple site adapters, rate limiting, and data export
  • Complete test suite for an existing codebase (one test file per module)
  • Database migration system with schema versioning and rollback
To run Ralph unattended overnight:
Bash
# Run in background with nohup, increase iterations
# (double quotes so the parent shell expands $MODEL_API_KEY)
nohup bash -c "OPENAI_API_BASE='http://<IP>:<PORT>/v1' OPENAI_API_KEY='$MODEL_API_KEY' ./ralph.sh --tool aider 500" > ralph.log 2>&1 &

# Check progress
tail -f ralph.log

# Check generated files
ls -la *.py

# Run tests manually
python -m pytest test_*.py -v
Cost estimate: At ~$1.50/hr, an 8-12 hour overnight run costs $12-18.
Tips:
  • Use tmux or screen instead of nohup if you want to reattach later (see the sketch after this list)
  • Monitor with vastai show instance <ID> to ensure the instance stays running
  • Check progress.txt for Ralph’s learnings across iterations
  • Commit your prd.json before starting so you can reset if needed
  • Remember to vastai destroy instance <INSTANCE_ID> when the run finishes — instances bill by the hour even when idle
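For the tmux option, something like this works (a sketch; the session name is arbitrary, and double quotes let the parent shell expand $MODEL_API_KEY):
Bash
# Start a detached session running Ralph
tmux new-session -d -s ralph \
  "OPENAI_API_BASE='http://<IP>:<PORT>/v1' OPENAI_API_KEY='$MODEL_API_KEY' ./ralph.sh --tool aider 500"

# Reattach later to watch progress; detach again with Ctrl-b d
tmux attach -t ralph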

Resources