Autoresearch is Andrej Karpathy’s framework for autonomous AI-driven ML research. The idea is simple: point an AI agent (Claude Code) at a small but real LLM training setup and let it experiment autonomously overnight. The agent modifies the model code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats, running ~12 experiments per hour, ~100 overnight.
This guide walks you through setting up autoresearch on a Vast.ai GPU instance with Claude Code as the autonomous research agent.
Autoresearch requires a single NVIDIA GPU with 80GB VRAM (H100 or A100 80GB). It needs CUDA 12.8+ and about 50GB of disk for the repo, data, and dependencies.
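Before launching anything, you may want to confirm that an instance actually meets these requirements. The following is a minimal sketch, not part of autoresearch: the function names and the 80,000 MiB cutoff (chosen so that 80GB cards pass and 40GB cards fail) are assumptions made for illustration, and the CUDA version is not checked.

```python
import shutil

# Assumed thresholds for illustration; adjust to taste.
MIN_VRAM_MIB = 80_000      # approximates "80GB VRAM" as reported in MiB
MIN_FREE_DISK_GB = 50      # "about 50GB of disk"


def parse_gpus(smi_csv: str) -> list[tuple[str, int]]:
    """Parse the text printed by:
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits
    Returns (gpu_name, total_memory_in_MiB) pairs.
    """
    gpus = []
    for line in smi_csv.strip().splitlines():
        name, mem_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem_mib)))
    return gpus


def free_disk_gb(path: str = "/workspace") -> float:
    """Free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def meets_requirements(gpus: list[tuple[str, int]], disk_gb: float) -> bool:
    """True if at least one GPU has enough VRAM and the disk has enough room."""
    has_gpu = any(mem >= MIN_VRAM_MIB for _, mem in gpus)
    return has_gpu and disk_gb >= MIN_FREE_DISK_GB
```

To use it, paste the output of the nvidia-smi command shown in the docstring into parse_gpus, then pass the result and free_disk_gb() to meets_requirements.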
Use Template
Manual Setup
Use the Autoresearcher template to launch a pre-configured instance with uv, Claude Code, and autoresearch already installed.
Templates are reusable configurations that bundle a Docker image, environment variables, and startup scripts into a one-click launch.
The template installs everything on first boot (~10 minutes). You can monitor progress with tail -f /var/log/provisioning.log.
The template automatically configures Claude Code permissions (Read, Edit, Write, Bash) in .claude/settings.json so it can run experiments without prompting; no manual setup is needed.
Once provisioning completes, skip ahead to Launch Autonomous Research.
Vast instances start in a tmux session by default. This keeps your processes running if your SSH connection drops, which is essential for overnight research runs.
Claude Code normally asks for permission before running commands or editing files. For autonomous overnight research, you need to pre-approve the tools Claude will use. Create a settings file in the autoresearch directory:
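As a rough illustration, the settings file could be generated with a few lines of Python. The tool names (Read, Edit, Write, Bash) and the .claude/settings.json location come from this guide. The permissions.allow layout is assumed from Claude Code's settings format, so check it against the current Claude Code documentation. The helper names are hypothetical.

```python
import json
from pathlib import Path


def build_settings() -> dict:
    """Settings that pre-approve the tools named in this guide.
    Assumption: Claude Code reads allowed tools from permissions.allow.
    """
    return {"permissions": {"allow": ["Read", "Edit", "Write", "Bash"]}}


def write_settings(repo_dir: str) -> Path:
    """Write .claude/settings.json inside repo_dir and return its path."""
    path = Path(repo_dir) / ".claude" / "settings.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_settings(), indent=2) + "\n")
    return path
```

Calling write_settings("/workspace/autoresearch") would place the file in the repository directory used elsewhere in this guide. Allowing Bash without restrictions lets the agent run any shell command, which is what makes unattended runs possible and also why this belongs on a disposable instance.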
When Claude Code starts, log in to your Anthropic account:
/login
This will give you a URL to open in your browser. Follow the prompts to authenticate, then you’re ready to go.
Kick off the research loop:
Hi have a look at program.md and let's kick off a new experiment! let's do the setup first.
Claude will:
Read program.md for the research guidelines
Create a fresh git branch (e.g. autoresearch/mar10)
Run the baseline experiment
Begin the autonomous loop: modify train.py, train for 5 minutes, evaluate, keep improvements, discard regressions
Log all results to results.tsv
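The keep-or-discard logic in the steps above can be sketched in a few lines. This is a conceptual illustration only, not autoresearch's or Claude's actual code: run_experiment is a hypothetical stand-in for "edit train.py, train for 5 minutes, evaluate", and the sketch assumes the metric is one where lower is better.

```python
from typing import Callable, List, Tuple


def research_loop(
    baseline: float,
    run_experiment: Callable[[int], float],
    max_iters: int,
) -> Tuple[float, List[Tuple[int, float, str]]]:
    """Greedy loop: keep a change only if it beats the best score so far."""
    best = baseline
    log: List[Tuple[int, float, str]] = []
    for i in range(max_iters):
        # Hypothetical stand-in for one edit + 5-minute training run + evaluation.
        score = run_experiment(i)
        if score < best:  # assumption: lower score is better
            best = score
            log.append((i, score, "keep"))
        else:
            log.append((i, score, "discard"))
    return best, log


# Example with made-up scores, purely to show the control flow.
fake_scores = [1.00, 0.90, 0.95, 0.80]
best, log = research_loop(0.97, lambda i: fake_scores[i], len(fake_scores))
```

In the real setup there is no max_iters: the loop continues until you stop it.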
Claude runs indefinitely until manually stopped. Each experiment takes ~5 minutes, so you can expect ~12 experiments/hour and ~100 experiments overnight. Each iteration also uses Claude API tokens.
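The throughput figures follow directly from the 5-minute budget. A small sketch of the arithmetic, assuming an "overnight" run of about 8 hours (an assumption, since this guide does not define it) and ignoring any time the agent spends between runs:

```python
def experiments_in(hours: float, minutes_per_experiment: float = 5) -> int:
    """Upper bound on experiments that fit in `hours` of wall-clock time."""
    return int(hours * 60 // minutes_per_experiment)


per_hour = experiments_in(1)    # 60 / 5
overnight = experiments_in(8)   # assumed 8-hour night
```

Under these assumptions, an hour fits 12 experiments and an 8-hour night fits 96, which is where the rounded "~100 overnight" comes from.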
Claude has full freedom to edit train.py: the model architecture, optimizer, hyperparameters, batch size, model size, and training loop. The only constraints are:
prepare.py is read-only: the evaluation harness and data loading are fixed
No new packages: only dependencies in pyproject.toml
5-minute time budget: every experiment runs for exactly 5 minutes
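If you want to verify after a run that the first two constraints were respected, you could compare the research branch against the commit it started from. The sketch below is an assumption-laden illustration: it treats an unchanged pyproject.toml as a proxy for "no new packages", assumes both files sit at the repository root, and uses hypothetical helper names. The base_ref argument is whatever branch or commit the run started from.

```python
import subprocess

# Files the agent is not supposed to modify (per the constraints above).
PROTECTED = {"prepare.py", "pyproject.toml"}


def changed_files_since(base_ref: str, repo_dir: str = ".") -> list[str]:
    """List files that differ between base_ref and HEAD, via git diff --name-only."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def protected_files_touched(changed_files: list[str]) -> list[str]:
    """Return the protected files that appear in the list of changed files."""
    return sorted(PROTECTED.intersection(changed_files))
```

An empty result from protected_files_touched(changed_files_since(base_ref)) suggests the run stayed within bounds. It does not check the time budget.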
When you’re done, download your results and destroy the instance:
    # From your local machine: copy results
    scp -P PORT root@HOST_IP:/workspace/autoresearch/results.tsv ./results.tsv
    # Destroy the instance
    vastai destroy instance INSTANCE_ID
Destroying an instance permanently deletes all data on it. Make sure to copy any results you want to keep before destroying.
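Once results.tsv is on your local machine, a quick way to inspect it is Python's built-in csv module. This guide does not specify the file's columns, so the sketch below assumes only that the file is tab-separated with a header row. The function name is hypothetical.

```python
import csv
import io


def read_tsv(text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Parse tab-separated text with a header row into (column_names, rows)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    return list(reader.fieldnames or []), rows


# Placeholder data only; the real column names come from your results.tsv.
columns, rows = read_tsv("a\tb\n1\t2\n3\t4\n")
```

For the real file, pass open("results.tsv").read() to read_tsv, then print the column names and len(rows) to see how many experiments were logged.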