Introduction
Vast is introducing support for several features (local VPNs for client instances [in beta], network volumes [in alpha]) that depend on the host’s LAN configuration. Hosts can now register a set of machines that they own and that share a LAN as a cluster, which lets clients renting those machines access local network resources. This allows the machines to support use cases such as multi-node training via NCCL. It will also be a prerequisite for listing network volumes when they are released.
A registered cluster is associated with:
- A machine that acts as the manager node and is responsible for dispatching cluster management commands from the Vast server.
- An IP subnet that machines in the cluster use to identify which network interface and which IP addresses to use for communication with other machines in the cluster.
- A set of member machines.
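To make the role of the subnet concrete, here is a minimal Python sketch of how a machine could pick the interface address that falls inside the cluster's subnet. It uses only the standard `ipaddress` module. The function name `pick_cluster_address`, the interface names, and the addresses are hypothetical illustrations and are not part of the Vast CLI or server.

```python
import ipaddress

def pick_cluster_address(subnet, interface_addrs):
    """Return (interface, address) for the first address inside `subnet`, or None.

    `interface_addrs` maps an interface name to a list of IPv4 address strings.
    """
    # strict=False accepts a host address with a mask, e.g. "192.168.1.10/24".
    network = ipaddress.ip_network(subnet, strict=False)
    for name, addrs in interface_addrs.items():
        for addr in addrs:
            if ipaddress.ip_address(addr) in network:
                return name, addr
    return None

# Made-up example data: a loopback, a LAN interface, and a Docker bridge.
example = {
    "lo": ["127.0.0.1"],
    "enp3s0": ["192.168.1.23"],
    "docker0": ["172.17.0.1"],
}
print(pick_cluster_address("192.168.1.10/24", example))
```

In this example only `enp3s0` has an address inside `192.168.1.0/24`, so that interface is the one selected.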
Cluster registration guide
- Update to the newest version of the CLI: go to [https://cloud.vast.ai/cli/](https://cloud.vast.ai/cli/) and copy and run the command starting with `wget`.
- Identify and test the subnet to register:
  - On the manager node:
    - Run `ip addr` or `ifconfig` (the `ip` utility is part of the `iproute2` package).
    - Identify which interface corresponds to your LAN. For most hosts this will be an Ethernet interface, which has the naming format `enp$BUSs$SLOT[f$FUNCTION]` in modern Ubuntu.
      - Hosts using Mellanox devices for their main Ethernet connection may instead see their interface show up as `bond0`.
    - Find the IPv4 subnet corresponding to that network interface:
      - In `ip addr` output, the third line for each interface usually starts with `inet IPv4SUBNET`, where `IPv4SUBNET` has the format `IPv4ADDRESS/MASK` and `MASK` is a non-negative integer < 32.
  - Test that the other machines to be added to the cluster can reach the manager node on that subnet/address:
    - On the manager node:
      - Run `nc -l IPv4ADDRESS 2337`, where `IPv4ADDRESS` is the IPv4 address component of the chosen subnet.
    - On each other node:
      - Run `nc IPv4ADDRESS 2337`.
      - Type in some test text (e.g., “hello”) and press Enter.
      - Check that `nc` received and output the test text on the manager node.
- Run `./vast.py create cluster IPv4SUBNET MACHINE_ID_OF_MANAGER_NODE`.
- Run `./vast.py show clusters` to check the ID of the cluster you just created.
- Run `./vast.py join cluster CLUSTER_ID MACHINE_IDS`, where `CLUSTER_ID` is the ID shown in the previous step and `MACHINE_IDS` is a space-separated list of the remaining machines to add to your cluster.
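The subnet lookup in the guide can also be scripted. The following Python sketch pulls the `inet IPv4SUBNET` entries out of `ip addr` output and splits one into its address and network using the standard `ipaddress` module. The helper name `ipv4_subnets` and the sample output (interface names, MAC, and addresses) are made up for illustration. Real `ip addr` output may differ in detail, so treat the regular expressions as assumptions rather than a complete parser.

```python
import ipaddress
import re

# Matches lines such as "    inet 192.168.1.10/24 brd ... scope global enp3s0".
# "inet6" lines are skipped because the pattern requires "inet" plus a space.
INET_RE = re.compile(r"^\s+inet (\d+\.\d+\.\d+\.\d+/\d+)\b")
# Matches interface header lines such as "2: enp3s0: <BROADCAST,...> mtu 1500".
IFACE_RE = re.compile(r"^\d+: ([^:@\s]+)")

def ipv4_subnets(ip_addr_output):
    """Map each interface name to its IPv4SUBNET strings (IPv4ADDRESS/MASK)."""
    result = {}
    current = None
    for line in ip_addr_output.splitlines():
        header = IFACE_RE.match(line)
        if header:
            current = header.group(1)
            result[current] = []
            continue
        inet = INET_RE.match(line)
        if inet and current is not None:
            result[current].append(inet.group(1))
    return result

# Illustrative sample only; not captured from a real host.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.10/24 brd 192.168.1.255 scope global enp3s0
       valid_lft forever preferred_lft forever
"""

subnets = ipv4_subnets(SAMPLE)
iface = ipaddress.ip_interface(subnets["enp3s0"][0])
print(subnets["enp3s0"][0], iface.ip, iface.network)
```

For the sample above, `IPv4SUBNET` is `192.168.1.10/24`, the `IPv4ADDRESS` to pass to `nc` is `192.168.1.10`, and the network it belongs to is `192.168.1.0/24`.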
Cluster Management Commands Reference
`./vast create cluster SUBNET MANAGER_ID`
- Initializes a cluster containing the machine with ID `MANAGER_ID` as its manager node and using the network interface corresponding to `SUBNET`.
`./vast show clusters`
- Shows all clusters, listing for each its `CLUSTER_ID`, associated `SUBNET`, manager node `machine_id`, and member machines.
`./vast join cluster CLUSTER_ID MACHINE_IDS`
- Takes `MACHINE_IDS` as a space-separated list of machine IDs and adds those machines to the cluster specified by `CLUSTER_ID`.
`./vast remove-machine-from-cluster CLUSTER_ID MACHINE_ID [NEW_MANAGER_ID]`
- Removes machine `MACHINE_ID` from cluster `CLUSTER_ID`. If the machine is the only manager, another machine in the cluster, `NEW_MANAGER_ID`, must be specified so that the cluster still has a manager.
`./vast delete cluster CLUSTER_ID`
- Deletes cluster `CLUSTER_ID`. Fails if cluster resources are currently in use by client instances.
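The manager handover rule for `remove-machine-from-cluster` can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Vast's implementation: the `Cluster` class, the `remove_machine` function, and the machine IDs are hypothetical, and the model assumes a single manager per cluster.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    # Hypothetical model: one manager, which is also listed in member_ids.
    cluster_id: int
    subnet: str
    manager_id: int
    member_ids: set = field(default_factory=set)

def remove_machine(cluster, machine_id, new_manager_id=None):
    """Remove a machine, promoting `new_manager_id` if the manager is leaving."""
    if machine_id not in cluster.member_ids:
        raise ValueError(f"machine {machine_id} is not in cluster {cluster.cluster_id}")
    if machine_id == cluster.manager_id:
        # The cluster must keep a manager, so a replacement is required.
        if new_manager_id is None:
            raise ValueError("removing the manager requires NEW_MANAGER_ID")
        if new_manager_id == machine_id or new_manager_id not in cluster.member_ids:
            raise ValueError("NEW_MANAGER_ID must be another machine in the cluster")
        cluster.manager_id = new_manager_id
    cluster.member_ids.discard(machine_id)
    return cluster

cluster = Cluster(cluster_id=1, subnet="192.168.1.0/24", manager_id=101,
                  member_ids={101, 102, 103})
remove_machine(cluster, 103)                      # ordinary member
remove_machine(cluster, 101, new_manager_id=102)  # manager handover
print(cluster.manager_id, sorted(cluster.member_ids))
```

Removing an ordinary member needs no extra argument, while removing the manager without naming a replacement raises an error in this model, mirroring the requirement described above.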