Distributed Computing

Create Your Own Multi-Node Cluster

To set up a multi-node cluster on Vast, you can rent machines from the same datacenter and test their connectivity to ensure they can communicate effectively.

Alternatively, you can create a cluster with machines from different datacenters or even across the globe, though this will typically increase latency. Generally, latency increases in this order: instances on a single machine, machines in the same datacenter, machines in different datacenters, and machines across the globe.


Each instance or container in Vast is allocated one GPU, so it’s possible to run several instances/containers on a single machine if it has multiple GPUs.

For example, you could deploy five instances on a machine equipped with 5 x RTX 4090 GPUs.

If scale matters more to you than latency, look for multi-GPU rigs within the same datacenter, in different datacenters, or across the globe, and connect them.

For example, if you need roughly 100 RTX 4090s, you could rent six 12 x RTX 4090 rigs and four 10 x RTX 4090 rigs, for a total of 112 GPUs.

[Image: Rigs To Get Scale]
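The arithmetic behind that example can be checked with a short sketch. The `rigs` list mirrors the rig counts quoted above; the function name `total_gpus` is only illustrative.

```python
# Each entry is (number_of_rigs, gpus_per_rig), taken from the example above.
rigs = [(6, 12), (4, 10)]


def total_gpus(rig_groups):
    """Sum the GPUs across all groups of identical rigs."""
    return sum(count * gpus_per_rig for count, gpus_per_rig in rig_groups)


target = 100
total = total_gpus(rigs)
print(f"{total} GPUs rented, {total - target} more than the target of {target}")
# prints "112 GPUs rented, 12 more than the target of 100"
```

Rig sizes rarely add up to an exact target, so a small surplus like this is to be expected.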


Step 1 - Look for Machines with Same Datacenter ID

Machines that belong to a datacenter typically carry the blue datacenter label. You can find the datacenter's ID next to "datacenter:".

[Image: Diff Machines Same Dc]


In this case, both machines belong to datacenter 3497: the top instance will run on machine 11877 and the bottom instance on machine 13280.
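If you keep your own list of candidate machines, grouping them by datacenter ID makes co-located machines easy to spot. The sketch below uses plain dictionaries with hypothetical field names (`machine_id`, `datacenter_id`); it is not the Vast API, and the third machine and its datacenter ID are made up for contrast.

```python
from collections import defaultdict

# Hypothetical records; only the first two come from the example above.
candidates = [
    {"machine_id": 11877, "datacenter_id": 3497},
    {"machine_id": 13280, "datacenter_id": 3497},
    {"machine_id": 99999, "datacenter_id": 1234},  # made-up machine for contrast
]


def group_by_datacenter(machines):
    """Map each datacenter ID to the list of machine IDs found in it."""
    groups = defaultdict(list)
    for machine in machines:
        groups[machine["datacenter_id"]].append(machine["machine_id"])
    return dict(groups)


def colocated(machines, minimum=2):
    """Keep only datacenters that hold at least `minimum` candidate machines."""
    groups = group_by_datacenter(machines)
    return {dc: ids for dc, ids in groups.items() if len(ids) >= minimum}


print(colocated(candidates))  # {3497: [11877, 13280]}
```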

Step 2 - Find Public IP Addresses of Machines

Once the instance's blue button says "OPEN", click the blue button that shows the IP range to find the instance's public IP address.

[Image: Ip Port Diff Machine]


The public IP address appears in the pop-up.

[Image: Public Ip Diff Machine]


Step 3 - Ping IP Addresses of Other Machines to Test Connectivity

Install the ping utility from a Jupyter terminal, or from your own terminal once you have connected to the instance over SSH. On Debian- or Ubuntu-based images, ping is typically provided by the iputils-ping package.


Ping the public IP address of another rented machine, as in this example:

[Image: Ping Diff Machine]


If you see replies like the ones in the example above, the ping succeeded and you can reach the machine.

If you don't see a response like this, try machines from a different datacenter. A ping can fail for reasons such as a firewall or a corporate network blocking the ping packets.
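To repeat this check across several peers, the ping call can be wrapped in a small script. This is a minimal sketch that assumes a Linux instance with the iputils `ping` binary installed; the helper names are illustrative, and the addresses in the usage note are placeholders rather than values from this guide.

```python
import subprocess


def build_ping_command(host, count=4, wait_s=5):
    """Build the argument list for a bounded ping run."""
    # -c: number of echo requests to send
    # -W: seconds to wait for a reply (iputils ping on Linux)
    return ["ping", "-c", str(count), "-W", str(wait_s), host]


def is_reachable(host, count=4, wait_s=5, run=subprocess.run):
    """Return True if ping exits with status 0, meaning a reply arrived."""
    result = run(
        build_ping_command(host, count, wait_s),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_peers(hosts, **kwargs):
    """Map each peer address to whether it answered ping."""
    return {host: is_reachable(host, **kwargs) for host in hosts}
```

For example, `check_peers(["203.0.113.10", "203.0.113.11"])` returns a dictionary from each address to `True` or `False`. A `False` result does not prove the machine is down, since the ping packets may simply be filtered along the way.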

Step 4 - [Optional] Hit Server Running at Port on Another Machine in Your Cluster

Before you rent your instance, edit your template and add each port your servers will run on so that it is exposed.

[Image: Expose Port 8081]


You can find the public IP address and port mapped to the port you exposed in the Open Ports section of the pop-up, as shown here:

[Image: Public Ip Port 8081]


In this case, 185.62.108.226:41525 is the public IP address and port mapped to port 8081 inside the instance/container.

You can use curl to send a request to 185.62.108.226:41525; the traffic is forwarded to port 8081 inside the instance/container.

[Image: Curl Public Ip Port]
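The same request can be made from a script, which is handy when checking several machines in a cluster. The sketch below uses Python's standard `http.client` module instead of curl and assumes the server on port 8081 speaks plain HTTP and answers `GET /`; the helper names `split_endpoint` and `fetch_status` are illustrative.

```python
import http.client


def split_endpoint(endpoint):
    """Split 'host:port', as shown in the Open Ports section, into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def fetch_status(endpoint, path="/", timeout_s=5.0):
    """Send a GET request to the mapped public endpoint.

    Returns the HTTP status code, or None if the server could not be reached.
    """
    host, port = split_endpoint(endpoint)
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a reply that is not valid HTTP.
        return None
    finally:
        conn.close()
```

With the mapping from this example, `fetch_status("185.62.108.226:41525")` would return the status code sent by the server listening on port 8081 inside the instance/container, or `None` if the connection fails.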




Updated 18 Dec 2024