Distributed Computing

Create Your Own Multi-Node Cluster

To set up a multi-node cluster on Vast, you can rent machines from the same datacenter and test their connectivity to confirm that they can communicate. Alternatively, you can build a cluster from machines in different datacenters, or even across the globe, though this will typically increase latency.

In general, latency increases from lowest to highest across these setups:

1) Multiple instances on the same machine
2) Instances on different machines within the same datacenter
3) Instances on machines in different datacenters or across the world

Each instance or container in Vast is allocated one GPU, so a machine with multiple GPUs can run several instances/containers. For example, you could deploy five instances on a machine equipped with 5 x RTX 4090 GPUs.

If scale matters more to you and you can tolerate higher latency, it is probably better to find GPU rigs within the same datacenter, in different datacenters, or across the globe, and connect them. For example, if you need something like 100 RTX 4090s, you could try renting six 12 x RTX 4090 rigs and four 10 x RTX 4090 rigs (112 GPUs in total) to get the GPU power you want.

[Image: rigs to get scale]

Step 1: Look for machines with the same datacenter ID

Machines that belong to a datacenter will typically have the blue datacenter label on them. You can find the datacenter's ID next to "datacenter".

[Image: different machines in the same datacenter]

In this case, the two machines belong to datacenter 3497. The top instance will run on machine 11877 and the bottom instance will run on machine 13280, both within datacenter 3497.

Step 2: Find the public IP addresses of the machines

Once the blue button says "Open", click the blue button with the IP range to find each instance's public IP address.

[Image: IP and port button on a different machine]

Find the public IP address in the pop-up.

[Image: public IP of a different machine]

Step 3: Ping the IP addresses of the other machines to test connectivity

Use this command to install the ping utility, either inside a Jupyter terminal or in your own terminal once you have connected to the instance over SSH:

    apt install iputils-ping

Then ping the public IP address of another rented machine, as in this example.

[Image: pinging a different machine]

If you see a response like this, the ping was successful and you were able to reach the machine. If you don't see a response like this, try machines from a different datacenter. A ping can fail for reasons such as a firewall or a corporate network blocking the ping packets.

Step 4 [optional]: Hit a server running on a port on another machine in your cluster

Make sure to expose the ports your servers will run on by editing your template and adding each port you want exposed before you rent your instance.

[Image: exposing port 8081]

You can find the public IP mapped to the port you exposed in the Open Ports section of a pop-up like this one.

[Image: public IP mapped to port 8081]

In this case, 185.62.108.226:41525 is the public IP and port mapped to port 8081 in the instance/container. You can use curl to hit 185.62.108.226:41525, and the traffic is forwarded to port 8081 inside the instance/container.

[Image: curl against the public IP and port]
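
The checks in Steps 3 and 4 can be scripted so that every node in the cluster is tested in one pass. The following is a minimal sketch, not an official Vast tool. The function names (check_ping, endpoint_url, check_http, check_cluster) are hypothetical, and it assumes that the server behind each exposed port speaks plain HTTP and that ping and curl are installed on the instance.

```shell
#!/usr/bin/env bash
# Connectivity sketch for a multi-node cluster.
# Assumption: each argument is a public "ip:port" mapping copied from the
# Open Ports pop-up, and the service behind it answers plain HTTP.

# Ping a peer three times, waiting up to 2 seconds per reply.
check_ping() {
  local ip="$1"
  ping -c 3 -W 2 "$ip" > /dev/null 2>&1
}

# Build the URL for a service from its public "ip:port" mapping.
endpoint_url() {
  local mapping="$1" path="${2:-/}"
  printf 'http://%s%s\n' "$mapping" "$path"
}

# Request the endpoint; fails on connection errors and HTTP errors (>= 400).
check_http() {
  local url
  url="$(endpoint_url "$1" "${2:-/}")"
  curl --silent --fail --max-time 5 --output /dev/null "$url"
}

# Run both checks for every mapping and print one result line per check.
check_cluster() {
  local mapping ip status=0
  for mapping in "$@"; do
    ip="${mapping%%:*}"   # strip the ":port" suffix to get the bare IP
    if check_ping "$ip"; then
      echo "ping ok    $ip"
    else
      echo "ping FAIL  $ip"
      status=1
    fi
    if check_http "$mapping"; then
      echo "http ok    $mapping"
    else
      echo "http FAIL  $mapping"
      status=1
    fi
  done
  return "$status"
}

# Example call, using the mapping from Step 4 as a placeholder:
# check_cluster 185.62.108.226:41525
```

A failed ping does not necessarily mean the node is unreachable, since a firewall may drop ping packets while still allowing traffic to the exposed port, which is why the sketch reports the two checks separately.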