Data Management

Data Movement

Vast.ai currently supports several built-in mechanisms to copy data to/from instance storage (in addition to all of the standard Linux/Unix options available inside the instance).

For Docker-based instances:

- Instance<->instance and instance<->local copy using the vastai copy CLI command
- Instance<->instance copy in the GUI instance control panel or with the vastai copy CLI command
- Instance<->cloud copy using the GUI instance control panel or the vastai cloud copy CLI command

For VM instances:

- Instance<->instance migration through the vastai vm copy CLI command or the GUI instance control panel

These are in addition to standard SSH-based copy protocols such as scp or sftp, which you can run over SSH, the built-in Jupyter HTTP copy, and any other Linux tools you can run inside the instance yourself (rclone, rsync, BitTorrent, Insync (https://www.insynchq.com/headless-for-linux), etc.).

The three built-in methods discussed here are unique in that they offer ways to copy data to/from a stopped instance, with some constraints.

Copying data between instances accrues internet bandwidth usage charges (with prices varying across providers), unless the copy is between two instances on the same machine or local network, in which case there is no bandwidth charge.

Instance<->Cloud Copy (Cloud Sync) (Docker)

The cloud sync feature allows you to copy data between instance local storage and several cloud storage providers (S3, GDrive, Backblaze, etc.), even when the instance is stopped.

Using the GUI

Vast currently supports the Dropbox, Amazon S3, and Backblaze cloud storage providers. First, connect to the cloud provider on the account page (https://cloud.vast.ai/account/), then use the cloud copy button on the instance to start the copy operation.

See Cloud Sync (https://docs.vast.ai/instances/cloud-sync) for more details.

Using the CLI

You can also access this feature using the vastai cloud copy CLI command (https://docs.vast.ai/api/commands).

Instance<->Instance Copy

Instance-to-instance copy
allows moving data directly between the local storage of two instances. If the two instances are on the same machine or the same local network (same provider and location), the copy can run at faster local network storage speeds and there is no internet transit cost.

Using the GUI

You can use the copy buttons to copy data between two instances. The instances can be stopped/inactive (see the constraints below). Click the copy button on the source instance and then on the destination instance to bring up the copy dialogue.

For Docker-based instances you will see a folder dialogue. Pick the folders you want to copy to/from. Leave a '/' at the end of the source directory to copy all the files inside it into the target directory, rather than nesting a copy of the source directory inside the target directory.

⚠️ Warning: You should not copy to /root or / as a destination directory, as this can mess up the permissions on your instance's ssh folder, breaking future copy operations (as they use SSH authentication).

After clicking the copy button, give it 5-10 seconds to start. Status messages will display as the copy operation begins.

For VM-based instances you will see a confirmation dialog instead. The copy transfers your entire source instance to the destination machine, and the destination instance's disk will be replaced by the contents of the source instance.

Using the CLI

You can also access this feature using the vastai copy CLI command (https://docs.vast.ai/api/commands).

CLI Copy Command (Docker)

You can use the CLI (https://docs.vast.ai/api/overview-and-quickstart) copy command to copy between directories on a remote instance and your local machine, or to copy data between two remote instances. The copy command uses rsync and is generally fast and efficient, subject to single-link upload/download constraints.

Example:

    ./vast copy ~/workspace 4330147:/workspace

That will copy the local ~/workspace (the workspace folder in the current user's home
directory) into the absolute path /workspace on instance 4330147.

CLI Copy Command (VMs)

You can use the CLI (https://docs.vast.ai/api/overview-and-quickstart) vm copy command to copy your entire VM from one instance to another. The destination VM's disk will be replaced with the contents of the source machine.

Example:

    ./vast vm copy 1241241 1241245

This will transfer the contents of 1241241 to 1241245.

Constraints

For VM-based instances, the destination instance must be stopped during the transfer.

⚠️ Warning: You should not copy to /root or / as a destination directory, as this can mess up the permissions on your instance's ssh folder, breaking future copy operations (as they use SSH authentication).

Performance

If your data is already stored in the cloud (S3, GDrive, etc.), you should use the appropriate Linux CLI or commands to download and upload data directly, or use the Cloud Sync (https://docs.vast.ai/instances/cloud-sync) feature. This will generally be one of the fastest methods for moving large quantities of data, as it can fully saturate a large number of download links. If you are using multiple instances with significant data movement requirements, you will want to use high-bandwidth cloud storage to avoid any single-machine bottlenecks.

If you launched a Jupyter notebook instance, you can use its upload feature, but this has a file size limit and can be slow.

You can also use standard Linux tools like scp, ftp, rclone, or rsync to move data. For moving code and smaller files, scp is fast enough and convenient. However, be warned that the default SSH connection uses a proxy and can be slow for large transfers (direct SSH is recommended).

Instance-to-instance copy is generally as fast as other methods, and can be much faster (and cheaper) for moving data between instances in the same datacenter.

SCP

If you launched an SSH instance, you can copy files using scp. The proxy SSH connection can be slow (in terms of both latency and bandwidth). Thus we recommend only
using scp over the proxy SSH connection for smaller transfers (less than 1 GB). For larger inbound transfers, using the direct SSH connection is recommended. Downloading from a cloud data store using wget or curl can have much higher performance.

The relevant scp command syntax is:

    scp -P PORT LOCAL_FILE root@IPADDR:/REMOTEDIR

The PORT and IPADDR fields must match those from the ssh command (note the use of -P for the port instead of -p!). The "Connect" button on the instance will give you these fields in the form:

    ssh -p PORT root@IPADDR -L 8080:localhost:8080

For example, if Connect gives you this:

    ssh -p 7417 root@52.204.230.7 -L 8080:localhost:8080

you could use scp to upload a local file called "myfile.tar.gz" to a remote folder called "mydir" like so:

    scp -P 7417 myfile.tar.gz root@52.204.230.7:/mydir

Common Questions

How do you recommend I move data from an existing instance?

The Cloud Sync feature (https://docs.vast.ai/instances/cloud-sync) will allow you to move data to and from instances easily. The main benefit is that you can move data around while the machine is inactive. Currently, we support Google Drive, S3, Dropbox, and Backblaze.

Help, I want to move my data but I forgot what directory it's in!

To move your data with either the Cloud Sync or instance copy features, you need to specify the path the data is coming from and the path where it should be put. If you don't remember where the data you are trying to transfer is located, you can use the CLI execute command (https://docs.vast.ai/api/commands) to access your instance when your instance access is limited.
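
As an illustration of the SCP section's rule that the port and address must match the ssh command, and that scp takes -P where ssh takes -p, here is a minimal sketch of a shell helper that derives the scp upload command from the ssh command shown by the "Connect" button. The function name ssh_to_scp is hypothetical and is not part of the vastai CLI. It only prints the command and does not run it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the vastai CLI):
# build an scp upload command from the ssh command given by "Connect".
# Usage: ssh_to_scp "<ssh command>" <local_file> <remote_dir>
ssh_to_scp() {
  local ssh_cmd="$1" local_file="$2" remote_dir="$3"
  local port target

  # ssh takes the port after a lowercase -p flag.
  port=$(printf '%s\n' "$ssh_cmd" | sed -n 's/.*-p \([0-9][0-9]*\).*/\1/p')

  # The first user@host token is the login target, e.g. root@52.204.230.7.
  target=$(printf '%s\n' "$ssh_cmd" \
    | grep -oE '[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+' | head -n 1)

  if [ -z "$port" ] || [ -z "$target" ]; then
    echo "could not parse port or user@host from: $ssh_cmd" >&2
    return 1
  fi

  # scp takes the port after an uppercase -P flag.
  printf 'scp -P %s %s %s:%s\n' "$port" "$local_file" "$target" "$remote_dir"
}

# Example using the values from the SCP section:
ssh_to_scp "ssh -p 7417 root@52.204.230.7 -L 8080:localhost:8080" myfile.tar.gz /mydir
# prints: scp -P 7417 myfile.tar.gz root@52.204.230.7:/mydir
```

The helper ignores the -L port-forwarding part of the ssh command, since scp does not need it. Review the printed command before running it, and avoid /root or / as the remote directory, as warned above.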