Data Movement
How do I upload/download to/from my instance?
You can use the CLI copy command to copy directories between a remote instance and your local machine, or between two remote instances. You can also use the copy buttons in the GUI to copy data between two remote instances. The copy command uses rsync and is generally fast and efficient, subject to single-link upload/download constraints.

Example:

    ./vastai copy /workspace 4330147:/workspace

Currently, one endpoint of the copy must involve a Vast instance with open ports.

For a remote->local or local->remote copy, the remote instance must be on a machine with open ports (although the instance itself does not need open ports), and the remote instance can be stopped/inactive. For instances on machines without open ports, copy to/from local is not available, but you can still copy to a second Vast instance with open ports.

For a remote->remote copy (copy between two instances), the src can be stopped and does not need open ports, but the dst must be a running instance with open ports. It is not sufficient that the instance is on a machine with open ports; the instance itself must have been created with open port mappings. If the instance is created with the direct connect option (for Jupyter or SSH launch modes), the instance will have at least one open port. Otherwise, for proxy or entrypoint instance types, you can get open ports by using the -p option to reserve a port in the instance configuration under run options (you must then also pick a machine with open ports).

If your data is already stored in the cloud (S3, gdrive, etc.), use the corresponding Linux CLI tools to download and upload data directly. This is generally one of the fastest methods for moving large quantities of data, as it can fully saturate a large number of download links. If you are using multiple instances with significant data movement requirements, you will want to use high-bandwidth cloud storage and avoid any single-machine bottlenecks.

If you launched a Jupyter notebook instance, you can use its upload feature, but this has a file size limit and can be slow.

You can also use standard Linux tools like scp, ftp, rclone, or rsync to move data. For moving code and smaller files, scp is fast enough and convenient. However, be warned that the default SSH connection uses a proxy and can be slow for large transfers.

How do I upload/download to/from my instance using scp?
If you launched an SSH instance, you can copy files using scp. The default SSH connection uses a proxy and can therefore be slow (in terms of both latency and bandwidth), so we recommend using scp over the default SSH connection only for smaller transfers (less than 1 GB). For larger inbound transfers, a direct connection is recommended. Downloading from a cloud data store using wget or curl can have much higher performance.

The relevant scp command syntax is:

    scp -P PORT LOCAL_FILE root@IPADDR:/REMOTEDIR

The PORT and IPADDR fields must match those from the ssh command. The "Connect" button on the instance will give you these fields in the form:

    ssh -p PORT root@IPADDR -L 8080:localhost:8080

For example, if Connect gives you this:

    ssh -p 7417 root@52.204.230.7 -L 8080:localhost:8080

you could use scp to upload a local file called "myfile.tar.gz" to a remote folder called "mydir" like so:

    scp -P 7417 myfile.tar.gz root@52.204.230.7:/mydir

I'm getting a ConnectionResetError downloading files?
This seems to be due to bugs in the urllib3 and/or requests libraries used by many Python packages. We recommend using wget to download large files; it is quite robust and recovers from errors gracefully.

How can I download a Kaggle dataset using wget?
First, you need to get the raw HTTPS link. Using the Chrome browser, go to the relevant dataset or competition page on the Kaggle website and start downloading the file you want. Then cancel the download and press Ctrl+J to bring up the Chrome downloads page. At the top is the most recent download, with a name and a link under it. Right-click the URL link and choose "Copy link address". You can then use wget with that URL as follows:

    wget 'URL' --no-check-certificate -O FILENAME

Notice that the URL needs to be wrapped in single quotes (' ').
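
The single quotes matter because copied download links usually carry query parameters, and an unquoted '&' or '?' is interpreted by the shell rather than passed to wget. The wget command above can be wrapped in a small shell function to make this easier to check. This is a minimal sketch, assuming a POSIX shell with wget installed; the function name fetch_url, the DRY_RUN variable, and the example.com URL are illustrative placeholders and not part of Vast.ai or Kaggle.

```shell
# fetch_url URL FILENAME
# Hypothetical helper: downloads URL to FILENAME using the same wget flags
# as the command above. With DRY_RUN=1 it prints each argument on its own
# line instead of contacting the network.
fetch_url() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' wget "$1" --no-check-certificate -O "$2"
    else
        wget "$1" --no-check-certificate -O "$2"
    fi
}

# Placeholder URL (an assumption for illustration). The single quotes stop
# the shell from treating '&' as "run in background" and '?' as a glob.
DRY_RUN=1 fetch_url 'https://example.com/data.zip?token=abc&part=1' data.zip
```

In the dry run, the URL should appear intact on a single line, which confirms that the quoting kept it as one argument. Dropping DRY_RUN=1 and substituting the link copied from the Chrome downloads page would perform the actual download.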