Huggingface TGI with Llama 3
This is a guide on how to set up and expose an API for Llama 3 text generation. For this guide the model will be unquantized, using the 8B version.

1) Choose the Huggingface Llama3 TGI API template from the recommended section

Log in to your Vast account on the console at https://cloud.vast.ai and select the Huggingface Llama3 TGI API template by clicking this link: https://cloud.vast.ai/?template_id=906891f677fb36f21662a92e6092b5fc. For this template we will be using the meta-llama/Meta-Llama-3-8B-Instruct model and TGI 2.0.4 from Huggingface.

Templates encapsulate all the information required to run an application with the autoscaler, including machine parameters, Docker image, and environment variables. For this template, the only requirement is that you have your own Huggingface access token. You will also need to apply for access to Llama 3 on Huggingface, since the model lives in a gated repository.

The template comes with filters that reflect the minimum requirements for TGI to run effectively. These include, but are not limited to, a disk space requirement of 100 GB and a GPU RAM requirement of at least 16 GB.

2) Modify the template

⚠️ Warning: this template will fail to run if you do not supply your Huggingface access token or do not have access to Huggingface's gated repository for Meta's Llama 3.

Once you have selected the template, add your Huggingface token alongside the rest of the Docker run options (see the token example at the end of this guide). This is the only modification you need to make to this template. You can then press 'Select & Save' to get ready to launch your instance.

3) Rent a GPU

Once you have saved the template, you can rent a GPU of your choice from either the search page or the CLI/API (a CLI sketch is included at the end of this guide). For someone just getting started, I recommend either an NVIDIA RTX 4090 or an A5000.

4) Monitor your instance

Once you rent a GPU, your instance will begin spinning up on the Instances page. You will know the API is ready once the instance has finished loading.

When your instance is ready, you will need to find where your API is exposed. Open the IP & Port config by pressing the blue button at the top of the instance card; the networking configuration is shown there. You should see a port forwarded from 5001, which is where your API resides. To hit TGI, use the '/generate' endpoint on that port (see the example request at the end of this guide).

5) Congratulations!

You now have a running instance with an API that is using TGI loaded up with Llama 3 8B!

Serverless/Autoscaler guide

As you use TGI, you may want to scale up to higher loads. We currently offer a serverless version of Huggingface TGI via a template built to run with the autoscaler. See Getting Started with Autoscaler: https://docs.vast.ai/serverless/getting-started
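Example: adding your Huggingface token (step 2)

The token is supplied as an environment variable in the template's Docker run options. Below is a minimal sketch of what that field might contain; the variable name (HF_TOKEN is assumed here, and some setups use HUGGING_FACE_HUB_TOKEN instead) is defined by the template, and the token value is a placeholder:

```
-e HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
```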
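Example: renting a GPU from the CLI (step 3)

If you prefer the CLI over the search page, the following is a rough sketch using the vastai command-line tool. The query values mirror the template's filters, the offer ID is a placeholder, and exact flags may vary between CLI versions (check vastai --help):

```
# Authenticate once with the API key from your account page.
vastai set api-key YOUR_API_KEY

# Search for offers that meet the template's minimum requirements.
vastai search offers 'gpu_name=RTX_4090 gpu_ram>=16 disk_space>=100'

# Rent a specific offer using the ID shown in the search results.
vastai create instance OFFER_ID --disk 100
```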
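Example: hitting the '/generate' endpoint (step 4)

Here is a minimal sketch of a request against the '/generate' endpoint. The IP address, forwarded port, prompt, and generation parameters below are placeholders; substitute the values from your own IP & Port config:

```python
import requests

# Placeholders: replace with your instance's public IP and the external
# port that the IP & Port config shows forwarded from container port 5001.
INSTANCE_IP = "203.0.113.10"
FORWARDED_PORT = 40123

url = f"http://{INSTANCE_IP}:{FORWARDED_PORT}/generate"

# TGI's /generate endpoint accepts a prompt under "inputs" plus optional
# generation parameters such as max_new_tokens and temperature.
payload = {
    "inputs": "What is the capital of France?",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()

# A successful response is JSON with a "generated_text" field.
print(response.json()["generated_text"])
```

TGI also serves a streaming variant at '/generate_stream' if you want tokens back as they are produced.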