# OpenAI API-compatible Interface

> Use Vast.ai Serverless endpoints with the standard OpenAI API client by swapping your API key and base URL.

Vast provides an OpenAI API-compatible proxy service that lets you point any application or library that works with the OpenAI API at a Vast Serverless vLLM endpoint instead. If your code already uses the OpenAI Python client (or any OpenAI-compatible HTTP client), you can switch to Vast by changing two values: the **API key** and the **base URL**.

## Prerequisites

* A Vast.ai account with a valid **API key**. You can find your key on the [Account page](https://cloud.vast.ai/account/).
* An active Serverless endpoint running the **vLLM** template. See the [Quickstart](/guides/serverless/quickstart) guide to create one.

## How It Works

Vast runs a lightweight proxy at `openai.vast.ai` that accepts requests in the OpenAI API format and routes them to your Serverless vLLM endpoint. Your client sends a standard OpenAI request, the proxy translates it into a Vast Serverless call, and the response is returned in the OpenAI format your client expects.

This means frameworks and tools built on the OpenAI SDK, such as LangChain, LlamaIndex, or custom chat applications, can use Vast Serverless with no code changes beyond the API key and base URL.
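
To make the request shape concrete, the sketch below builds (but does not send) the HTTP request an OpenAI-compatible client would issue, using only the Python standard library. The URL layout and headers follow the cURL example in this guide; the helper name `build_chat_request` and the placeholder values are illustrative and not part of any Vast API.

```python
import json
import urllib.request


def build_chat_request(endpoint_name, api_key, messages, max_tokens=256):
    """Build (without sending) a chat completion request for the Vast proxy.

    Hypothetical helper: the URL and headers follow the cURL example in this guide.
    """
    url = f"https://openai.vast.ai/{endpoint_name}/v1/chat/completions"
    body = {
        "model": "",  # ignored by the proxy; the endpoint configuration decides
        "messages": messages,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "my-endpoint",  # placeholder endpoint name
    "my-vast-api-key",  # placeholder API key
    [{"role": "user", "content": "Hello"}],
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; in practice an OpenAI SDK client assembles an equivalent request for you.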

## Migrating from OpenAI (or Another Provider)

If you already have an application that calls the OpenAI API (or another OpenAI-compatible provider such as Together AI, Anyscale, or a self-hosted vLLM instance), migration requires only two changes:

| Setting  | Before                                        | After                                               |
| -------- | --------------------------------------------- | --------------------------------------------------- |
| API Key  | Your OpenAI / provider key                    | Your [Vast API key](https://cloud.vast.ai/account/) |
| Base URL | `https://api.openai.com/v1` (or provider URL) | `https://openai.vast.ai/<ENDPOINT_NAME>`            |

Replace `<ENDPOINT_NAME>` with the name of your Serverless endpoint. No other code changes are required: the proxy accepts the same request and response schema for the supported endpoints.

<Note>
  The `model` field is required by the OpenAI SDK but is **ignored** by the proxy. The model served is determined entirely by the `MODEL_NAME` environment variable set in your vLLM endpoint configuration. You can pass any string (including an empty string) for this field.
</Note>

<Tabs>
  <Tab title="Python (OpenAI SDK)">
    ```python
    from openai import OpenAI

    client = OpenAI(
        api_key="<YOUR_VAST_API_KEY>",
        base_url="https://openai.vast.ai/<ENDPOINT_NAME>",
    )

    response = client.chat.completions.create(
        model="",  # model is determined by your endpoint configuration
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain serverless computing in two sentences."},
        ],
        max_tokens=256,
        temperature=0.7,
    )

    print(response.choices[0].message.content)
    ```
  </Tab>

  <Tab title="JavaScript (OpenAI SDK)">
    ```javascript
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: "<YOUR_VAST_API_KEY>",
      baseURL: "https://openai.vast.ai/<ENDPOINT_NAME>",
    });

    const response = await client.chat.completions.create({
      model: "",  // model is determined by your endpoint configuration
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain serverless computing in two sentences." },
      ],
      max_tokens: 256,
      temperature: 0.7,
    });

    console.log(response.choices[0].message.content);
    ```
  </Tab>

  <Tab title="cURL">
    ```bash
    curl https://openai.vast.ai/<ENDPOINT_NAME>/v1/chat/completions \
      -H "Authorization: Bearer <YOUR_VAST_API_KEY>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Explain serverless computing in two sentences."}
        ],
        "max_tokens": 256,
        "temperature": 0.7
      }'
    ```
  </Tab>
</Tabs>
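
Rather than hard-coding credentials, the two values can be read from the environment. The sketch below assumes two hypothetical variable names, `VAST_API_KEY` and `VAST_ENDPOINT_NAME`; neither is defined by Vast or the OpenAI SDK, so rename them to match your deployment.

```python
import os


def vast_client_kwargs(env=None):
    """Return the two constructor arguments that differ from a stock OpenAI setup.

    VAST_API_KEY and VAST_ENDPOINT_NAME are hypothetical variable names
    chosen for this example.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("VAST_API_KEY", "VAST_ENDPOINT_NAME") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "api_key": env["VAST_API_KEY"],
        "base_url": f"https://openai.vast.ai/{env['VAST_ENDPOINT_NAME']}",
    }


# Demonstration with an explicit mapping instead of the real environment.
kwargs = vast_client_kwargs({"VAST_API_KEY": "my-key", "VAST_ENDPOINT_NAME": "my-endpoint"})
print(kwargs["base_url"])

# With the OpenAI SDK installed:
#   from openai import OpenAI
#   client = OpenAI(**vast_client_kwargs())
```

Keeping both values in one place means switching back to another provider is a configuration change rather than a code change.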

## Supported Endpoints

The proxy supports the following OpenAI-compatible endpoints exposed by vLLM:

| Endpoint               | Description                           |
| ---------------------- | ------------------------------------- |
| `/v1/chat/completions` | Multi-turn conversational completions |
| `/v1/completions`      | Single-prompt text completions        |

Both endpoints support streaming (`"stream": true`).
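
With the OpenAI Python SDK, streaming is requested by passing `stream=True`, and the call then returns an iterator of chunks. The sketch below shows one way to accumulate the text from a chat completion stream. It is exercised here with stand-in objects that mimic the SDK's chunk shape (`choices[0].delta.content`), so it runs without a network connection; the helper names `collect_stream_text` and `fake_chunk` are illustrative.

```python
from types import SimpleNamespace


def collect_stream_text(stream):
    """Join the text deltas from an iterable of chat completion chunks."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # a chunk may arrive with an empty choices list
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip None and empty-string deltas
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for an SDK chunk object, used only for this demonstration."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo_stream = [fake_chunk("Hello"), fake_chunk(""), fake_chunk(None), fake_chunk(", world")]
print(collect_stream_text(demo_stream))

# Against a real endpoint (requires the OpenAI SDK and a configured client):
#   stream = client.chat.completions.create(model="", messages=messages, stream=True)
#   print(collect_stream_text(stream))
```

For `/v1/completions`, the streamed text lives in `choices[0].text` rather than `choices[0].delta.content`, so the helper would need the corresponding change.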

For detailed request/response schemas and parameters, see the [vLLM template documentation](/guides/serverless/vllm).

## Limitations

<Warning>
  The OpenAI-compatible proxy is designed for **text-in, text-out** workloads only. Review the limitations below before integrating.
</Warning>

### Text only

The proxy supports **text inputs and text outputs** only. The following OpenAI features are **not** supported:

* **Vision / image inputs**: Passing images via `image_url` in message content is not supported.
* **Audio inputs and outputs**: The `/v1/audio` endpoints (speech, transcription, translation) are not available.
* **Image generation**: The `/v1/images` endpoint is not available.
* **Embeddings**: The `/v1/embeddings` endpoint is not available through the proxy.

### vLLM-specific differences from the OpenAI specification

Because the proxy routes to a vLLM backend rather than OpenAI's own service, there are inherent differences between the two:

* **Tokenization**: Token counts may differ from OpenAI models because vLLM uses the tokenizer bundled with the open-source model (e.g., Qwen, Llama). This can affect billing estimates and `max_tokens` behavior.
* **Streaming chunk boundaries**: The proxy uses the same Server-Sent Events (SSE) format, but the exact boundaries of streamed chunks may differ. Some chunks may contain empty strings when chunked prefill is enabled.
* **Tool / function calling**: Tool calling is supported on models that are fine-tuned for it, but behavior may differ from OpenAI's implementation. The `parallel_tool_calls` parameter is not supported. See the [vLLM template documentation](/guides/serverless/vllm) for details.
* **Unsupported parameters**: The request parameters `user`, `suffix`, and `image_url.detail` are accepted but ignored.
* **Response fields**: vLLM may return additional fields not present in the OpenAI specification (e.g., `kv_transfer_params`). Standard OpenAI client libraries will safely ignore these.
* **Moderation**: No content moderation layer is applied. OpenAI's `/v1/moderations` endpoint is not available.
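
Some of the limitations above can be caught before a request is sent. The following sketch is a hypothetical client-side helper, `find_unsupported`, that inspects a chat completion payload and reports features this page lists as unsupported or ignored. The proxy itself does not provide such a function, and the warning messages are invented for the example.

```python
IGNORED_PARAMS = ("user", "suffix")  # accepted but ignored, per the list above
UNSUPPORTED_PARAMS = ("parallel_tool_calls",)


def find_unsupported(payload):
    """Return human-readable warnings for a chat completion request payload."""
    warnings = []
    for name in UNSUPPORTED_PARAMS:
        if name in payload:
            warnings.append(f"'{name}' is not supported")
    for name in IGNORED_PARAMS:
        if name in payload:
            warnings.append(f"'{name}' is accepted but ignored")
    for index, message in enumerate(payload.get("messages", [])):
        content = message.get("content")
        # Multimodal messages use a list of typed parts instead of a plain string.
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") != "text":
                    warnings.append(
                        f"messages[{index}] contains a non-text part ({part.get('type')})"
                    )
    return warnings


payload = {
    "model": "",
    "user": "abc",
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What is in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ]},
    ],
}
for warning in find_unsupported(payload):
    print(warning)
```

A check like this is most useful when migrating an existing application, where image inputs or OpenAI-specific parameters may be buried in shared request-building code.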