Create a chat completion
POST /ai/chat/completions
Chat with a language model. This endpoint is consistent with the OpenAI Chat Completions API and may be used with the OpenAI JS or Python SDK.
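Because the endpoint is OpenAI-compatible, a request can be issued with nothing but the Python standard library. A minimal sketch, mirroring the curl sample below; `<TOKEN>` is a placeholder for a real API key, and the final send is left commented out:

```python
# Build (but do not send) a POST request for the chat completions endpoint.
import json
import urllib.request

def build_chat_request(messages, api_key, model=None):
    """Assemble the request object; model is optional (server default applies)."""
    body = {"messages": messages}
    if model is not None:
        body["model"] = model
    return urllib.request.Request(
        "https://api.telnyx.com/v2/ai/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    [{"role": "system", "content": "You are a friendly chatbot."},
     {"role": "user", "content": "Hello, world!"}],
    api_key="<TOKEN>",
)
# urllib.request.urlopen(req) would perform the call; omitted here.
```

With the OpenAI SDKs, the same request works by pointing the client's base URL at the Telnyx API and passing your Telnyx API key.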
Request
- application/json
Body
required
messages
object[]
required
A list of the previous chat messages for context.
role
required
Possible values: [system, user, assistant, tool]
content
object
required
model
Default value: meta-llama/Meta-Llama-3.1-8B-Instruct
The language model to chat with. If you are optimizing for speed + price, try meta-llama/Meta-Llama-3.1-8B-Instruct. For quality, try meta-llama/Meta-Llama-3.1-70B-Instruct. Or explore our LLM Library.
stream
Whether or not to stream data-only server-sent events as they become available.
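When streaming is enabled, each event arrives as a `data:` line carrying one JSON chunk, terminated by `data: [DONE]`. A sketch of assembling the streamed text, assuming OpenAI-style delta chunks:

```python
# Accumulate content deltas from data-only SSE lines into one string.
import json

def collect_stream(lines):
    text = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # ignore comments / blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel marking the end of the stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        text.append(delta)
    return "".join(text)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # -> Hello!
```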
temperature
Default value: 0.1
Adjusts the "creativity" of the model. Lower values make the model more deterministic and repetitive, while higher values make the model more random and creative.
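As an illustration (not the server's implementation), temperature divides the model's logits before sampling, so low values sharpen the token distribution toward determinism and high values flatten it:

```python
# Softmax with temperature: lower temperature concentrates probability mass.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # flatter, more random
```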
max_tokens
Maximum number of completion tokens the model should generate.
tools
object[]
The function tool type follows the same schema as the OpenAI Chat Completions API. The retrieval tool type is unique to Telnyx. You may pass a list of embedded storage buckets for retrieval-augmented generation.
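A sketch of a tools array mixing both types. The function entry follows the OpenAI schema; the function name is hypothetical, and the retrieval entry's field names (e.g. bucket_ids) are an assumption here rather than a confirmed shape:

```python
# Example tools payload: one OpenAI-style function tool, one Telnyx retrieval tool.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function for illustration
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    {
        "type": "retrieval",
        # bucket_ids is an assumed field name for the embedded storage buckets
        "retrieval": {"bucket_ids": ["my-storage-bucket"]},
    },
]
```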
tool_choice
Possible values: [none, auto, required]
response_format
object
Use this if you want to guarantee a JSON output without defining a schema. For control over the schema, use guided_json.
Possible values: [text, json_object]
guided_json
Must be a valid JSON schema. If specified, the output will follow the JSON schema.
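A sketch of a request body constraining the output with guided_json. The schema below is an illustrative example, not a required shape:

```python
# Request body asking for output that conforms to a small JSON schema.
request_body = {
    "messages": [{"role": "user", "content": "Give me a city and its country."}],
    "guided_json": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "country": {"type": "string"},
        },
        "required": ["city", "country"],
    },
}
```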
guided_regex
If specified, the output will follow the regex pattern.
guided_choice
If specified, the output will be exactly one of the choices.
This is an alternative to top_p that many prefer. Must be in [0, 1].
This will return multiple choices for you instead of a single chat completion.
use_beam_search
Setting this to true will allow the model to explore more completion options. This is not supported by OpenAI.
best_of
This is used with use_beam_search to determine how many candidate beams to explore.
length_penalty
Default value: 1
This is used with use_beam_search to prefer shorter or longer completions.
early_stopping
This is used with use_beam_search. If true, generation stops as soon as there are best_of complete candidates; if false, a heuristic is applied and generation stops when it is very unlikely to find better candidates.
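The interaction between the length penalty and beam scoring can be sketched with the common sum-of-logprobs form; the server's exact formula may differ:

```python
# Candidate score: total log-probability, normalized by length ** penalty.
# A penalty above 1 shrinks the (negative) score of longer candidates less,
# so it favors longer completions; below 1 favors shorter ones.
def beam_score(token_logprobs, length_penalty=1.0):
    return sum(token_logprobs) / (len(token_logprobs) ** length_penalty)

short = [-0.4, -0.4]                    # 2 tokens, total -0.8
long = [-0.5, -0.5, -0.5, -0.5]         # 4 tokens, total -2.0

# With penalty 1.0 the short candidate scores higher (-0.4 vs -0.5);
# with penalty 2.0 the long candidate wins (-0.125 vs -0.2).
```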
logprobs
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.
top_logprobs
This is used with logprobs. An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
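Returned log probabilities are natural logs, so `math.exp` converts a per-token value back to a plain probability:

```python
# Convert a per-token log probability back to a probability.
import math

logprob = -0.105360516  # example value, ln(0.9)
prob = math.exp(logprob)  # ~0.90: the model gave this token ~90% probability
```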
frequency_penalty
Default value: 0
Higher values penalize the model for repeating the same output tokens.
presence_penalty
Default value: 0
Higher values penalize the model for repeating the same output tokens.
top_p
An alternative or complement to temperature. This adjusts how many of the top possibilities to consider.
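As an illustration of nucleus (top_p) sampling: keep the smallest set of highest-probability tokens whose cumulative probability reaches p, then renormalize and sample only from that set:

```python
# Nucleus filtering over a toy distribution (indices stand in for tokens).
def top_p_filter(probs, p):
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break  # smallest prefix reaching cumulative probability p
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

dist = top_p_filter([0.5, 0.3, 0.15, 0.05], 0.8)  # keeps the top two tokens
```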
If you are using OpenAI models through our API, this is how you pass along your OpenAI API key.
Responses
200: Successful Response
- application/json
422: Validation Error
- application/json
Request samples
curl -L 'https://api.telnyx.com/v2/ai/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"messages": [
{
"role": "system",
"content": "You are a friendly chatbot."
},
{
"role": "user",
"content": "Hello, world!"
}
]
}'
Response samples
{
"detail": [
{
"loc": [
"string",
0
],
"msg": "string",
"type": "string"
}
]
}