Create a chat completion
POST /ai/chat/completions
Chat with a language model. This endpoint is consistent with the OpenAI Chat Completions API and may be used with the OpenAI JS or Python SDK.
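Because the endpoint is OpenAI-compatible, a request can be issued with nothing but the Python standard library. A minimal sketch, mirroring the curl sample below; `<TOKEN>` is a placeholder for a real API key, and the final send is left commented out:

```python
# Build (but do not send) a POST request for the chat completions endpoint.
import json
import urllib.request

def build_chat_request(messages, api_key, model=None):
    """Assemble the request object; model is optional (server default applies)."""
    body = {"messages": messages}
    if model is not None:
        body["model"] = model
    return urllib.request.Request(
        "https://api.telnyx.com/v2/ai/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    [{"role": "system", "content": "You are a friendly chatbot."},
     {"role": "user", "content": "Hello, world!"}],
    api_key="<TOKEN>",
)
# urllib.request.urlopen(req) would perform the call; omitted here.
```

With the OpenAI SDKs, the same request works by pointing the client's base URL at the Telnyx API and passing your Telnyx API key.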
Request
- application/json
Body
required
messages
object[]
required
A list of the previous chat messages for context.
role
required
Possible values: [system, user, assistant, tool]
content
object
required
model
Default value: meta-llama/Meta-Llama-3.1-8B-Instruct
The language model to chat with. If you are optimizing for speed + price, try meta-llama/Meta-Llama-3.1-8B-Instruct. For quality, try meta-llama/Meta-Llama-3.1-70B-Instruct. Or explore our LLM Library.
stream
Whether or not to stream data-only server-sent events as they become available.
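When streaming is enabled, each event arrives as a `data:` line carrying one JSON chunk, terminated by `data: [DONE]`. A sketch of assembling the streamed text, assuming OpenAI-style delta chunks:

```python
# Accumulate content deltas from data-only SSE lines into one string.
import json

def collect_stream(lines):
    text = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # ignore comments / blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel marking the end of the stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        text.append(delta)
    return "".join(text)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # -> Hello!
```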
temperature
Default value: 0.1
Adjusts the "creativity" of the model. Lower values make the model more deterministic and repetitive, while higher values make the model more random and creative.
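As an illustration (not the server's implementation), temperature divides the model's logits before sampling, so low values sharpen the token distribution toward determinism and high values flatten it:

```python
# Softmax with temperature: lower temperature concentrates probability mass.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # flatter, more random
```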
max_tokens
Maximum number of completion tokens the model should generate.
tools
object[]
The function tool type follows the same schema as the OpenAI Chat Completions API. The retrieval tool type is unique to Telnyx. You may pass a list of embedded storage buckets for retrieval-augmented generation.
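A sketch of a tools array mixing both types. The function entry follows the OpenAI schema; the function name is hypothetical, and the retrieval entry's field names (e.g. bucket_ids) are an assumption here rather than a confirmed shape:

```python
# Example tools payload: one OpenAI-style function tool, one Telnyx retrieval tool.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function for illustration
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    {
        "type": "retrieval",
        # bucket_ids is an assumed field name for the embedded storage buckets
        "retrieval": {"bucket_ids": ["my-storage-bucket"]},
    },
]
```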
tool_choice
Possible values: [none, auto, required]
response_format
object
Use this if you want to guarantee a JSON output without defining a schema. For control over the schema, use guided_json.
Possible values: [text, json_object]
guided_json
Must be a valid JSON schema. If specified, the output will follow the JSON schema.
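A sketch of a request body constraining the output with guided_json. The schema below is an illustrative example, not a required shape:

```python
# Request body asking for output that conforms to a small JSON schema.
request_body = {
    "messages": [{"role": "user", "content": "Give me a city and its country."}],
    "guided_json": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "country": {"type": "string"},
        },
        "required": ["city", "country"],
    },
}
```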
guided_regex
If specified, the output will follow the regex pattern.
guided_choice
If specified, the output will be exactly one of the choices.
This is an alternative to top_p that many prefer. Must be in [0, 1].
This will return multiple choices for you instead of a single chat completion.
use_beam_search
Setting this to true will allow the model to explore more completion options. This is not supported by OpenAI.
best_of
This is used with use_beam_search to determine how many candidate beams to explore.
length_penalty
Default value: 1
This is used with use_beam_search to prefer shorter or longer completions.
early_stopping
This is used with use_beam_search. If true, generation stops as soon as there are best_of complete candidates; if false, a heuristic is applied and generation stops when it is very unlikely to find better candidates.
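The interaction between the length penalty and beam scoring can be sketched with the common sum-of-logprobs form; the server's exact formula may differ:

```python
# Candidate score: total log-probability, normalized by length ** penalty.
# A penalty above 1 shrinks the (negative) score of longer candidates less,
# so it favors longer completions; below 1 favors shorter ones.
def beam_score(token_logprobs, length_penalty=1.0):
    return sum(token_logprobs) / (len(token_logprobs) ** length_penalty)

short = [-0.4, -0.4]                    # 2 tokens, total -0.8
long = [-0.5, -0.5, -0.5, -0.5]         # 4 tokens, total -2.0

# With penalty 1.0 the short candidate scores higher (-0.4 vs -0.5);
# with penalty 2.0 the long candidate wins (-0.125 vs -0.2).
```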
logprobs
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.
top_logprobs
This is used with logprobs. An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
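Returned log probabilities are natural logs, so `math.exp` converts a per-token value back to a plain probability:

```python
# Convert a per-token log probability back to a probability.
import math

logprob = -0.105360516  # example value, ln(0.9)
prob = math.exp(logprob)  # ~0.90: the model gave this token ~90% probability
```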
frequency_penalty
Default value: 0
Higher values penalize the model for repeating the same output tokens.
presence_penalty
Default value: 0
Higher values penalize the model for repeating the same output tokens.
top_p
An alternative or complement to temperature. This adjusts how many of the top possibilities to consider.
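As an illustration of nucleus (top_p) sampling: keep the smallest set of highest-probability tokens whose cumulative probability reaches p, then renormalize and sample only from that set:

```python
# Nucleus filtering over a toy distribution (indices stand in for tokens).
def top_p_filter(probs, p):
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break  # smallest prefix reaching cumulative probability p
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

dist = top_p_filter([0.5, 0.3, 0.15, 0.05], 0.8)  # keeps the top two tokens
```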
If you are using OpenAI models through our API, this is how you pass along your OpenAI API key.
Responses
200: Successful Response
- application/json
422: Validation Error
- application/json
Request samples
curl -L 'https://api.telnyx.com/v2/ai/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"messages": [
{
"role": "system",
"content": "You are a friendly chatbot."
},
{
"role": "user",
"content": "Hello, world!"
}
]
}'
Response samples
{
"detail": [
{
"loc": [
"string",
0
],
"msg": "string",
"type": "string"
}
]
}