
Create a chat completion

POST /ai/chat/completions

Chat with a language model. This endpoint is compatible with the OpenAI Chat Completions API and may be used with the OpenAI JS or Python SDKs.
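As an illustration, a minimal request body can be built as below. The messages list and default model come from the parameters documented on this page; the host in the final comment is an assumption, not a value stated here.

```python
# A minimal sketch of a request body for POST /ai/chat/completions.
import json

payload = {
    "model": "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",  # default model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

body = json.dumps(payload)
# POST this body, with your auth header, to <your-api-host>/ai/chat/completions
```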

Request

Body (required)

    messages object[] required

    A list of the previous chat messages for context.

    Array items:
      • content string required
      • role string required
        Possible values: [system, user, assistant, tool]

    model string

    Default value: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO

    The language model to chat with. If you are optimizing for speed, try mistralai/Mistral-7B-Instruct-v0.1; for quality, try NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO.

    stream boolean

    Default value: false

    Whether to stream data-only server-sent events as they become available.
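When stream is true, the response arrives as data-only server-sent events. A sketch of client-side parsing, assuming the OpenAI-style chunk shape and the conventional `data: [DONE]` terminator (neither is spelled out on this page):

```python
# Assemble streamed completion text from data-only SSE lines.
import json

raw_events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

chunks = []
for line in raw_events:
    data = line[len("data: "):]
    if data == "[DONE]":  # conventional end-of-stream marker
        break
    event = json.loads(data)
    chunks.append(event["choices"][0]["delta"].get("content", ""))

text = "".join(chunks)  # "Hello"
```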

    max_tokens integer

    Maximum number of completion tokens the model should generate.

    temperature number

    Adjusts the "creativity" of the model. Lower values make the model more deterministic and repetitive, while higher values make the model more random and creative.

    min_p number

    An alternative to temperature that many prefer: tokens whose probability falls below min_p times the probability of the most likely token are excluded from sampling. Must be in [0, 1].

    n number

    The number of chat completion choices to generate, returned as multiple choices instead of a single chat completion.
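The sampling parameters above combine in a request body as sketched below. The values are illustrative, and the min_p comment reflects the usual min-p semantics rather than anything stated on this page.

```python
# Illustrative sampling settings; it is common to adjust one knob at a time.
payload = {
    "messages": [{"role": "user", "content": "Name three colors."}],
    "temperature": 0.7,  # moderate creativity
    "min_p": 0.05,       # drop tokens below 5% of the top token's probability (assumed semantics)
    "n": 2,              # ask for two choices instead of one
}
```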

    tools object[]

    The retrieval tool type is unique to Telnyx. You may pass a list of embedded storage buckets for retrieval-augmented generation.

    Array items (anyOf):
      • type string required
        Possible values: [retrieval]
      • retrieval object required
        bucket_ids string[] required
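A sketch of a request that attaches the retrieval tool described above; the bucket ID is a placeholder, not a real bucket.

```python
# Retrieval-augmented generation against embedded storage buckets.
payload = {
    "messages": [{"role": "user", "content": "Summarize my documents."}],
    "tools": [
        {
            "type": "retrieval",
            "retrieval": {"bucket_ids": ["my-bucket-id"]},  # placeholder ID
        }
    ],
    "tool_choice": "auto",
}
```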

  • tool_choice string

    Possible values: [none, auto]

    use_beam_search boolean

    Default value: false

    Setting this to true allows the model to explore more completion options. This is not supported by OpenAI.

    best_of integer

    Used with use_beam_search to determine how many candidate beams to explore.

    length_penalty number

    Default value: 1

    Used with use_beam_search to prefer shorter or longer completions: values greater than 1 favor longer completions, and values less than 1 favor shorter ones.

    early_stopping boolean

    Default value: false

    Used with use_beam_search. If true, generation stops as soon as there are best_of complete candidates; if false, a heuristic is applied and generation stops when it is very unlikely that better candidates will be found.
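The beam-search parameters are typically used together; a sketch with illustrative values follows. Since beam search is not supported by OpenAI, the OpenAI SDKs may require passing these as extra body fields.

```python
# Beam-search settings used together in one request body.
payload = {
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "use_beam_search": True,
    "best_of": 4,           # explore 4 candidate beams
    "length_penalty": 1.0,  # the default: no length preference
    "early_stopping": True, # stop once best_of candidates are complete
}
```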

    frequency_penalty number

    Higher values penalize the model for repeating output tokens, in proportion to how often each token has already appeared.

    presence_penalty number

    Higher values penalize the model for repeating any output token that has already appeared, regardless of how often, encouraging it to introduce new tokens.

    top_p number

    An alternative to temperature (nucleus sampling): only the most likely tokens whose cumulative probability reaches top_p are considered.

    openai_api_key string

    If you are using OpenAI models through our API, pass your OpenAI API key here.

Responses

200: Successful Response

Schema

    any

422: Validation Error

Schema

    detail object[]

    Array items:
      • loc object[] required
        Array items (anyOf): string
      • msg Message required
      • type Error Type required
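A sketch of handling the 422 body; the sample error below is illustrative, following the schema above.

```python
# Walk the validation-error details and build readable messages.
error_body = {
    "detail": [
        {
            "loc": ["body", "messages"],
            "msg": "field required",
            "type": "value_error.missing",
        }
    ]
}

problems = [
    ".".join(str(part) for part in err["loc"]) + ": " + err["msg"]
    for err in error_body["detail"]
]
# problems == ["body.messages: field required"]
```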
