Getting started with Telnyx Inference API

Introduction

Welcome to the Telnyx Inference API! This guide will teach you the basics of chatting with open-source language models running on Telnyx GPUs.

Prerequisites

- Sign up for a free Telnyx account
- Create an API Key
- [Optional] Install an OpenAI SDK
  - Our Inference API is OpenAI-compatible, so the OpenAI SDKs work with it
  - Install one with pip install openai (Python) or npm install openai (Node.js)

Python Example

Let's complete your first chat. Here's some simple Python to interact with a language model.

Note
Make sure you have set the TELNYX_API_KEY environment variable before running the example.

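import os

from openai import OpenAI

# The OpenAI SDK works here because the Telnyx endpoint is OpenAI-compatible.
client = OpenAI(
    api_key=os.getenv("TELNYX_API_KEY"),
    base_url="https://api.telnyx.com/v2/ai",
)

# stream=True returns an iterator of partial chunks instead of one response.
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me about Telnyx",
        }
    ],
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    stream=True,
)

# Print each fragment as it arrives; skip chunks that carry no text.
for chunk in chat_completion:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)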
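
If you do not need partial output, the same request can be made without streaming. The sketch below is a minimal, non-authoritative variant: the helper name ask is hypothetical, and it assumes the response has the shape the OpenAI SDK returns for a non-streamed chat completion (choices[0].message.content).

```python
def ask(client, prompt, model="meta-llama/Meta-Llama-3.1-8B-Instruct"):
    """Send one user message and return the full reply as a string.

    `client` is expected to be an OpenAI SDK client configured as in the
    example above. `ask` is an illustrative helper, not part of any SDK.
    """
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=model,
        # stream is left at its default (off), so the call returns only
        # after the whole reply has been generated.
    )
    return response.choices[0].message.content
```

With the client from the example above, print(ask(client, "Tell me about Telnyx")) would print the reply once generation finishes.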

Core Concepts

Messages

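Messages are the ordered history of a chat. Each request includes the messages exchanged so far, ending with the one you want the model to respond to.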

Roles

Every message has a role: system, user, assistant, or tool.

System messages are sent once at the start of a chat, instructing the model how to behave for the duration of the chat. This is a good way to give the model a goal or a set of rules to follow.
User messages contain what the end user has input.
Assistant messages contain what the model has output.
Tool messages contain the results of any tool calls. Tool calls are often referred to as function calls. See our function calling tutorial for more information.
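
As a rough illustration of how these roles fit together, the sketch below assembles a multi-turn message list. The helper name build_messages and the sample texts are hypothetical placeholders; only the role names and the role/content dictionary shape come from the guide above.

```python
def build_messages(system_prompt, history, user_input):
    """Return a chat message list: system first, then prior turns, then the new input.

    `history` is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_input})
    return messages


# Placeholder conversation, for illustration only.
messages = build_messages(
    "You are a concise assistant.",
    [("What is an API key?", "A secret token that identifies your account.")],
    "Where do I create one?",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the messages argument of client.chat.completions.create.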

Models

In the context of chat completions, we are talking about large language models (LLMs). Your choice of LLM will affect the quality, speed, and price of your chat completions.

If you are optimizing for price, try meta-llama/Meta-Llama-3.1-8B-Instruct.
If you are optimizing for quality, try meta-llama/Meta-Llama-3.1-70B-Instruct.
Or explore our LLM Library for other options.
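
One way to keep this choice in a single place is a small lookup, sketched below. The names MODEL_BY_PRIORITY and pick_model are hypothetical; the two model identifiers are the ones recommended above.

```python
# Maps what you are optimizing for to a model named in this guide.
MODEL_BY_PRIORITY = {
    "price": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "quality": "meta-llama/Meta-Llama-3.1-70B-Instruct",
}


def pick_model(priority="price"):
    """Return the model identifier for a priority, or raise ValueError."""
    try:
        return MODEL_BY_PRIORITY[priority]
    except KeyError:
        options = ", ".join(sorted(MODEL_BY_PRIORITY))
        raise ValueError(
            f"Unknown priority {priority!r}; expected one of: {options}"
        ) from None


print(pick_model("quality"))
```

The returned string can be passed as the model argument of a chat completion request.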

Streaming

For real-time interactions, you will want to stream partial responses back to a client as they are generated. To support this, we follow the same server-sent events (SSE) standard as OpenAI.
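
If you are not using an SDK, you will need to read the event stream yourself. The sketch below shows one way to do that, assuming the OpenAI streaming convention of data: lines carrying JSON chunks, terminated by data: [DONE]. The function name iter_sse_content and the sample lines are hypothetical and hand-written for illustration, not captured API output.

```python
import json


def iter_sse_content(lines):
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        choices = event.get("choices") or []
        if not choices:
            continue  # some chunks carry no choices
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample stream, for illustration only.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

In practice the OpenAI SDKs do this parsing for you when stream=True is set, as in the Python example above.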

Not sure how to get started?

I want to...	Relevant Tutorial
Build a voice assistant	No-Code Voice Assistant
Enforce structured JSON output	JSON Mode and Beyond
Let a language model invoke my custom code	Function Calling; Function Calling (Streaming + Parallel Calls)
Send audio to a language model	Audio Language Models
Send images to a language model	Vision Language Models
Give a language model access to relevant documents	Embeddings
Identify themes in my data	Clusters
Teach a language model specific and complex tasks	Fine-tuning

Additional References

Dive into our tutorials
Explore our full API reference
Review our OpenAI Compatibility Matrix
Check out our pricing page

Feedback

Have questions or need help troubleshooting? Our support team is here to assist you. Join our Slack community to connect with other developers and the Telnyx team.