| Category | Billing basis | Notes |
|---|---|---|
| Text generation | Per 1M tokens | Input and output tokens are priced at separate rates |
| Audio transcription | Per second of audio | Varies by model |
| Text-to-speech | Per 1M characters | Varies by voice/model |
| Embeddings | Per 1M tokens | Single rate |
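
The billing bases in the table can be turned into a rough cost estimate. The sketch below is illustrative only: the function names, the `TextRates` type, and every rate passed in are hypothetical placeholders, not actual prices or part of any real API. It assumes each category is billed linearly on the unit shown in the table.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # token rates are quoted per 1M tokens
CHARS_PER_UNIT = 1_000_000   # text-to-speech rates are quoted per 1M characters


@dataclass(frozen=True)
class TextRates:
    """Hypothetical per-1M-token rates; input and output are priced separately."""
    input_per_1m: float
    output_per_1m: float


def text_generation_cost(input_tokens: int, output_tokens: int, rates: TextRates) -> float:
    # Input and output tokens are billed at their own rates, then summed.
    return (
        input_tokens / TOKENS_PER_UNIT * rates.input_per_1m
        + output_tokens / TOKENS_PER_UNIT * rates.output_per_1m
    )


def transcription_cost(audio_seconds: float, rate_per_second: float) -> float:
    # Audio transcription is billed per second of audio.
    return audio_seconds * rate_per_second


def text_to_speech_cost(characters: int, rate_per_1m_chars: float) -> float:
    # Text-to-speech is billed per 1M characters of input text.
    return characters / CHARS_PER_UNIT * rate_per_1m_chars


def embedding_cost(tokens: int, rate_per_1m: float) -> float:
    # Embeddings use a single per-1M-token rate.
    return tokens / TOKENS_PER_UNIT * rate_per_1m


if __name__ == "__main__":
    # Placeholder rates chosen only to make the arithmetic easy to follow.
    example_rates = TextRates(input_per_1m=1.0, output_per_1m=4.0)
    total = text_generation_cost(2_000_000, 500_000, example_rates)
    print(f"Estimated text generation cost: {total:.2f}")
```

Keeping one function per category mirrors the table: each row has its own unit, so mixing them in a single formula would hide which rate applies to which quantity. Rates that vary by model or voice would need to be looked up before calling these helpers.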