Streaming API
Real-time token streaming for text generation via OpenAI- and Anthropic-compatible endpoints.
Overview
The Grid Streaming API streams tokens one at a time as the GPU worker generates them, rather than splitting a finished response into chunks. Set stream: true in the request to enable it.
Base URL: https://api.aipowergrid.io
| Endpoint | Format | Method |
|---|---|---|
| /v1/chat/completions | OpenAI | POST |
| /v1/messages | Anthropic | POST |
| /v1/images/generations | OpenAI | POST |
| /v1/models | OpenAI | GET |
| /health | Grid | GET |
Authentication
Use either header format:
apikey: your-api-key
or (OpenAI SDK compatible):
Authorization: Bearer your-api-key
Get your API key at api.aipowergrid.io/register.
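A small helper can keep the choice between the two header formats in one place. This is a minimal sketch: the function name `auth_headers` and the `style` argument are hypothetical, and only the two header names come from the text above.

```python
def auth_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Return request headers for one of the two documented auth formats.

    `style` is a hypothetical switch: "bearer" gives the OpenAI SDK
    compatible header, "apikey" gives the plain apikey header.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "apikey":
        return {"apikey": api_key}
    raise ValueError(f"unknown auth style: {style!r}")


# Merge with the JSON content type used by the POST endpoints.
headers = {"Content-Type": "application/json", **auth_headers("your-api-key")}
```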
Chat Completions (OpenAI Format)
Streaming
curl -N https://api.aipowergrid.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grid/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "What is AI Power Grid?"}],
    "max_tokens": 256,
    "stream": true
  }'
Response (Server-Sent Events):
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234,"model":"grid/llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234,"model":"grid/llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":"AI"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234,"model":"grid/llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":" Power"},"finish_reason":null}]}
data: [DONE]
Non-Streaming
Set "stream": false (or omit it). Returns a single JSON response after generation completes.
curl https://api.aipowergrid.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grid/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128
  }'
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | Model name (see /v1/models) |
| messages | array | required | Chat messages (role + content) |
| max_tokens | int | 512 | Maximum tokens to generate |
| temperature | float | 0.7 | Randomness (0-2) |
| top_p | float | 0.9 | Nucleus sampling (0-1) |
| stream | bool | false | Enable SSE streaming |
| n | int | 1 | Number of completions |
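The parameter table can be mirrored on the client side so that requests always carry explicit values. The sketch below is illustrative only: `build_chat_payload` is a hypothetical helper, the defaults and ranges are copied from the table, and whether the server itself rejects out-of-range values is an assumption that has not been verified.

```python
import json


def build_chat_payload(model, messages, *, max_tokens=512, temperature=0.7,
                       top_p=0.9, stream=False, n=1) -> dict:
    """Build a /v1/chat/completions request body using the table's defaults."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    for message in messages:
        # Each chat message needs a role and a content field.
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
        "n": n,
    }


payload = build_chat_payload(
    "grid/llama-3.3-70b-versatile",
    [{"role": "user", "content": "Hello"}],
    stream=True,
)
body = json.dumps(payload)  # string to send as the POST body
```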
Messages (Anthropic Format)
curl -N https://api.aipowergrid.io/v1/messages \
  -H "x-api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grid/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "What is AI Power Grid?"}],
    "max_tokens": 256,
    "stream": true
  }'
The streaming response uses Anthropic SSE event types: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop.
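For clients that read the raw stream instead of using an SDK, the event types listed above can be handled with a small parser. This is a sketch under assumptions: the payload fields (`delta.type` equal to `text_delta`, `delta.text`) follow Anthropic's published streaming format, and it is assumed rather than confirmed that the Grid endpoint reproduces them exactly. The sample lines are invented for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from raw SSE lines."""
    event, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data_parts:
                yield event or "message", json.loads("\n".join(data_parts))
            event, data_parts = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip())
    if data_parts:
        # Flush a final event that was not followed by a blank line.
        yield event or "message", json.loads("\n".join(data_parts))


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas until message_stop arrives."""
    parts = []
    for event, payload in iter_sse_events(lines):
        if event == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":  # assumed field names
                parts.append(delta.get("text", ""))
        elif event == "message_stop":
            break
    return "".join(parts)


# Hypothetical sample stream, shaped like Anthropic's documented events.
SAMPLE = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"role": "assistant"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "AI"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " Power"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
    "",
]
print(collect_text(SAMPLE))  # prints "AI Power"
```

In a real client, `lines` would come from the HTTP response body, read line by line as it arrives.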
Image Generation
curl https://api.aipowergrid.io/v1/images/generations \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a cat astronaut floating in space",
    "size": "1024x1024"
  }'
Default model: FLUX.2 [klein] (4 steps, sub-second generation).
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | required | Image description |
| model | string | FLUX.2 [klein] | Image model |
| size | string | 1024x1024 | Width x Height |
| n | int | 1 | Number of images (1-4) |
List Models
curl https://api.aipowergrid.io/v1/models \
  -H "Authorization: Bearer YOUR_KEY"
Returns models from currently connected streaming workers.
SDK Examples
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aipowergrid.io/v1",
    api_key="your-key",
)

# Streaming
stream = client.chat.completions.create(
    model="grid/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # delta.content is None on the role-only first chunk
    print(chunk.choices[0].delta.content or "", end="")

# Non-streaming
response = client.chat.completions.create(
    model="grid/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Python (Anthropic SDK)
from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.aipowergrid.io",
    api_key="your-key",
)

# Stream text deltas as they arrive
with client.messages.stream(
    model="grid/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=512,
) as stream:
    for text in stream.text_stream:
        print(text, end="")
JavaScript / TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.aipowergrid.io/v1',
  apiKey: 'your-key',
});

const stream = await client.chat.completions.create({
  model: 'grid/llama-3.3-70b-versatile',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Rate Limits
| Endpoint | Limit |
|---|---|
| Chat completions | 30 requests/minute per IP |
| Messages | 30 requests/minute per IP |
| Image generation | 10 requests/minute per IP |
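A client can stay under these limits by throttling itself before each request. The sketch below uses a sliding-window counter; the class name and its methods are hypothetical, and only the per-minute numbers come from the table. Because the limits apply per IP, several processes behind one address share the same budget, which this single-process sketch does not account for.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before another request fits in the window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request is being sent now."""
        self._sent.append(self.clock())

    def acquire(self, sleep=time.sleep) -> None:
        """Block until a request is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self.record()


# Limits from the table above, in requests per minute.
chat_limiter = SlidingWindowLimiter(max_requests=30)
image_limiter = SlidingWindowLimiter(max_requests=10)
```

Calling `chat_limiter.acquire()` before each chat request keeps a single client within the documented budget.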
Den (電) Credits
Workers earn den for generating responses. The amount scales with:
- Output tokens: more tokens = more den
- Model size: a 120B model earns ~60x more than a 3B model
- Context length: longer prompts cost exponentially more den
Den is used for request priority in the queue. Users with more den get served faster.
Check what’s online
List the models currently served by connected workers:
curl https://api.aipowergrid.io/v1/models
{
  "object": "list",
  "data": [
    {"id": "grid/llama-3.3-70b-versatile", "object": "model"}
  ]
}
An empty data array means no streaming workers are connected for any model right now; requests will return 503 until a worker comes online.
Worker Connection (For GPU Operators)
Workers connect via WebSocket to receive jobs and stream tokens:
WSS api.aipowergrid.io/v1/workers/ws
Enable streaming mode in the worker:
# Environment variable
GRID_STREAMING=true
# Or CLI flag
grid-inference-worker --streaming
# Or select during quick setup
See LLM Worker for full setup instructions.
Legacy API
The poll-based /api/v2/ endpoints are still available for backward compatibility:
- POST /api/v2/generate/text/async: Submit text generation
- GET /api/v2/generate/text/status/{id}: Poll for results
- POST /api/v2/generate/async: Submit image generation
- GET /api/v2/generate/status/{id}: Poll for image results
For new integrations, use the streaming /v1/ endpoints.
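For existing integrations that still use the poll-based endpoints, the submit-then-poll pattern can be sketched generically. The shape of the status response is not described here, so the loop takes the fetch and completion check as caller-supplied functions; `poll_until` and the `done` key in the usage example are hypothetical.

```python
import time
from typing import Any, Callable


def poll_until(fetch_status: Callable[[], Any],
               is_done: Callable[[Any], bool],
               *, interval: float = 2.0, timeout: float = 120.0,
               sleep=time.sleep, clock=time.monotonic) -> Any:
    """Call fetch_status repeatedly until is_done accepts the result.

    Raises TimeoutError if the job is still unfinished after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)


# Usage with canned responses standing in for GET .../status/{id}.
# The "done" key is a hypothetical field, used only for illustration.
responses = iter([{"done": False}, {"done": False}, {"done": True}])
result = poll_until(lambda: next(responses),
                    lambda status: status["done"],
                    sleep=lambda seconds: None)
print(result)  # prints "{'done': True}"
```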