Streaming API

Real-time token streaming for text generation via OpenAI- and Anthropic-compatible endpoints.


Overview

The Grid Streaming API streams tokens one by one as the GPU worker generates them, rather than buffering a finished response and splitting it into chunks. When you request stream: true, each token is sent as soon as it is produced.

Base URL: https://api.aipowergrid.io

Endpoint                  Format      Method
/v1/chat/completions      OpenAI      POST
/v1/messages              Anthropic   POST
/v1/images/generations    OpenAI      POST
/v1/models                OpenAI      GET
/health                   Grid        GET

Authentication

Use either header format:

apikey: your-api-key

or (OpenAI SDK compatible):

Authorization: Bearer your-api-key

Get your API key at api.aipowergrid.io/register.
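
As a minimal sketch, the two header formats above can be built in Python as follows. The helper name auth_headers and its style argument are illustrative only and are not part of the Grid API.

```python
def auth_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Build request headers for either documented auth format.

    `style` is a hypothetical switch used only in this sketch:
      "bearer" -> Authorization: Bearer <key>  (OpenAI SDK compatible)
      "apikey" -> apikey: <key>
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {"Content-Type": "application/json"}
    if style == "bearer":
        headers["Authorization"] = f"Bearer {api_key}"
    elif style == "apikey":
        headers["apikey"] = api_key
    else:
        raise ValueError(f"unknown auth style: {style!r}")
    return headers
```

Send only one of the two headers per request; the sketch never sets both.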


Chat Completions (OpenAI Format)

Streaming

curl -N https://api.aipowergrid.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grid/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "What is AI Power Grid?"}],
    "max_tokens": 256,
    "stream": true
  }'

Response (Server-Sent Events):

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234,"model":"grid/llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234,"model":"grid/llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":"AI"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234,"model":"grid/llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":" Power"},"finish_reason":null}]}

data: [DONE]
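
If you are not using an SDK, the event stream above can be decoded by hand. The following is a minimal sketch, assuming each event arrives as a single data: line and that the stream ends with data: [DONE] as shown. The function names iter_sse_chunks and collect_text are illustrative, and the sample lines are shortened versions of the response above.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE `data:` lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the `delta.content` pieces of the first choice."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        # The first chunk carries only the role, so content may be absent.
        parts.append(delta.get("content") or "")
    return "".join(parts)


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"AI"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" Power"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))  # AI Power
```

In a real client, lines would come from the HTTP response body read line by line instead of a list.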

Non-Streaming

Set "stream": false (or omit it). Returns a single JSON response after generation completes.

curl https://api.aipowergrid.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grid/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128
  }'

Parameters

Parameter     Type     Default    Description
model         string   required   Model name (see /v1/models)
messages      array    required   Chat messages (role + content)
max_tokens    int      512        Maximum tokens to generate
temperature   float    0.7        Randomness (0-2)
top_p         float    0.9        Nucleus sampling (0-1)
stream        bool     false      Enable SSE streaming
n             int      1          Number of completions
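
The parameter list can be turned into a small client-side check before a request is sent. This is a sketch only: build_chat_payload is a hypothetical helper, and the range checks mirror the documented ranges. Whether the server rejects out-of-range values is not stated here, so treat the validation as an assumption.

```python
from typing import Any


def build_chat_payload(
    model: str,
    messages: list[dict[str, str]],
    *,
    max_tokens: int = 512,
    temperature: float = 0.7,
    top_p: float = 0.9,
    stream: bool = False,
    n: int = 1,
) -> dict[str, Any]:
    """Return a JSON-serializable body for POST /v1/chat/completions.

    Defaults are copied from the documented parameter list.
    """
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0, 2]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
        "n": n,
    }
```

Serialize the result with json.dumps and send it as the request body.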

Messages (Anthropic Format)

curl -N https://api.aipowergrid.io/v1/messages \
  -H "x-api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grid/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "What is AI Power Grid?"}],
    "max_tokens": 256,
    "stream": true
  }'

Streaming response uses Anthropic SSE event types: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop.
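
The event types listed above can be handled without the SDK. The sketch below assumes the payloads follow Anthropic's published streaming format, in which content_block_delta events carry a delta of type text_delta with a text field. The exact payload fields returned by the Grid are not shown on this page, so the field names and the sample events are assumptions.

```python
import json
from typing import Iterable


def text_from_anthropic_events(lines: Iterable[str]) -> str:
    """Join the text deltas of an Anthropic-style SSE stream."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # `event:` lines and blank separators carry no payload
        event = json.loads(line[len("data:"):].strip())
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif kind == "message_stop":
            break  # final event of the stream
    return "".join(parts)


# Hypothetical sample stream, shaped after Anthropic's documented format.
anthropic_sample = [
    "event: content_block_start",
    'data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"AI"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" Power"}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
]
print(text_from_anthropic_events(anthropic_sample))  # AI Power
```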


Image Generation

curl https://api.aipowergrid.io/v1/images/generations \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a cat astronaut floating in space",
    "size": "1024x1024"
  }'

Default model: FLUX.2 [klein] (4 steps, sub-second generation).

Parameter   Type     Default          Description
prompt      string   required         Image description
model       string   FLUX.2 [klein]   Image model
size        string   1024x1024        Width x Height
n           int      1                Number of images (1-4)
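
A request body for this endpoint can be checked client-side in the same spirit. In this sketch, build_image_payload is a hypothetical helper. It omits model when none is given so that the documented default applies, and it only checks that size has the form WIDTHxHEIGHT. Which sizes the server accepts is not stated on this page.

```python
from typing import Any, Optional


def build_image_payload(
    prompt: str,
    *,
    model: Optional[str] = None,
    size: str = "1024x1024",
    n: int = 1,
) -> dict[str, Any]:
    """Return a JSON-serializable body for POST /v1/images/generations."""
    if not prompt:
        raise ValueError("prompt is required")
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    width, sep, height = size.partition("x")
    if not (sep and width.isdigit() and height.isdigit()):
        raise ValueError("size must look like '1024x1024'")
    payload: dict[str, Any] = {"prompt": prompt, "size": size, "n": n}
    if model is not None:
        payload["model"] = model  # otherwise the server default is used
    return payload
```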

List Models

curl https://api.aipowergrid.io/v1/models \
  -H "Authorization: Bearer YOUR_KEY"

Returns models from currently connected streaming workers.


SDK Examples

Python (OpenAI SDK)

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.aipowergrid.io/v1",
    api_key="your-key",
)
 
# Streaming
stream = client.chat.completions.create(
    model="grid/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
 
# Non-streaming
response = client.chat.completions.create(
    model="grid/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Python (Anthropic SDK)

from anthropic import Anthropic
 
client = Anthropic(
    base_url="https://api.aipowergrid.io",
    api_key="your-key",
)
 
with client.messages.stream(
    model="grid/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=512,
) as stream:
    for text in stream.text_stream:
        print(text, end="")

JavaScript / TypeScript

import OpenAI from 'openai';
 
const client = new OpenAI({
  baseURL: 'https://api.aipowergrid.io/v1',
  apiKey: 'your-key',
});
 
const stream = await client.chat.completions.create({
  model: 'grid/llama-3.3-70b-versatile',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});
 
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Rate Limits

Endpoint           Limit
Chat completions   30 requests/minute per IP
Messages           30 requests/minute per IP
Image generation   10 requests/minute per IP
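
To stay under these limits, a client can throttle itself before sending. The following is a rough sketch of a sliding-window limiter. The class name SlidingWindowLimiter is illustrative, and how the server actually measures its one-minute window is not documented here, so the sliding window is an assumption.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (client-side only)."""

    def __init__(
        self,
        limit: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps: deque = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            return 0.0
        return self.window - (now - self._stamps[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._stamps.append(self.clock())


# One limiter per endpoint group, using the documented limits.
chat_limiter = SlidingWindowLimiter(30)
image_limiter = SlidingWindowLimiter(10)
```

Before each request, sleep for wait_time() seconds and then call record(). Because the limits are per IP, several processes behind one address share the same budget, which this per-process sketch does not account for.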

Den (電) Credits

Workers earn den for generating responses. The amount scales with:

  • Output tokens — more tokens = more den
  • Model size — 120B model earns ~60x more than a 3B model
  • Context length — longer prompts cost exponentially more den

Den is used for request priority in the queue. Users with more den get served faster.


Check what’s online

List the models currently served by connected workers:

curl https://api.aipowergrid.io/v1/models

Response:

{
  "object": "list",
  "data": [
    {"id": "grid/llama-3.3-70b-versatile", "object": "model"}
  ]
}

An empty data array means no streaming workers are currently connected for any model. Requests will return 503 until a worker comes online.
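
A client can run this check before sending a request. The sketch below parses a response of the shape shown above. The helper names are illustrative, and fetch_models is an untested convenience wrapper around the standard library that sends the Authorization header only when a key is supplied.

```python
import json
import urllib.request
from typing import Optional


def online_model_ids(models_response: dict) -> list[str]:
    """Extract model ids from a /v1/models response body."""
    return [m["id"] for m in models_response.get("data", []) if "id" in m]


def is_model_online(models_response: dict, model_id: str) -> bool:
    """True if at least one connected worker serves `model_id`."""
    return model_id in online_model_ids(models_response)


def fetch_models(
    base_url: str = "https://api.aipowergrid.io",
    api_key: Optional[str] = None,
) -> dict:
    """GET /v1/models and return the decoded JSON body."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    request = urllib.request.Request(f"{base_url}/v1/models", headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


models_sample = json.loads(
    '{"object": "list", "data": '
    '[{"id": "grid/llama-3.3-70b-versatile", "object": "model"}]}'
)
print(online_model_ids(models_sample))  # ['grid/llama-3.3-70b-versatile']
print(is_model_online({"object": "list", "data": []}, "grid/llama-3.3-70b-versatile"))  # False
```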


Worker Connection (For GPU Operators)

Workers connect via WebSocket to receive jobs and stream tokens:

wss://api.aipowergrid.io/v1/workers/ws

Enable streaming mode in the worker:

# Environment variable
GRID_STREAMING=true
 
# Or CLI flag
grid-inference-worker --streaming
 
# Or select during quick setup

See LLM Worker for full setup instructions.


Legacy API

The poll-based /api/v2/ endpoints are still available for backward compatibility:

  • POST /api/v2/generate/text/async — Submit text generation
  • GET /api/v2/generate/text/status/{id} — Poll for results
  • POST /api/v2/generate/async — Submit image generation
  • GET /api/v2/generate/status/{id} — Poll for image results

For new integrations, use the streaming /v1/ endpoints.