How P2P Mode Works
This page explains exactly what happens when a user makes a request in P2P mode.
Step-by-Step Flow
1. User Sends Request
User sends a standard OpenAI-compatible request to any API node:
curl https://api-node-1.aipowergrid.io/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "grid/llama3.2:3b",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'2. API Node Creates Job
The API node creates a job with a unique ID, random signature, and its peer ID:
job = {
"id": "abc123-def456",
"model": "grid/llama3.2-3b",
"payload": {
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 512,
"temperature": 0.7
},
"signature": "7f3a2b1c...", # Random - used as claim seed
"requester_peer_id": "QmApiNode1...", # Worker streams results here
"timestamp": 1714000000,
"ttl": 60
}3. Job Broadcast via Gossipsub
API Node publishes to: /aipg/1/jobs/grid-llama3.2-3b
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ GOSSIPSUB MESH │
│ │
│ The message propagates to all nodes subscribed to this topic │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
Worker 1 Worker 2 Worker 3
(llama3.2:3b) (llama3.2:3b) (mistral:7b)
RECEIVES RECEIVES (not subscribed)4. Deterministic Claim Resolution
This is the magic. All workers compute the same winner independently:
def should_claim(job, my_peer_id, known_workers):
# Use job's random signature as seed
seed = bytes.fromhex(job.signature[:64])
# Compute my score
my_score = SHA256(job.id + seed + my_peer_id)
# Check if anyone beats me
for worker in known_workers:
their_score = SHA256(job.id + seed + worker)
if their_score < my_score:
return False # They win, I skip
return True # I win!Example with 3 workers:
Job ID: abc123-def456
Signature seed: 0x7f3a2b1c...
┌─────────────────────────────────────────────────────────────────┐
│ Worker 1 computes: │
│ score = SHA256("abc123" + seed + "QmWorker1...") │
│ score = 0x7a3b2c1d... │
│ │
│ Worker 2 computes: │
│ score = SHA256("abc123" + seed + "QmWorker2...") │
│ score = 0x2f1e0d9c... ◄── LOWEST SCORE = WINS │
│ │
│ Worker 3 computes: │
│ score = SHA256("abc123" + seed + "QmWorker3...") │
│ score = 0x9c8b7a6f... │
└─────────────────────────────────────────────────────────────────┘
Result: Worker 2 processes the job. Others skip silently.
No communication needed to reach consensus!5. Winner Claims the Job
Worker 2 broadcasts its claim to prevent any race conditions:
Worker 2 publishes to: /aipg/1/claims
{
"job_id": "abc123-def456",
"worker_id": "QmWorker2...",
"timestamp": 1714000001
}Other workers receive this and mark the job as claimed.
6. Worker Processes the Job
Worker 2 opens a direct stream to the API node and streams results:
┌─────────────────────────────────────────────────────────────────┐
│ Worker 2 │
│ │
│ 1. Open direct libp2p stream to requester │
│ stream = host.new_stream(requester_peer_id, "/aipg/1/result-stream")
│ stream.write("abc123-def456\n") # Send job ID first │
│ │
│ 2. Build request for local backend │
│ POST http://localhost:11434/v1/chat/completions │
│ │
│ 3. Stream tokens from Ollama/vLLM → direct to API node │
│ "Hello" → "!" → " How" → " can" → " I" → " help" → "?" │
│ (sent over direct stream, not gossipsub!) │
│ │
│ 4. Close stream when done │
└─────────────────────────────────────────────────────────────────┘Why direct streams? With gossipsub, 500 tokens = 500 messages through the entire mesh. With direct streams, 500 tokens = 1 stream to 1 peer. Much more efficient.
7. API Node Receives Results
The API node receives tokens on the direct stream:
Worker opens stream to: QmApiNode1...
Protocol: /aipg/1/result-stream
Stream contents (newline-delimited JSON):
abc123-def456 ← job ID
{"type": "token", "token": {"text": "Hello", "index": 1}}
{"type": "token", "token": {"text": "!", "index": 2}}
{"type": "token", "token": {"text": " How", "index": 3}}
...
{"type": "done", "done": {"full_text": "Hello! How can I help?", "token_count": 7}}8. Stream to User
API node converts stream messages to SSE:
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":"!"}}]}
data: {"choices":[{"delta":{"content":" How"}}]}
...
data: [DONE]Complete Sequence Diagram
User API Node Gossipsub Worker 1 Worker 2
│ │ │ │ │
│─POST /v1/...─▶│ │ │ │
│ │ │ │ │
│ │──publish job─▶│ │ │
│ │ (includes │ │ │
│ │ peer ID) │ │ │
│ │ │───job msg────▶│ │
│ │ │───job msg───────────────────▶│
│ │ │ │ │
│ │ │ │ compute │ compute
│ │ │ │ score │ score
│ │ │ │ = 0x7a... │ = 0x2f...
│ │ │ │ │
│ │ │ │ SKIP │ WINS
│ │ │ │ │
│ │ │◀──────────────claim──────────│
│ │ │ │ │
│ │ │ │──▶ Ollama
│ │◀═══════════direct stream══════════════════════│◀── tokens
│ │ (job_id + tokens) │ │
│◀──SSE token───│ │ │
│◀──SSE token───│ │ │
│◀──SSE done────│ │ │
│ │ │ │Note: Results flow over a direct libp2p stream (═══), not gossipsub. Only the requesting API node receives the tokens.
Why This Works
| Challenge | Solution |
|---|---|
| Who processes each job? | Deterministic hash - everyone computes same winner |
| What if two workers claim? | First timestamp wins, hash is tiebreaker |
| What if a worker crashes? | Job TTL expires, can be resubmitted |
| How do nodes find each other? | Bootstrap peers + gossipsub mesh discovery |
| How does user get results? | Worker opens direct stream to API node’s peer ID |
| Why not gossipsub for results? | 500 tokens = 500 mesh messages vs 1 direct stream |
Edge Cases
Worker Joins Mid-Job
New worker won’t have the job in their queue. Job continues normally with existing workers.
Worker Crashes Mid-Job
- Direct stream closes unexpectedly
- API node detects closed stream, times out
- Job can be resubmitted (new signature = new claim resolution)
Network Partition
- Different parts of network may see different workers
- Deterministic claim still works within each partition
- May result in duplicate processing (acceptable - idempotent)
All Workers Offline
- Job sits in result topic with no response
- API node times out
- Returns error to user
- Job can be retried when workers come online