P2P Troubleshooting

Common issues and solutions for P2P mode.

Connection Issues

“Failed to connect to bootstrap peer”

Symptoms:

⚠️ Failed to connect to /ip4/1.2.3.4/tcp/4001/p2p/QmPeer...: Connection refused

Solutions:

Check the address is correct

# Verify the multiaddr format
/ip4/1.2.3.4/tcp/4001/p2p/QmPeerID...
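
The format check can be scripted. The following is a minimal sketch using only the Python standard library; `check_multiaddr` is a hypothetical helper, not part of the worker or of libp2p, and it only covers the `/ip4|dns4/<host>/tcp/<port>/p2p/<peer-id>` shape shown above.

```python
import ipaddress


def check_multiaddr(addr: str) -> list[str]:
    """Return a list of problems found in a bootstrap multiaddr (empty if none)."""
    parts = addr.strip().split("/")
    # A well-formed address splits into: "", proto, host, "tcp", port, "p2p", peer_id
    if len(parts) != 7 or parts[0] != "":
        return ["expected /<ip4|dns4>/<host>/tcp/<port>/p2p/<peer-id>"]

    _, proto, host, transport, port, p2p, peer_id = parts
    problems = []
    if proto == "ip4":
        try:
            ipaddress.IPv4Address(host)
        except ValueError:
            problems.append(f"{host!r} is not an IPv4 address (use /dns4/ for hostnames)")
    elif proto != "dns4":
        problems.append(f"unsupported address protocol {proto!r}")
    if transport != "tcp":
        problems.append(f"expected tcp transport, got {transport!r}")
    if not (port.isascii() and port.isdigit() and 1 <= int(port) <= 65535):
        problems.append(f"invalid port {port!r}")
    if p2p != "p2p" or not peer_id:
        problems.append("missing /p2p/<peer-id> suffix")
    return problems
```

An empty list means the address has the expected shape; it does not prove the peer ID itself is valid.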

Check the bootstrap peer is online
```
nc -zv 1.2.3.4 4001
```
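
Where `nc` is not installed, the same reachability test can be approximated in Python. This is a sketch with a hypothetical helper name (`can_reach`); it only confirms that a TCP connection can be opened, not that a libp2p node is answering.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures alike.
        return False
```

For example, `can_reach("1.2.3.4", 4001)` returning `False` corresponds to the "Connection refused" symptom above.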

Try a different bootstrap peer

# Hostnames use /dns4/, not /ip4/
P2P_BOOTSTRAP_PEERS=/dns4/backup.aipowergrid.io/tcp/4001/p2p/QmBackup...

Check your firewall
```
sudo ufw status
sudo ufw allow 4001/tcp
```

“P2P node failed to start within timeout”

Symptoms:

RuntimeError: P2P node failed to start within timeout

Solutions:

Check port availability

lsof -i :4001
# Kill any existing process using the port
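
As a cross-check that does not depend on `lsof`, the port can be probed by trying to bind it. This is a minimal sketch; `port_is_free` is a hypothetical helper, and it assumes the node listens on TCP over IPv4.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True
```

If `port_is_free(4001)` returns `False` while the worker is stopped, some other process is holding the port.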

Try a different port
```
P2P_LISTEN_PORT=4002
```

Check libp2p installation
```
pip install --upgrade libp2p trio
```

Worker not receiving jobs

Symptoms:

⏳ Waiting for jobs...
# (nothing happens)

Solutions:

Verify subscription topic

Check logs for: 📥 Subscribed to /aipg/1/jobs/grid-llama3.2-3b

Make sure GRID_MODEL_NAME matches what API nodes are sending
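
A captured log can be checked for the expected subscription with a few lines of Python. This sketch assumes the job topic is the `/aipg/1/jobs/` prefix followed by `GRID_MODEL_NAME` verbatim, which is inferred from the log line above rather than confirmed; the helper names are hypothetical.

```python
import re

# Assumption: job topics are this prefix plus GRID_MODEL_NAME, as in the log line above.
JOB_TOPIC_PREFIX = "/aipg/1/jobs/"


def subscribed_topics(log_text: str) -> set[str]:
    """Collect every topic named in a 'Subscribed to <topic>' log line."""
    return set(re.findall(r"Subscribed to (\S+)", log_text))


def is_subscribed(log_text: str, grid_model_name: str) -> bool:
    """Return True if the log shows a subscription to the model's job topic."""
    return JOB_TOPIC_PREFIX + grid_model_name in subscribed_topics(log_text)
```

Passing the model name that API nodes actually send shows immediately whether the worker is listening on the matching topic.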

Wait for mesh formation

Gossipsub needs ~30 seconds to form a stable mesh.
Wait a minute after startup.

Check you have bootstrap connections

Look for: ✅ Connected to bootstrap peer: QmPeer...

Submit a test job

# Use the model name your worker is subscribed to
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "grid-llama3.2-3b", "messages": [{"role": "user", "content": "test"}]}'

Backend Issues

“Backend error 500”

Symptoms:

Backend error 500: {"error": "model not found"}

Solutions:

Check Ollama is running
```
curl http://localhost:11434/api/version
```

Check the model is pulled

ollama list
# Should show llama3.2:3b. If it is missing, pull it:
ollama pull llama3.2:3b

Verify MODEL_NAME matches

# In .env
MODEL_NAME=llama3.2:3b  # Must match Ollama's name exactly
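
The exact-match requirement can be verified against Ollama's `/api/tags` endpoint, which lists pulled models. The sketch below uses only the standard library; the function names are hypothetical and the base URL is assumed to be the default local one.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://127.0.0.1:11434") -> dict:
    """Fetch the list of locally pulled models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_is_pulled(tags: dict, model_name: str) -> bool:
    """Return True only if model_name matches a pulled model's name exactly."""
    return any(m.get("name") == model_name for m in tags.get("models", []))
```

Calling `model_is_pulled(fetch_tags(), "llama3.2:3b")` with the value from `.env` catches near misses such as a missing `:3b` tag.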

“Backend error: Connection refused”

Symptoms:

httpx.ConnectError: Connection refused

Solutions:

Start Ollama
```
ollama serve
```

Check URL in config

OLLAMA_URL=http://127.0.0.1:11434  # Not https!

For vLLM, check the port
```
OPENAI_URL=http://127.0.0.1:8000/v1
```

Claim Issues

“Not our turn for job”

Symptoms:

Not our turn for job abc123...
# (job goes to another worker)

This is normal! With multiple workers, jobs are distributed. Your worker will get its share.

Check your claim rate over time:

✅ abc123 | 127 tokens | total: 1
✅ def456 | 89 tokens | total: 2
✅ ghi789 | 203 tokens | total: 3

Worker always skipping jobs

Symptoms: Every job shows “Not our turn”

Solutions:

Check known workers list

If you only know about yourself, you should win every job.
If you know about other workers with lower scores, you'll skip.
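
To make the skip rule concrete, here is a hedged sketch of score-based claiming in which the lowest score wins. The real scoring function is not shown in this guide; hashing the job ID together with the peer ID using SHA-256 is an assumption chosen for illustration, as are the function names.

```python
import hashlib


def claim_score(job_id: str, peer_id: str) -> int:
    """Deterministic per-job score for a peer (assumed scheme: SHA-256 of both IDs)."""
    digest = hashlib.sha256(f"{job_id}:{peer_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def is_our_turn(job_id: str, our_peer_id: str, known_peer_ids: list[str]) -> bool:
    """Return True if our peer has the lowest score among all known workers."""
    candidates = set(known_peer_ids) | {our_peer_id}
    # Ties are broken by peer ID so every worker picks the same winner.
    winner = min(candidates, key=lambda p: (claim_score(job_id, p), p))
    return winner == our_peer_id
```

Under this scheme a worker that knows only itself wins every job, and a stale or oversized known-workers list makes it skip more often than it should.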

Restart to get new peer ID

# New peer ID = different claim scores
systemctl restart aipg-worker

Check for peer ID collision

Extremely unlikely, but if two workers have the same ID, one of them always loses.

Memory Issues

Memory growing over time

Symptoms: Worker memory usage increases continuously

Solutions:

Claims are cleaned up automatically

Claims older than 2 minutes are pruned every 10 jobs.
Check logs for: Cleaned up X old claims

Restart periodically (temporary fix)

# Add to cron
0 */6 * * * systemctl restart aipg-worker

Network Issues

Behind NAT / No incoming connections

Symptoms:

- Can connect to bootstrap peers
- But no jobs arrive
- Other workers can't reach you

Solutions:

Port forward

Forward port 4001 (or your P2P_LISTEN_PORT) on your router

Check with external tool

# From outside your network
nc -zv your-public-ip 4001

Use relay (if available)
```
P2P_RELAY_ENABLED=true
```

Slow job delivery

Symptoms: Jobs take several seconds to arrive

Solutions:

Reduce gossipsub heartbeat

The heartbeat interval is currently hardcoded to 5s, so changing it means editing the source. A lower value gives faster propagation but uses more bandwidth.

Add more bootstrap peers

P2P_BOOTSTRAP_PEERS=/ip4/peer1/...,/ip4/peer2/...,/ip4/peer3/...

Debugging

Enable debug logging

import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("libp2p").setLevel(logging.DEBUG)

Check subscription status

Look for these log lines:

📥 Subscribed to /aipg/1/jobs/grid-llama3.2-3b
📥 Subscribed to /aipg/1/claims

Check peer connections

# Number of connected peers
len(host.get_network().connections)

Test gossipsub manually

# Publish a test message
await pubsub.publish("/test/topic", b"hello")

Getting Help

- Check the logs first: most issues are visible in the output
- Join the AIPG Discord for community support
- Open a GitHub issue for bugs, with reproduction steps

Include in bug reports:

- Python version
- libp2p version: pip show libp2p
- Your .env (redact sensitive values)
- Full error traceback
- Steps to reproduce
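
Collecting the version details and redacting `.env` can be automated. The sketch below is a hypothetical helper, not an official tool; the list of sensitive key markers is an assumption and should be extended to match your own configuration.

```python
import platform
from importlib import metadata

# Assumption: keys containing any of these markers hold sensitive values.
SENSITIVE_MARKERS = ("KEY", "SECRET", "TOKEN", "PASSWORD")


def redact_env(env_text: str) -> str:
    """Replace the value of every sensitive-looking KEY=value line with <redacted>."""
    lines = []
    for line in env_text.splitlines():
        key, sep, _value = line.partition("=")
        if sep and any(marker in key.upper() for marker in SENSITIVE_MARKERS):
            lines.append(f"{key}=<redacted>")
        else:
            lines.append(line)
    return "\n".join(lines)


def libp2p_version() -> str:
    """Return the installed libp2p version, or a note if it is not installed."""
    try:
        return metadata.version("libp2p")
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_summary() -> str:
    """One-line summary of the Python and libp2p versions for a bug report."""
    return f"Python {platform.python_version()}, libp2p {libp2p_version()}"
```

Review the redacted output by hand before posting it, since a marker list cannot know every secret in your file.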
