P2P Troubleshooting
Common issues and solutions for P2P mode.
Connection Issues
“Failed to connect to bootstrap peer”
Symptoms:
⚠️ Failed to connect to /ip4/1.2.3.4/tcp/4001/p2p/QmPeer...: Connection refused
Solutions:
- Check that the address is correct
  # Verify the multiaddr format
  /ip4/1.2.3.4/tcp/4001/p2p/QmPeerID...
- Check that the bootstrap peer is online
  nc -zv 1.2.3.4 4001
- Try a different bootstrap peer
  # Use /dns4/ rather than /ip4/ when the address is a hostname
  P2P_BOOTSTRAP_PEERS=/dns4/backup.aipowergrid.io/tcp/4001/p2p/QmBackup...
- Check your firewall
  sudo ufw status
  sudo ufw allow 4001/tcp
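The address and reachability checks above can also be scripted. The following is a minimal sketch, not part of the worker: `parse_tcp_multiaddr` and `can_connect` are hypothetical helper names, and the parser only understands the simple `/<ip4|ip6|dns*>/<host>/tcp/<port>[/p2p/<peer id>]` shape used in this guide.

```python
import socket
from typing import Optional

# Address protocols accepted by this sketch; real multiaddrs allow many more.
_HOST_PROTOCOLS = {"ip4", "ip6", "dns", "dns4", "dns6"}


def parse_tcp_multiaddr(addr: str) -> tuple[str, int, Optional[str]]:
    """Split a simple TCP multiaddr into (host, port, peer_id)."""
    parts = addr.strip().split("/")
    # A leading slash produces an empty first element.
    if len(parts) < 5 or parts[0] != "":
        raise ValueError(f"not a multiaddr: {addr!r}")
    if parts[1] not in _HOST_PROTOCOLS or parts[3] != "tcp":
        raise ValueError(f"expected /<ip4|dns4|...>/<host>/tcp/<port>: {addr!r}")
    port = int(parts[4])  # raises ValueError if the port is not numeric
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    peer_id = parts[6] if len(parts) >= 7 and parts[5] == "p2p" else None
    return parts[2], port, peer_id


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Rough equivalent of `nc -zv host port`: True if a TCP connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful TCP connect only shows that something is listening on the port; it does not confirm that the peer ID in the address matches.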
“P2P node failed to start within timeout”
Symptoms:
RuntimeError: P2P node failed to start within timeout
Solutions:
- Check port availability
  lsof -i :4001
  # Kill any existing process using the port
- Try a different port
  P2P_LISTEN_PORT=4002
- Check the libp2p installation
  pip install --upgrade libp2p trio
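If `lsof` is not available, port availability can be probed from Python. This is a sketch under the assumption that the worker listens on plain TCP; `port_is_free` and `first_free_port` are hypothetical helpers, not worker functions.

```python
import socket
from typing import Optional


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start: int = 4001, attempts: int = 10,
                    host: str = "0.0.0.0") -> Optional[int]:
    """Scan upward from `start` and return the first bindable port, if any."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    return None
```

For example, `first_free_port(4001)` returns a candidate value for `P2P_LISTEN_PORT`. The result can go stale immediately if another process grabs the port afterwards.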
Worker not receiving jobs
Symptoms:
⏳ Waiting for jobs...
# (nothing happens)
Solutions:
- Verify the subscription topic
  Check the logs for:
  📥 Subscribed to /aipg/1/jobs/grid-llama3.2-3b
  Make sure GRID_MODEL_NAME matches what the API nodes are sending.
- Wait for mesh formation
  Gossipsub needs about 30 seconds to form a stable mesh, so wait a minute after startup before testing.
- Check that you have bootstrap connections
  Look for:
  ✅ Connected to bootstrap peer: QmPeer...
- Submit a test job
  curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "grid/llama3.2:3b", "messages": [{"role": "user", "content": "test"}]}'
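The two log checks in this list can be scripted against captured worker output. The sketch below assumes the log wording matches the examples in this guide exactly; `check_worker_log` is a hypothetical helper, not something the worker ships.

```python
def check_worker_log(log_text: str, expected_topic: str) -> dict[str, bool]:
    """Report whether the log shows a topic subscription and a bootstrap connection."""
    lines = [line.rstrip() for line in log_text.splitlines()]
    subscribed_marker = f"Subscribed to {expected_topic}"
    return {
        # endswith() avoids matching a longer topic that shares the same prefix.
        "subscribed": any(line.endswith(subscribed_marker) for line in lines),
        "bootstrap_connected": any("Connected to bootstrap peer" in line for line in lines),
    }


sample = (
    "📥 Subscribed to /aipg/1/jobs/grid-llama3.2-3b\n"
    "✅ Connected to bootstrap peer: QmPeer...\n"
    "⏳ Waiting for jobs...\n"
)
status = check_worker_log(sample, "/aipg/1/jobs/grid-llama3.2-3b")
```

If `subscribed` is false, compare the topic in the log with the model name the API nodes use. If `bootstrap_connected` is false, go back to the bootstrap connection checks.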
Backend Issues
“Backend error 500”
Symptoms:
Backend error 500: {"error": "model not found"}
Solutions:
- Check that Ollama is running
  curl http://localhost:11434/api/version
- Check that the model is pulled
  ollama list   # Should show llama3.2:3b
  ollama pull llama3.2:3b
- Verify that MODEL_NAME matches
  # In .env; must match Ollama's name exactly
  MODEL_NAME=llama3.2:3b
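The name comparison can be automated against Ollama's `/api/tags` endpoint, which lists locally pulled models. This is a minimal sketch: `fetch_tags` and `model_is_pulled` are hypothetical helpers, and the base URL is the default from this guide.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """GET /api/tags from a running Ollama instance and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_is_pulled(model_name: str, tags: dict) -> bool:
    """Exact match against the 'name' field of each listed model."""
    return model_name in {m["name"] for m in tags.get("models", [])}


# Shape of a typical /api/tags response, trimmed to the field used here.
example_tags = {"models": [{"name": "llama3.2:3b"}, {"name": "mistral:latest"}]}
```

With Ollama running, `model_is_pulled("llama3.2:3b", fetch_tags())` should be true before the worker starts. The match is deliberately exact, so `llama3.2` without the tag does not count.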
“Backend error: Connection refused”
Symptoms:
httpx.ConnectError: Connection refused
Solutions:
- Start Ollama
  ollama serve
- Check the URL in your config
  # Use http, not https
  OLLAMA_URL=http://127.0.0.1:11434
- For vLLM, check the port
  OPENAI_URL=http://127.0.0.1:8000/v1
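A quick sanity check on the configured URL can catch the scheme and port mistakes listed above before the worker starts. The sketch below assumes a local, plain-HTTP backend as in these examples; `backend_url_problems` is a hypothetical helper, and a remote backend behind TLS would legitimately use https.

```python
from urllib.parse import urlsplit


def backend_url_problems(url: str) -> list[str]:
    """Return human-readable problems found in a local backend URL."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "http":
        problems.append(f"scheme is {parts.scheme!r}, expected 'http'")
    if not parts.hostname:
        problems.append("no host in URL")
    try:
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        problems.append("port is not a valid number")
    else:
        if port is None:
            problems.append("no explicit port")
    return problems
```

An empty list means the URL has the expected shape. It does not prove that anything is listening there.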
Claim Issues
“Not our turn for job”
Symptoms:
Not our turn for job abc123...
# (job goes to another worker)
This is normal. With multiple workers, jobs are distributed among them, and your worker will get its share.
Check your claim rate over time:
✅ abc123 | 127 tokens | total: 1
✅ def456 | 89 tokens | total: 2
✅ ghi789 | 203 tokens | total: 3
Worker always skipping jobs
Symptoms: Every job shows “Not our turn”
Solutions:
- Check the known workers list
  If you only know about yourself, you should win every job. If you know about other workers with lower scores, you will skip.
- Restart to get a new peer ID
  # New peer ID = different claim scores
  systemctl restart aipg-worker
- Check for a peer ID collision
  Extremely unlikely, but if two workers have the same ID, one always loses.
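This guide does not show how claim scores are computed. The sketch below is one scheme consistent with the behavior described here (scores depend on the peer ID, and the lowest score wins), using a SHA-256 hash over the job ID and peer ID. The hash choice and the names `claim_score` and `is_our_turn` are assumptions for illustration, not the worker's actual code.

```python
import hashlib
from typing import Iterable


def claim_score(job_id: str, peer_id: str) -> int:
    """Assumed scoring: first 8 bytes of SHA-256 over 'job_id:peer_id'."""
    digest = hashlib.sha256(f"{job_id}:{peer_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def is_our_turn(job_id: str, our_peer_id: str, known_workers: Iterable[str]) -> bool:
    """We claim the job only if no known worker has a lower score."""
    candidates = set(known_workers) | {our_peer_id}
    # Tie-break on the peer ID so every worker picks the same winner.
    winner = min(candidates, key=lambda p: (claim_score(job_id, p), p))
    return winner == our_peer_id


# With three workers, each job has exactly one winner.
workers = ["QmA", "QmB", "QmC"]
jobs = [f"job-{i}" for i in range(300)]
wins = {w: sum(is_our_turn(j, w, workers) for j in jobs) for w in workers}
```

Under this scheme a worker that knows only itself wins every job, and a new peer ID reshuffles the scores. Always skipping therefore points at the known workers list rather than at bad luck.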
Memory Issues
Memory growing over time
Symptoms: Worker memory usage increases continuously
Solutions:
- Claims are cleaned up automatically
  Claims older than 2 minutes are pruned every 10 jobs. Check the logs for:
  Cleaned up X old claims
- Restart periodically (temporary fix)
  # Add to cron: restart every 6 hours
  0 */6 * * * systemctl restart aipg-worker
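The pruning rule described above can be pictured with a small model. This is a sketch, not the worker's implementation: `ClaimTable` is a hypothetical class, and only the 2-minute age limit and the every-10-jobs trigger come from this guide.

```python
import time

CLAIM_TTL_SECONDS = 120    # claims older than 2 minutes are dropped
PRUNE_EVERY_N_JOBS = 10    # pruning runs once per 10 recorded jobs


class ClaimTable:
    def __init__(self, clock=time.monotonic):
        self._clock = clock                    # injectable for testing
        self._claims: dict[str, float] = {}    # job_id -> time the claim was seen
        self._jobs_seen = 0

    def record(self, job_id: str) -> int:
        """Store a claim; return how many stale claims were pruned on this call."""
        self._claims[job_id] = self._clock()
        self._jobs_seen += 1
        if self._jobs_seen % PRUNE_EVERY_N_JOBS == 0:
            return self.prune()
        return 0

    def prune(self) -> int:
        cutoff = self._clock() - CLAIM_TTL_SECONDS
        stale = [job for job, seen in self._claims.items() if seen < cutoff]
        for job in stale:
            del self._claims[job]
        return len(stale)

    def __len__(self) -> int:
        return len(self._claims)
```

In this model the table size is bounded by the number of claims seen in roughly the last 2 minutes plus up to 9 unpruned jobs. If memory keeps growing while "Cleaned up" lines appear in the logs, the claim table is probably not the cause.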
Network Issues
Behind NAT / No incoming connections
Symptoms:
- Can connect to bootstrap peers
- But no jobs arrive
- Other workers can't reach you
Solutions:
- Set up port forwarding
  Forward port 4001 (or your P2P_LISTEN_PORT) on your router.
- Check with an external tool
  # From outside your network
  nc -zv your-public-ip 4001
- Use a relay (if available)
  P2P_RELAY_ENABLED=true
Slow job delivery
Symptoms: Jobs take several seconds to arrive
Solutions:
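- Reduce the gossipsub heartbeat
  The heartbeat interval is currently hardcoded to 5s, so changing it means editing the worker source. A lower value gives faster propagation at the cost of more bandwidth.
- Add more bootstrap peers
  P2P_BOOTSTRAP_PEERS=/ip4/peer1/...,/ip4/peer2/...,/ip4/peer3/...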
Debugging
Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("libp2p").setLevel(logging.DEBUG)
Check subscription status
Look for these log lines:
📥 Subscribed to /aipg/1/jobs/grid-llama3.2-3b
📥 Subscribed to /aipg/1/claims
Check peer connections
# Number of connected peers
len(host.get_network().connections)
Test gossipsub manually
# Publish a test message
await pubsub.publish("/test/topic", b"hello")
Getting Help
- Check the logs first - most issues are visible in output
- Join the AIPG Discord - community support
- Open a GitHub issue - for bugs with reproduction steps
Include in bug reports:
- Python version
- libp2p version:
  pip show libp2p
- Your .env (redact sensitive values)
- Full error traceback
- Steps to reproduce
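The first three items can be gathered with a short script. This is a sketch with hypothetical helper names, and the redaction rule (hide any variable whose name contains KEY, SECRET, TOKEN, or PASSWORD) is only a heuristic, so review the output before posting it.

```python
import platform
from importlib import metadata

# Heuristic markers for sensitive variable names; extend as needed.
SENSITIVE_MARKERS = ("KEY", "SECRET", "TOKEN", "PASSWORD")


def redact_env(text: str) -> str:
    """Replace the value of any sensitive-looking KEY=value line with a placeholder."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.partition("=")[0].strip()
            if any(marker in key.upper() for marker in SENSITIVE_MARKERS):
                line = f"{key}=<redacted>"
        out.append(line)
    return "\n".join(out)


def package_version(name: str) -> str:
    """Installed version of a package, or a note if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def bug_report_header(env_text: str = "") -> str:
    return "\n".join([
        f"Python: {platform.python_version()}",
        f"libp2p: {package_version('libp2p')}",
        "--- .env (redacted) ---",
        redact_env(env_text),
    ])
```

For example, `print(bug_report_header(open(".env").read()))` prints a block that can be pasted at the top of an issue, followed by the traceback and reproduction steps.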