Neo4j Setup for PyGrad¶
This guide explains how to set up Neo4j for use with PyGrad's GraphRAG backend.
Using Docker Compose (Recommended)¶
Quick Start¶
- Start Neo4j:
- Check status:
- View logs:
- Stop Neo4j:
- Stop and remove data (clean slate):
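As a rough sketch, each step above usually maps to a standard Docker Compose subcommand. The mapping below is an assumption rather than a copy of the project's own commands: the `neo4j` service name is taken from the troubleshooting section, and `COMPOSE_STEPS` and `run_step` are hypothetical names used only for illustration.

```python
import subprocess

# Assumed mapping from each quick-start step to a standard Docker Compose
# subcommand; the "neo4j" service name comes from the troubleshooting section.
COMPOSE_STEPS = {
    "start": ["docker-compose", "up", "-d"],
    "status": ["docker-compose", "ps"],
    "logs": ["docker-compose", "logs", "-f", "neo4j"],
    "stop": ["docker-compose", "down"],
    "reset": ["docker-compose", "down", "-v"],  # -v also removes the volumes
}


def run_step(step: str) -> int:
    """Run one quick-start step and return the command's exit code."""
    return subprocess.run(COMPOSE_STEPS[step]).returncode
```

For example, `run_step("start")` would launch the stack in the background, and `run_step("reset")` would give the clean slate described in the last step.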
Access Neo4j¶
- Browser UI: http://localhost:7474
- Bolt Connection: bolt://localhost:7687
- Default Credentials:
  - Username: neo4j
  - Password: pleaseletmein
Configuration¶
The docker-compose.yml includes:
- Neo4j 5.26.0: Latest stable version at the time of writing
- APOC Plugin: Extended procedures and functions
- Graph Data Science: Advanced graph algorithms
- Memory Settings:
  - Page cache: 512MB
  - Heap: 512MB initial, 2GB max
- Persistent Volumes: Data, logs, imports, and plugins
Environment Variables¶
The configuration matches .env_example:
NEO4J_URI="bolt://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="pleaseletmein"
NEO4J_DATABASE="neo4j"
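How PyGrad itself reads these variables is not shown in this guide. As a minimal sketch, a script could load them like this, falling back to the defaults listed above; `Neo4jConfig` and `load_neo4j_config` are hypothetical names, not part of PyGrad.

```python
import os
from typing import Mapping, NamedTuple


class Neo4jConfig(NamedTuple):
    uri: str
    username: str
    password: str
    database: str


def load_neo4j_config(env: Mapping[str, str] = os.environ) -> Neo4jConfig:
    """Read the four NEO4J_* variables, falling back to this guide's defaults."""
    return Neo4jConfig(
        uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        username=env.get("NEO4J_USERNAME", "neo4j"),
        password=env.get("NEO4J_PASSWORD", "pleaseletmein"),
        database=env.get("NEO4J_DATABASE", "neo4j"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.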
Manual Installation¶
macOS (Homebrew)¶
Linux (Debian/Ubuntu)¶
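# Import the Neo4j signing key into a dedicated keyring (apt-key is deprecated)
sudo mkdir -p /etc/apt/keyrings
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/neotechnology.gpg
# Register the repository, then install and start the service
echo 'deb [signed-by=/etc/apt/keyrings/neotechnology.gpg] https://debian.neo4j.com stable latest' | sudo tee /etc/apt/sources.list.d/neo4j.list
sudo apt-get update
sudo apt-get install neo4j
sudo systemctl start neo4j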
Windows¶
Download and install from: https://neo4j.com/download/
Verifying Connection¶
Test your Neo4j connection:
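from neo4j import GraphDatabase

# Credentials match the Docker Compose defaults listed under "Access Neo4j"
driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("neo4j", "pleaseletmein")
)

# Run a trivial query to confirm the server accepts the connection
with driver.session() as session:
    result = session.run("RETURN 'Connection successful!' as message")
    print(result.single()["message"])

driver.close()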
Using with PyGrad¶
- Configure environment:
- Install dependencies:
- Start Neo4j:
- Add a repository:
- Query the graph:
Monitoring and Management¶
View Graph in Browser¶
- Open http://localhost:7474
- Connect with credentials
- Run Cypher queries:
// List all repositories
MATCH (n)
WHERE n.repository_id IS NOT NULL
RETURN DISTINCT n.repository_id, labels(n)[0] as type, count(*) as count

// View a specific repository structure
MATCH (n {repository_id: "psf-requests"})
RETURN labels(n)[0] as type, count(*) as count

// Find classes with examples
MATCH (c:Class {repository_id: "psf-requests"})-[:HAS_EXAMPLE]->(e:Example)
RETURN c.api_path, count(e) as example_count
ORDER BY example_count DESC
LIMIT 10
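The same queries can be run from Python instead of the browser. The sketch below parameterizes the last query so the repository ID is not hard-coded. It assumes `driver` is a `neo4j.Driver` such as the one created under "Verifying Connection"; `TOP_CLASSES_QUERY` and `top_classes_with_examples` are hypothetical names, not part of PyGrad.

```python
from typing import Any

# Parameterized form of the "classes with examples" query above.
TOP_CLASSES_QUERY = """
MATCH (c:Class {repository_id: $repository_id})-[:HAS_EXAMPLE]->(e:Example)
RETURN c.api_path AS api_path, count(e) AS example_count
ORDER BY example_count DESC
LIMIT $limit
"""


def top_classes_with_examples(
    driver: Any, repository_id: str, limit: int = 10, database: str = "neo4j"
) -> list[dict]:
    """Return the classes with the most examples as plain dictionaries.

    `driver` is assumed to be a neo4j.Driver.
    """
    with driver.session(database=database) as session:
        result = session.run(
            TOP_CLASSES_QUERY, repository_id=repository_id, limit=limit
        )
        # Materialize the records before the session closes.
        return [record.data() for record in result]
```

Calling `top_classes_with_examples(driver, "psf-requests")` would return one dictionary per class, with `api_path` and `example_count` keys.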
Check Vector Indexes¶
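One possible way to inspect vector indexes from Python is Neo4j 5's `SHOW INDEXES` command filtered by index type. This is a sketch under assumptions: the guide does not say which vector indexes PyGrad creates, `driver` is assumed to be a `neo4j.Driver`, and the helper names are hypothetical.

```python
from typing import Any

# SHOW INDEXES lists every index; the WHERE clause keeps only vector indexes.
VECTOR_INDEX_QUERY = """
SHOW INDEXES
YIELD name, type, state, labelsOrTypes, properties
WHERE type = 'VECTOR'
"""


def list_vector_indexes(driver: Any, database: str = "neo4j") -> list[dict]:
    """Return one dictionary per vector index in the given database."""
    with driver.session(database=database) as session:
        return [record.data() for record in session.run(VECTOR_INDEX_QUERY)]


def offline_indexes(indexes: list[dict]) -> list[str]:
    """Names of indexes that are not ONLINE (still populating or failed)."""
    return [index["name"] for index in indexes if index["state"] != "ONLINE"]
```

An empty result from `list_vector_indexes` would suggest that no vector index has been created yet, and any name returned by `offline_indexes` is worth checking before running similarity queries.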
Database Statistics¶
// Node counts by type and repository
MATCH (n)
WHERE n.repository_id IS NOT NULL
RETURN n.repository_id as repository, labels(n)[0] as type, count(*) as count
ORDER BY repository, type

// Relationship counts
MATCH ()-[r]->()
RETURN type(r) as relationship_type, count(*) as count
ORDER BY count DESC
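When these statistics are collected from a script, the flat rows are often easier to read once grouped per repository. The sketch below runs the node-count query and reshapes its rows; `driver` is assumed to be a `neo4j.Driver`, and `group_counts` and `node_counts` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping

# Same node-count query as above, kept as a constant for reuse.
NODE_COUNT_QUERY = """
MATCH (n)
WHERE n.repository_id IS NOT NULL
RETURN n.repository_id AS repository, labels(n)[0] AS type, count(*) AS count
ORDER BY repository, type
"""


def group_counts(rows: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, int]]:
    """Reshape repository/type/count rows into {repository: {type: count}}."""
    grouped: dict[str, dict[str, int]] = defaultdict(dict)
    for row in rows:
        grouped[row["repository"]][row["type"]] = row["count"]
    return dict(grouped)


def node_counts(driver: Any, database: str = "neo4j") -> dict[str, dict[str, int]]:
    """Run the node-count query and return the grouped result."""
    with driver.session(database=database) as session:
        return group_counts(record.data() for record in session.run(NODE_COUNT_QUERY))
```

The grouped form makes it straightforward to compare repositories, for example by looking up `node_counts(driver)["psf-requests"]`.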
Troubleshooting¶
Container won't start¶
Check logs:
Memory issues¶
Adjust memory settings in docker-compose.yml:
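The project's docker-compose.yml is not reproduced here, so the variable names below are an assumption based on the official Neo4j Docker image convention: a setting becomes an environment variable by prefixing `NEO4J_`, doubling underscores, and turning dots into single underscores. The sketch derives the three memory variables from Neo4j 5 setting names, using the values from the Configuration section as defaults; both function names are hypothetical.

```python
def setting_to_env(setting: str) -> str:
    """Convert a neo4j.conf setting name to its Docker environment variable."""
    # Underscores must be doubled before dots are turned into underscores.
    return "NEO4J_" + setting.replace("_", "__").replace(".", "_")


def memory_environment(
    pagecache: str = "512m", heap_initial: str = "512m", heap_max: str = "2G"
) -> dict[str, str]:
    """Build the memory-related environment entries for the neo4j service."""
    return {
        setting_to_env("server.memory.pagecache.size"): pagecache,
        setting_to_env("server.memory.heap.initial_size"): heap_initial,
        setting_to_env("server.memory.heap.max_size"): heap_max,
    }
```

The resulting keys, such as `NEO4J_server_memory_heap_max__size`, are what would go under the service's `environment:` section; raising `heap_max` or `pagecache` there is the usual first step when the container runs out of memory.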
Connection refused¶
- Ensure Neo4j is running: docker-compose ps
- Check port availability: lsof -i :7687
- Wait for startup: docker-compose logs -f neo4j (look for "Started")
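Scripts that start Neo4j and connect right away often hit this error because the server is still booting. A minimal sketch of the "wait for startup" step is to poll the Bolt port until it accepts TCP connections; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the database is fully ready.

```python
import socket
import time


def wait_for_port(
    host: str = "localhost",
    port: int = 7687,
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller could run `wait_for_port()` after starting the container and only create the driver once it returns `True`.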
Clear all data¶
Or via Cypher:
Clean up specific repository¶
Or via Cypher:
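As a sketch of the Cypher route for both cleanup tasks, the helpers below delete either every node or only the nodes of one repository. They assume that PyGrad's nodes carry a `repository_id` property, as the monitoring queries suggest, and that `driver` is a `neo4j.Driver`; the function names are hypothetical. Both operations are irreversible.

```python
from typing import Any

DELETE_ALL_QUERY = "MATCH (n) DETACH DELETE n"
DELETE_REPOSITORY_QUERY = (
    "MATCH (n {repository_id: $repository_id}) DETACH DELETE n"
)


def _run_delete(driver: Any, query: str, database: str, **params: Any) -> int:
    """Run a delete query and return how many nodes were removed."""
    with driver.session(database=database) as session:
        summary = session.run(query, **params).consume()
        return summary.counters.nodes_deleted


def delete_all_nodes(driver: Any, database: str = "neo4j") -> int:
    """Remove every node and relationship in the database."""
    return _run_delete(driver, DELETE_ALL_QUERY, database)


def delete_repository(driver: Any, repository_id: str, database: str = "neo4j") -> int:
    """Remove only the nodes that belong to one repository."""
    return _run_delete(
        driver, DELETE_REPOSITORY_QUERY, database, repository_id=repository_id
    )
```

On very large graphs a single `DETACH DELETE` can exhaust memory, so deleting in batches may be a better fit there.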
Production Considerations¶
For production deployments:
- Change default password:
- Enable SSL/TLS: Configure certificates and bolt+s:// protocol
- Increase memory: Adjust heap and pagecache based on data size
- Set up backups: Use Neo4j backup tools or volume snapshots
- Monitor performance: Use Neo4j metrics and monitoring tools
- Network isolation: Use Docker networks and firewall rules
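The first two items can be sketched in Python. One way to change the default password is Cypher's `ALTER CURRENT USER` command, run against the `system` database, and an encrypted connection uses the `bolt+s://` scheme in place of `bolt://`. This is an illustrative sketch rather than the guide's own procedure: `driver` is assumed to be a `neo4j.Driver`, and both helper names are hypothetical.

```python
from typing import Any

# Passwords are passed as query parameters so they never appear in the query text.
CHANGE_PASSWORD_QUERY = "ALTER CURRENT USER SET PASSWORD FROM $current TO $new"


def change_own_password(driver: Any, current: str, new: str) -> None:
    """Change the password of the user the driver is authenticated as."""
    # User management commands are executed against the system database.
    with driver.session(database="system") as session:
        session.run(CHANGE_PASSWORD_QUERY, current=current, new=new).consume()


def secure_uri(host: str, port: int = 7687) -> str:
    """Build a Bolt URI that requires TLS with a CA-verified certificate."""
    return f"bolt+s://{host}:{port}"
```

After the change, `NEO4J_PASSWORD` in the environment file needs the new value, and `NEO4J_URI` would point at something like `secure_uri("neo4j.example.com")`, where the host name is a placeholder.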