Graph database implementation for relationship-heavy data models. Use when building social networks, recommendation engines, knowledge graphs, or fraud detection. Covers Neo4j (primary), ArangoDB, Amazon Neptune, Cypher query patterns, and graph data modeling.
apm install @ancoleman/using-graph-databases[](https://apm-p1ls2dz87-atlamors-projects.vercel.app/packages/@ancoleman/using-graph-databases)---
name: using-graph-databases
description: Graph database implementation for relationship-heavy data models. Use when building social networks, recommendation engines, knowledge graphs, or fraud detection. Covers Neo4j (primary), ArangoDB, Amazon Neptune, Cypher query patterns, and graph data modeling.
---
# Graph Databases
## Purpose
This skill guides selection and implementation of graph databases for applications where relationships between entities are first-class citizens. Unlike relational databases that model relationships through foreign keys and joins, graph databases natively represent connections as properties, enabling efficient traversal-heavy queries.
## When to Use This Skill
Use graph databases when:
- **Deep relationship traversals** (4+ hops): "Friends of friends of friends"
- **Variable/evolving relationships**: Schema changes don't break existing queries
- **Path finding**: Shortest route, network analysis, dependency chains
- **Pattern matching**: Fraud detection, recommendation engines, access control
**Do NOT use graph databases when**:
- Fixed schema with shallow joins (2-3 tables) → Use PostgreSQL
- Primarily aggregations/analytics → Use columnar databases
- Key-value lookups only → Use Redis/DynamoDB
## Quick Decision Framework
```
DATA CHARACTERISTICS?
├── Fixed schema, shallow joins (≤3 hops)
│ └─ PostgreSQL (relational)
│
├── Already on PostgreSQL + simple graphs
│ └─ Apache AGE (PostgreSQL extension)
│
├── Deep traversals (4+ hops) + general purpose
│ └─ Neo4j (battle-tested, largest ecosystem)
│
├── Multi-model (documents + graph)
│ └─ ArangoDB
│
├── AWS-native, serverless
│ └─ Amazon Neptune
│
└── Real-time streaming, in-memory
└─ Memgraph
```
## Core Concepts
### Property Graph Model
Graph databases store data as:
- **Nodes** (vertices): Entities with labels and properties
- **Relationships** (edges): Typed connections with properties
- **Properties**: Key-value pairs on nodes and relationships
```
(Person {name: "Alice", age: 28})-[:FRIEND {since: "2020-01-15"}]->(Person {name: "Bob"})
```
### Query Languages
| Language | Databases | Readability | Best For |
|----------|-----------|-------------|----------|
| **Cypher** | Neo4j, Memgraph, AGE | ⭐⭐⭐⭐⭐ SQL-like | General purpose |
| **Gremlin** | Neptune, JanusGraph | ⭐⭐⭐ Functional | Cross-database |
| **AQL** | ArangoDB | ⭐⭐⭐⭐ SQL-like | Multi-model |
| **SPARQL** | Neptune, RDF stores | ⭐⭐⭐ W3C standard | Semantic web |
## Common Cypher Patterns
Reference `references/cypher-patterns.md` for comprehensive examples.
### Pattern 1: Basic Matching
```cypher
// Find all users at a company
MATCH (u:User)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'})
RETURN u.name, u.title
```
### Pattern 2: Variable-Length Paths
```cypher
// Find friends up to 3 degrees away
MATCH (u:User {name: 'Alice'})-[:FRIEND*1..3]->(friend)
WHERE u <> friend
RETURN DISTINCT friend.name
LIMIT 100
```
### Pattern 3: Shortest Path
```cypher
// Find shortest connection between two users
MATCH path = shortestPath(
(a:User {name: 'Alice'})-[*]-(b:User {name: 'Bob'})
)
RETURN path, length(path) AS distance
```
### Pattern 4: Recommendations
```cypher
// Collaborative filtering: Products liked by similar users
MATCH (u:User {id: $userId})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar)
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT exists((u)-[:PURCHASED]->(rec))
RETURN rec.name, count(*) AS score
ORDER BY score DESC
LIMIT 10
```
### Pattern 5: Fraud Detection
```cypher
// Detect circular money flows
MATCH path = (a:Account)-[:SENT*3..6]->(a)
WHERE all(r IN relationships(path) WHERE r.amount > 1000)
RETURN path, [r IN relationships(path) | r.amount] AS amounts
```
## Database Selection Guide
### Neo4j (Primary Recommendation)
**Use for**: General-purpose graph applications
**Strengths**:
- Most mature (2007), largest community (2M+ developers)
- 65+ graph algorithms (GDS library): PageRank, Louvain, Dijkstra
- Best tooling: Neo4j Browser, Bloom visualization
- Comprehensive Cypher support
**Installation**:
```bash
# Python driver
pip install neo4j
# TypeScript driver
npm install neo4j-driver
# Rust driver
cargo add neo4rs
```
Reference: `references/neo4j.md`
### ArangoDB
**Use for**: Multi-model applications (documents + graph)
**Strengths**:
- Store documents AND graph in one database
- AQL combines document and graph queries
- Schema flexibility with relationships
Reference: `references/arangodb.md`
### Apache AGE
**Use for**: Adding graph capabilities to existing PostgreSQL
**Strengths**:
- Extend PostgreSQL with graph queries
- No new infrastructure needed
- Query both relational and graph data
Reference: Implementation details in examples/
### Amazon Neptune
**Use for**: AWS-native, serverless deployments
**Strengths**:
- Fully managed, auto-scaling
- Supports Gremlin AND SPARQL
- AWS ecosystem integration
## Graph Data Modeling Patterns
Reference `references/graph-modeling.md` for comprehensive patterns.
### Best Practice 1: Relationships as First-Class Citizens
**Anti-pattern** (storing relationships in node properties):
```cypher
// BAD
(:Person {name: 'Alice', friend_ids: ['b123', 'c456']})
```
**Pattern** (explicit relationships):
```cypher
// GOOD
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'b123'})
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'c456'})
```
### Best Practice 2: Relationship Properties for Metadata
```cypher
// Track interaction details on relationships
(:Person)-[:FRIEND {
since: '2020-01-15',
strength: 0.85,
last_interaction: datetime()
}]->(:Person)
```
### Best Practice 3: Bounded Traversals for Performance
```cypher
// SLOW: Unbounded traversal
MATCH (a)-[:FRIEND*]->(distant)
RETURN distant
// FAST: Bounded depth with index
MATCH (a)-[:FRIEND*1..4]->(distant)
WHERE distant.active = true
RETURN distant
LIMIT 100
```
### Best Practice 4: Avoid Supernodes
**Problem**: Nodes with thousands of relationships slow traversals.
**Solution**: Intermediate aggregation nodes
```cypher
// Instead of: (:User)-[:POSTED]->(:Post) [1M relationships]
// Use time partitioning:
(:User)-[:POSTED_IN]->(:Year {year: 2025})
-[:HAS_MONTH]->(:Month {month: 12})
-[:HAS_POST]->(:Post)
```
## Use Case Examples
### Social Network
Schema and implementation in `examples/social-graph/`
**Key features**:
- Friend recommendations (friends-of-friends)
- Mutual connections
- News feed generation
- Influence metrics
### Knowledge Graph for AI/RAG
Integration example in `examples/knowledge-graph/`
**Key features**:
- Hybrid vector + graph search
- Entity relationship mapping
- Context expansion for LLM prompts
- Semantic relationship traversal
**Integration with Vector Databases**:
```python
# Step 1: Vector search in Qdrant/pgvector
vector_results = qdrant.search(collection="concepts", query_vector=embedding)
# Step 2: Expand with graph relationships
concept_ids = [r.id for r in vector_results]
graph_context = neo4j.run("""
MATCH (c:Concept) WHERE c.id IN $ids
MATCH (c)-[:RELATED_TO|IS_A*1..2]-(related)
RETURN c, related, relationships(path)
""", ids=concept_ids)
```
### Recommendation Engine
Examples in `examples/social-graph/`
**Strategies**:
1. **Collaborative filtering**: "Users who bought X also bought Y"
2. **Content-based**: "Products similar to what you like"
3. **Session-based**: "Recently viewed items"
### Fraud Detection
Pattern detection in examples/
**Detection patterns**:
- Circular money flows
- Shared devices across accounts
- Rapid transaction chains
- Connection pattern anomalies
## Performance Optimization
Reference `references/cypher-patterns.md` for detailed optimization.
### Indexing
```cypher
// Single-property index
CREATE INDEX user_email FOR (u:User) ON (u.email)
// Composite index (Neo4j 5.x+)
CREATE INDEX user_name_location FOR (u:User) ON (u.name, u.location)
// Full-text search
CREATE FULLTEXT INDEX product_search FOR (p:Product) ON EACH [p.name, p.description]
```
### Caching Expensive Aggregations
```cypher
// Materialize friend count as property
MATCH (u:User)-[:FRIEND]->(f)
WITH u, count(f) AS friendCount
SET u.friend_count = friendCount
// Query becomes instant
MATCH (u:User) WHERE u.friend_count > 100
RETURN u.name, u.friend_count
```
### Scaling Strategies
| Scale | Strategy | Implementation |
|-------|----------|----------------|
| **Vertical** | Add RAM/CPU | In-memory caching, larger instances |
| **Horizontal (Read)** | Read replicas | Neo4j Cluster, ArangoDB Cluster |
| **Horizontal (Write)** | Sharding | ArangoDB SmartGraphs, JanusGraph |
| **Caching** | App-level cache | Redis for hot paths |
## Language Integration
### Python (Neo4j)
Complete example in `examples/social-graph/python-neo4j/`
```python
from neo4j import GraphDatabase
class GraphDB:
def __init__(self, uri: str, user: str, password: str):
self.driver = GraphDatabase.driver(uri, auth=(user, password))
def find_friends_of_friends(self, user_id: str, max_depth: int = 2):
query = """
MATCH (u:User {id: $userId})-[:FRIEND*1..$maxDepth]->(fof)
WHERE u <> fof
RETURN DISTINCT fof.id, fof.name
LIMIT 100
"""
with self.driver.session() as session:
result = session.run(query, userId=user_id, maxDepth=max_depth)
return [dict(record) for record in result]
# Usage
db = GraphDB("bolt://localhost:7687", "neo4j", "password")
friends = db.find_friends_of_friends("u123", max_depth=3)
```
### TypeScript (Neo4j)
Complete example in `examples/social-graph/typescript-neo4j/`
```typescript
import neo4j, { Driver } from 'neo4j-driver'
class Neo4jService {
private driver: Driver
constructor(uri: string, username: string, password: string) {
this.driver = neo4j.driver(uri, neo4j.auth.basic(username, password))
}
async findFriendsOfFriends(userId: string, maxDepth: number = 2) {
const session = this.driver.session()
try {
const result = await session.run(
`MATCH (u:User {id: $userId})-[:FRIEND*1..$maxDepth]->(fof)
WHERE u <> fof
RETURN DISTINCT fof.id, fof.name
LIMIT 100`,
{ userId, maxDepth }
)
return result.records.map(r => r.toObject())
} finally {
await session.close()
}
}
}
```
### Go (ArangoDB)
```go
import (
"github.com/arangodb/go-driver"
"github.com/arangodb/go-driver/http"
)
func findFriendsOfFriends(db driver.Database, userId string, maxDepth int) ([]User, error) {
query := `
FOR vertex, edge, path IN 1..@maxDepth OUTBOUND @startVertex GRAPH 'socialGraph'
FILTER vertex._id != @startVertex
RETURN DISTINCT vertex
LIMIT 100
`
cursor, err := db.Query(ctx, query, map[string]interface{}{
"startVertex": userId,
"maxDepth": maxDepth,
})
// Handle results...
}
```
## Schema Validation
Use `scripts/validate_graph_schema.py` to check for:
- Unbounded traversals (missing depth limits)
- Missing indexes on frequently queried properties
- Supernodes (nodes with excessive relationships)
- Relationship property consistency
Run validation:
```bash
python scripts/validate_graph_schema.py --database neo4j://localhost:7687
```
## Integration with Other Skills
### With databases-vector (Hybrid Search)
Combine vector similarity with graph context for AI/RAG applications.
See `examples/knowledge-graph/`
### With search-filter
Implement relationship-based queries: "Find all users within 3 degrees of connection"
### With ai-chat
Use knowledge graphs to enrich LLM context with structured relationships.
### With auth-security (ReBAC)
Implement relationship-based access control: "Can user X access resource Y through relation Z?"
## Common Schema Patterns
### Star Schema (Hub and Spokes)
```cypher
(:User)-[:PURCHASED]->(:Product)
(:User)-[:VIEWED]->(:Product)
(:User)-[:RATED]->(:Product)
```
### Hierarchical Schema (Trees)
```cypher
(:CEO)-[:MANAGES]->(:VP)-[:MANAGES]->(:Director)
```
### Temporal Schema (Event Sequences)
```cypher
(:Event {timestamp})-[:NEXT]->(:Event {timestamp})
```
## Getting Started
1. **Choose database**: Use decision framework above
2. **Design schema**: Reference `references/graph-modeling.md`
3. **Implement queries**: Use patterns from `references/cypher-patterns.md`
4. **Validate**: Run `scripts/validate_graph_schema.py`
5. **Optimize**: Add indexes, bound traversals, cache aggregations
## Further Reading
- `references/neo4j.md` - Neo4j setup, drivers, GDS algorithms
- `references/arangodb.md` - ArangoDB multi-model patterns
- `references/cypher-patterns.md` - Comprehensive Cypher query library
- `references/graph-modeling.md` - Data modeling best practices
- `examples/social-graph/` - Complete social network implementation
- `examples/knowledge-graph/` - Hybrid vector + graph for AI/RAG