Batch node creation to avoid oversized Bolt transactions by berrazuriz1 · Pull Request #318 · blarApp/blarify

berrazuriz1 · 2026-05-25T20:18:21Z

Summary

Mirror the edge-batching pattern on create_nodes: chunk nodeList in Python (batch_size=5000) and send one execute_write per chunk, with progress logging.

Motivation

In production we observed the Neo4j driver hitting:

neo4j.io: <CONNECTION> error: Failed to read from defunct connection IPv4Address(('p-0b67...
neo4j.pool: Unable to retrieve routing information

mid-write while a graph job was persisting nodes for a large repo. Root cause: create_nodes passes the entire nodeList in a single execute_write call. For large repos this is a multi-MB UNWIND payload, and the server takes long enough to process it that AuraDB or network middleboxes close the underlying connection. The next driver call then surfaces as a defunct connection, and the routing-table refresh that follows fails too.

create_edges already batches at 10,000 per execute_write. create_nodes did not — same wire-payload problem, just on the nodes side.

Behavior

Before: one Bolt transaction with len(nodeList) items.
After: ceil(len(nodeList) / 5000) transactions, each carrying ≤ 5,000 nodes. The inner apoc.periodic.iterate batch size stays at 1,000 (unchanged), so server-side processing semantics are identical.
Two log lines added, matching the create_edges style:
- Creating N nodes in batches of 5000
- Processing nodes batch X/Y (i/N)
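
For readers who want the shape of the change without opening the diff, here is a minimal sketch of the chunking loop described above. It is not the code from this PR: `create_nodes_batched` and `write_batch` are hypothetical names, with `write_batch` standing in for one `execute_write` call per chunk, and the reading of `i` in the second log line as a running node count is an assumption.

```python
import logging
import math
from typing import Any, Callable, Dict, List, Sequence

logger = logging.getLogger(__name__)

Node = Dict[str, Any]


def create_nodes_batched(
    node_list: Sequence[Node],
    write_batch: Callable[[List[Node]], None],
    batch_size: int = 5000,
) -> int:
    """Send node_list in chunks and return the number of batches sent.

    write_batch is a hypothetical stand-in for one execute_write call,
    i.e. one Bolt transaction per chunk.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    total = len(node_list)
    if total == 0:
        return 0

    total_batches = math.ceil(total / batch_size)
    logger.info("Creating %d nodes in batches of %d", total, batch_size)

    for index, start in enumerate(range(0, total, batch_size), start=1):
        batch = list(node_list[start:start + batch_size])
        # Assumption: "i" in the log line is the running count of nodes sent.
        logger.info(
            "Processing nodes batch %d/%d (%d/%d)",
            index,
            total_batches,
            start + len(batch),
            total,
        )
        write_batch(batch)

    return total_batches
```

With 12,001 nodes and the default batch size, this sends three chunks of 5,000, 5,000 and 2,001 nodes. The server-side apoc.periodic.iterate batch size is outside this sketch and would stay whatever the write callable uses.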

Why 5,000 instead of edges' 10,000: node payloads carry code_text and full attribute maps, so each item is materially bigger than an edge tuple. 5K keeps the per-batch wire payload comparable to the edges path.
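
The sizing argument can be sanity-checked with a rough estimate. The sketch below is an illustration only, not part of this PR: it uses JSON length as a proxy for wire size (the real Bolt encoding differs), and both helper names are hypothetical. Given sample nodes and sample edges, it returns the node batch size whose payload roughly matches one edge batch.

```python
import json
from typing import Any, Dict, Sequence

Item = Dict[str, Any]


def approx_payload_bytes(items: Sequence[Item]) -> int:
    """Rough size proxy: JSON-encoded length, not the actual Bolt wire size."""
    return len(json.dumps(list(items)).encode("utf-8"))


def matching_batch_size(
    sample_nodes: Sequence[Item],
    sample_edges: Sequence[Item],
    edge_batch_size: int = 10_000,
) -> int:
    """Node batch size whose estimated payload matches one edge batch."""
    if not sample_nodes or not sample_edges:
        raise ValueError("both samples must be non-empty")

    node_bytes = approx_payload_bytes(sample_nodes)
    edge_bytes = approx_payload_bytes(sample_edges)

    # Integer arithmetic for:
    #   edge_batch_size * (avg edge bytes) / (avg node bytes)
    size = (edge_batch_size * edge_bytes * len(sample_nodes)) // (
        node_bytes * len(sample_edges)
    )
    return max(1, size)
```

Fed with a representative sample from a real graph job, a result near 5,000 would support the chosen value. A much smaller result would suggest the node batches are still heavier than the edge batches.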

Note on dev

There is an open PR #317 (dev → main). dev is ~704 commits behind main (last sync 2025-02-28) and that promotion is unsafe as-is — it would delete ~87K lines including main's test suite. This PR goes directly to main to avoid that path. The earlier PR #316 already landed the equivalent change on dev's legacy file path (blarify/db_managers/neo4j_manager.py), so the fix is captured there too if dev is ever rebuilt from main.

Test plan

poetry run ruff check blarify/repositories/graph_db_manager/neo4j_manager.py — clean.
poetry run pyright …/neo4j_manager.py — no new errors (one pre-existing override-mismatch on line 218 is out of scope).
Manual: run a create_graph against a large repo and watch for Creating N nodes in batches of 5000 / per-batch lines in logs; confirm no defunct connection / routing errors mid-write.

Batch node creation to avoid oversized Bolt transactions

11fd446

berrazuriz1 merged commit ada9bbd into main May 25, 2026
8 of 10 checks passed

berrazuriz1 deleted the feat/batch-create-nodes branch May 25, 2026 20:26

berrazuriz1 merged 1 commit into main from feat/batch-create-nodes
