ctxgraph

Typed knowledge graph for AI agents. Single Rust binary. Single SQLite file. One LLM call per write. Zero LLM calls for 90% of reads.

brew install rohansx/tap/ctxgraph
ctxgraph init
ctxgraph log "Migrated auth from Redis sessions to JWT. Chose JWT for stateless scaling."
ctxgraph query "why did we move away from Redis?"

Working spec: docs/CLARITY.md (product, decisions, the 5 pieces to build, launch pitch).
Architecture: docs/ARCHITECTURE.md (as-built §1-4 plus the v0.3 target §5-14).
Roadmap: docs/ROADMAP.md (the 5 pieces, a 12-week schedule, this-weekend todo).
Benchmarks: docs/BENCHMARKS.md (measured F1 numbers plus a hostile-reader audit).

Benchmarks (measured, third-party + reproducible)

Correction (2026-06): an earlier headline here claimed "+0.227 combined F1 over Graphiti." That was a measurement bug — an un-scoped Graphiti relation query (LIMIT 50, no group_id) scored Graphiti against the whole accumulating graph. Fixed. The honest picture: extraction quality is at parity with Graphiti and with cloud frontier models; the win is architectural — one LLM call, fully local, $0. Full detail + audit in docs/BENCHMARKS.md.

Third-party accuracy — CoNLL04 (standard RE dataset neither tool authored), strict directional + typed relation scorer, 80 test sentences, single call. Reproduce: scripts/conll04_bench.py.

Model (single call)	entity F1	relation F1 (directional + typed)
anthropic/claude-haiku-4.5	0.864	0.604
z-ai/glm-5.2	0.867	0.589
google/gemini-2.5-flash-lite	0.846	0.560
minimax/minimax-m3	0.840	0.541
deepseek/deepseek-v4-flash	0.844	0.525
deepseek/deepseek-v3.2	0.861	0.514

vs Graphiti — same model (gemini-2.5-flash-lite), same fixture, same scorer (after fixing the bug): combined F1 0.638 (ctxgraph) vs 0.636 (Graphiti) — a statistical tie on extraction. The real, measured advantage is efficiency:

Tool	LLM calls / episode (measured)	local Gemma-4-12B latency
ctxgraph	1.0	~33 s/ep
Graphiti	2.55	~84 s/ep

→ equivalent extraction quality at ~2.6× fewer LLM calls, fully local, $0 marginal cost. That — not an accuracy edge — is the moat.

How it works

                 ┌──────────────────────────────────────┐
                 │       WRITE PATH (one LLM call)       │
                 │  Tier 1: GLiNER2 ONNX (CPU, ~30ms)    │
                 │  Tier 2: NuExtract 2.0 (local Ollama) │
                 │  Tier 3: Cloud (only if needed)       │
                 │    Mode B default: Cerebras free       │
                 │    Paid: DeepInfra Gemma-4-26B-A4B    │
                 └──────────────────────────────────────┘
                                  │
                                  ▼
                 ┌──────────────────────────────────────┐
                 │       SQLite + FTS5 + sqlite-vec      │
                 │       bi-temporal edges, RRF search   │
                 └──────────────────────────────────────┘
                                  ▲
                                  │
                 ┌──────────────────────────────────────┐
                 │   READ PATH (zero LLM in 90% cases)   │
                 │  Simple (90%):                        │
                 │    verb → typed relation via cosine   │
                 │    embedding match (~30 LOC)          │
                 │    then deterministic SQL             │
                 │  Complex (10%):                       │
                 │    local Qwen3-1.5B parses NL →       │
                 │    graph op, then SQL                 │
                 │  NO cloud LLM ever in read path       │
                 └──────────────────────────────────────┘
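The write path above escalates through tiers and stops at the first one that produces a usable result. Below is a minimal Rust sketch of that control flow, not ctxgraph's implementation. The `Tier` and `Extraction` types, the confidence score and the threshold are hypothetical. The three extractors are passed in as closures so the sketch runs without any models.

```rust
/// Which tier produced the accepted extraction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    LocalOnnx, // Tier 1: GLiNER2 ONNX on CPU, no LLM call
    LocalLlm,  // Tier 2: NuExtract 2.0 via local Ollama
    Cloud,     // Tier 3: cloud model, last resort
}

/// Hypothetical extraction result: (subject, relation, object) triples plus
/// a confidence score in [0, 1].
#[derive(Debug, Clone, PartialEq)]
pub struct Extraction {
    pub triples: Vec<(String, String, String)>,
    pub confidence: f32,
}

/// Tries each tier in order and returns the first result whose confidence
/// reaches `threshold`. Later tiers are only called if earlier ones fall short,
/// so an episode the ONNX tier handles never reaches an LLM.
pub fn extract_tiered<A, B, C>(
    text: &str,
    threshold: f32,
    allow_cloud: bool,
    onnx: A,
    local_llm: B,
    cloud: C,
) -> Option<(Tier, Extraction)>
where
    A: Fn(&str) -> Option<Extraction>,
    B: Fn(&str) -> Option<Extraction>,
    C: Fn(&str) -> Option<Extraction>,
{
    let good_enough = |e: Option<Extraction>| e.filter(|x| x.confidence >= threshold);
    if let Some(e) = good_enough(onnx(text)) {
        return Some((Tier::LocalOnnx, e));
    }
    if let Some(e) = good_enough(local_llm(text)) {
        return Some((Tier::LocalLlm, e));
    }
    if allow_cloud {
        // Cloud output is accepted as-is: there is no further tier to try.
        return cloud(text).map(|e| (Tier::Cloud, e));
    }
    None
}
```

Each tier returns early, so in this sketch the local LLM runs only when the ONNX tier falls short. The cloud runs only when both local tiers do and `allow_cloud` permits it.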

Two architectural bets:

One LLM call per write. Tiered escalation: local ONNX handles ~70% of episodes, a local LLM another ~25%, and the cloud is called only when both fail. In the measured comparison above, Graphiti averaged 2.55 LLM calls per episode.
Zero LLM calls in the read path for ~90% of queries. The universal schema's 10 typed relations are a closed set: the verb in your query is matched to one of them by embedding cosine similarity, and deterministic SQL runs from there. Only multi-hop, time-filtered, or conjunctive queries (~10%) call a small local Qwen3-1.5B model. No cloud LLM ever sees a read.

This is the part competitors can't match: Graphiti, Mem0, and Letta all need an LLM at read time because their relation types are free-form text that a SQL engine can't reason about.
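The simple read path (verb to typed relation via cosine match, then deterministic SQL) can be sketched as follows. This is an illustration under stated assumptions. The embeddings are toy 3-dimensional vectors rather than 384-dim all-MiniLM-L6-v2 output, and the 0.8 similarity cut-off is made up. The `edges` table, `Route` and `route` are hypothetical names, not ctxgraph's API.

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either has zero norm).
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Where a read query goes.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Confident match to a typed relation: deterministic SQL, no LLM.
    Sql { relation: &'a str, sql: &'static str },
    /// No confident match (multi-hop, time filter, ...): local parser model.
    LocalParser,
}

/// Matches the query verb's embedding against the closed set of relation
/// embeddings and routes to SQL only if the best match reaches `min_sim`.
pub fn route<'a>(verb: &[f32], relations: &'a [(&'a str, Vec<f32>)], min_sim: f32) -> Route<'a> {
    let best = relations
        .iter()
        .map(|(name, emb)| (*name, cosine(verb, emb)))
        .max_by(|a, b| a.1.total_cmp(&b.1));
    match best {
        Some((relation, sim)) if sim >= min_sim => Route::Sql {
            relation,
            // Hypothetical schema; the relation name is bound as ?1, never interpolated.
            sql: "SELECT source, target FROM edges WHERE relation = ?1",
        },
        _ => Route::LocalParser,
    }
}
```

In this sketch anything below the cut-off falls through to the local parser model. That split is what separates the minority of complex queries from the ones that never touch an LLM.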

The universal schema (v0.3 target)

9 entity types, 10 relations, hardcoded. Users never write a schema.

Entity types	Relation types
Person, Place, Organization, Concept, Artifact, Event, Time, Idea, Fact	mentions, located_at, related_to, caused, preceded, references, owned_by, part_of, depends_on, participated_in

Broad enough to handle personal wikis, work notes, research, recipes, code, journal entries: anything text-shaped. Edge-case domains (recipes need "Ingredient", scientific datasets need "Measurement") are handled by an automatic schema-improvement loop. The LLM logs type suggestions to a side table, and a nightly cron job promotes any suggested type that appears in at least 5 distinct episodes and has cosine similarity below 0.85 to every existing type. Users see this as a one-line notice the next time they invoke the CLI.
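The promotion rule can be written as a small filter. This sketch assumes each side-table row carries three things: the suggested type name, the episode it came from, and an embedding of the type name. The `Suggestion` struct and `promotable` function are hypothetical. A suggested type qualifies only if its similarity to every existing type stays below the cut-off, so near-duplicates of existing types are never promoted.

```rust
use std::collections::{HashMap, HashSet};

/// One row of the hypothetical suggestions side-table.
pub struct Suggestion {
    pub type_name: String,
    pub episode_id: u64,
    pub embedding: Vec<f32>, // embedding of the suggested type name
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    let (na, nb) = (norm(a), norm(b));
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Returns the suggested types to promote: seen in at least `min_episodes`
/// distinct episodes AND below `max_sim` cosine similarity to every existing type.
pub fn promotable(
    suggestions: &[Suggestion],
    existing: &[Vec<f32>],
    min_episodes: usize,
    max_sim: f32,
) -> Vec<String> {
    let mut episodes: HashMap<&str, HashSet<u64>> = HashMap::new();
    let mut embeddings: HashMap<&str, &[f32]> = HashMap::new();
    for s in suggestions {
        let name = s.type_name.as_str();
        // A HashSet counts each episode once, however often it repeats a suggestion.
        episodes.entry(name).or_default().insert(s.episode_id);
        // Keep the first embedding seen for each suggested type.
        embeddings.entry(name).or_insert(s.embedding.as_slice());
    }
    let mut promoted: Vec<String> = episodes
        .iter()
        .filter(|(_, eps)| eps.len() >= min_episodes)
        .filter(|(name, _)| existing.iter().all(|e| cosine(embeddings[*name], e) < max_sim))
        .map(|(name, _)| name.to_string())
        .collect();
    promoted.sort(); // HashMap order is arbitrary; sort for stable output
    promoted
}
```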

Full schema rationale → docs/CLARITY.md § 3

Three modes

You pick one at ctxgraph init. All three keep reads local.

Mode	Writes	Cost / 1k eps	Best for
`local-only`	GLiNER2 → NuExtract 2.0 → Qwen3-8B (all local)	$0	Privacy / offline / sensitive data
`cloud-fallback` (default)	Local first; Cerebras free tier when local is stuck	$0 in practice*	Most users
`cloud-quality`	Skip local; every episode goes to Cerebras Qwen3-32B or DeepInfra Gemma-4-26B-A4B	$0–$0.11	Long-form text, research papers

* Cerebras free tier = 1M tokens/day, 30 RPM; enough for ~1 250 episodes/day. DeepInfra Gemma-4-26B-A4B ($0.07 in / $0.34 out per 1M tokens, ~$0.11 per 1k episodes) is the paid fallback when Cerebras rate-limits.
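The free-tier sizing is plain arithmetic. The sketch below checks which limit binds. The ~800 tokens per episode is not stated above; it is simply what 1M tokens divided by 1 250 episodes implies. `episodes_per_day` is a hypothetical helper.

```rust
/// Free-tier limits quoted above.
const TOKENS_PER_DAY: u64 = 1_000_000;
const REQUESTS_PER_MINUTE: u64 = 30;

/// Episodes/day the free tier supports, given the average tokens (prompt +
/// completion) one single-call extraction consumes. With one LLM call per
/// episode, requests == episodes, and whichever limit is hit first wins.
pub fn episodes_per_day(tokens_per_episode: u64) -> u64 {
    let by_tokens = TOKENS_PER_DAY / tokens_per_episode;
    let by_requests = REQUESTS_PER_MINUTE * 60 * 24; // 43_200 requests/day
    by_tokens.min(by_requests)
}
```

At 30 requests/minute the request cap allows 43 200 calls/day. With one call per episode, the token budget, not the rate limit, caps throughput at ~1 250 episodes/day.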

Setting allow_cloud = false in ~/.ctxgraph/config.toml forces local-only mode regardless of the mode chosen at init. This is the privacy override.
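Here is a sketch of how the override could resolve, assuming the config holds a mode name plus the `allow_cloud` flag. The `Mode` enum, `Mode::parse` and `effective_mode` are hypothetical. Only three things come from this README: the mode names, the `cloud-fallback` default, and the override semantics.

```rust
/// The three write modes from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    LocalOnly,
    CloudFallback,
    CloudQuality,
}

impl Mode {
    /// Parses a mode name; an absent value falls back to the default.
    pub fn parse(name: Option<&str>) -> Result<Mode, String> {
        match name {
            None | Some("cloud-fallback") => Ok(Mode::CloudFallback),
            Some("local-only") => Ok(Mode::LocalOnly),
            Some("cloud-quality") => Ok(Mode::CloudQuality),
            Some(other) => Err(format!("unknown mode: {other}")),
        }
    }
}

/// `allow_cloud = false` wins over whatever mode is configured.
pub fn effective_mode(configured: Mode, allow_cloud: bool) -> Mode {
    if allow_cloud { configured } else { Mode::LocalOnly }
}
```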

Competitive landscape

	ctxgraph	Graphiti / Zep	Mem0	Letta	Cognee
Distribution	single Rust binary	Python + Neo4j + Docker	Python SDK	Python	Python + Neo4j
Local-only mode	yes	no	no	yes (Apache 2.0)	no
LLM calls per write	1	6	N	varies	varies
LLM in read path	no (90% of queries)	yes	yes	yes	yes
Schema-typed extraction	yes (universal 9/10)	free-form verbs	free-form	typed but manual	typed but manual
Bi-temporal edges	yes	yes	no	no	no
Verified $/1k eps (Gemma 4 26B)	$0.11	~$0.66 (6×)	N/A	N/A	N/A
Combined F1, same model + scorer (gemini-2.5-flash-lite)	0.638	0.636	not measured	not measured	not measured
Stars (rough, May 2026)	early	~20K	~50K	~30K	~15K

More competitor analysis → docs/ROADMAP.md § "Competitive landscape"

What's in the box today (v0.8.0)

Component	Status	Lines
`ctxgraph-core` — SQLite + FTS5 + bi-temporal graph	shipped	~2 000
`ctxgraph-extract` — tiered extraction (current: GLiNER + GLiREL + LLM gate)	shipped	~8 500
`ctxgraph-embed` — fastembed wrapper, all-MiniLM-L6-v2 (384-dim)	shipped	~70
`ctxgraph-cli` — init, log, query, entities, stats, models, mcp start	shipped	~600
`ctxgraph-mcp` — MCP server, 6 tools	shipped	~870
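`ctxgraph-core` stores bi-temporal edges, but the storage layout is not described here. The sketch below shows the idea with a hypothetical `Edge` struct. It has one time axis for when a fact held in the world (valid time) and another for when the graph believed it (transaction time). Times are plain integers (think day numbers) to keep it self-contained.

```rust
/// A graph edge with two time axes. `None` on an upper bound means "still open".
#[derive(Debug, Clone)]
pub struct Edge {
    pub source: String,
    pub relation: String,
    pub target: String,
    pub valid_from: i64,             // fact starts holding in the world
    pub valid_to: Option<i64>,       // fact stops holding (exclusive)
    pub recorded_at: i64,            // graph learns the fact
    pub invalidated_at: Option<i64>, // graph stops believing this row (exclusive)
}

impl Edge {
    /// Did this fact hold at `valid_t`, according to what the graph knew at `known_t`?
    pub fn holds(&self, valid_t: i64, known_t: i64) -> bool {
        let known = self.recorded_at <= known_t
            && self.invalidated_at.map_or(true, |t| known_t < t);
        let valid = self.valid_from <= valid_t
            && self.valid_to.map_or(true, |t| valid_t < t);
        known && valid
    }
}
```

Suppose billing moved from MySQL to PostgreSQL on day 100, but that was only logged on day 105. The old MySQL edge is then closed at transaction time 105 rather than deleted. Asking about day 102 "as of day 104" still returns MySQL, while asking "as of day 110" returns PostgreSQL.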

v0.3 is the next launch — see docs/ROADMAP.md. It swaps GLiNER + GLiREL for GLiNER2 (single forward pass), adopts the universal schema, adds the no-LLM read path, defaults to Cerebras free tier, and re-runs the 29-episode benchmark to confirm the headline lands at ≥ 0.745 combined F1 with a fully local stack.

Install

# macOS + Linux (prebuilt binaries via Homebrew)
brew install rohansx/tap/ctxgraph

# or from source (Rust 1.85+)
cargo install ctxgraph-cli

Quick start

ctxgraph init
ctxgraph log "Alice chose PostgreSQL over MySQL for the new billing service."
ctxgraph log "PostgreSQL replaced MySQL in prod on 2026-04-12."
ctxgraph query "what did Alice choose?"
ctxgraph query "what was replaced?"

MCP server (Claude Code / Cursor / Cline)

{
  "mcpServers": {
    "ctxgraph": { "command": "ctxgraph-mcp" }
  }
}

Tool	Description
`ctxgraph_add_episode`	Record an event or decision
`ctxgraph_search`	Fused FTS5 + semantic + graph search
`ctxgraph_traverse`	Walk the graph from an entity
`ctxgraph_find_precedents`	Find similar past events
`ctxgraph_list_entities`	List entities with filters
`ctxgraph_export_graph`	Export entities and edges

Rust SDK

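use ctxgraph::{Episode, Graph};

// Assumes ctxgraph's error type implements std::error::Error, so `?` can
// convert it into a boxed error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the graph stored in ./.ctxgraph
    let mut graph = Graph::init(".ctxgraph")?;
    graph.add_episode(
        Episode::builder("Chose Postgres over Mongo for the billing rewrite").build()
    )?;
    // Fused search; 10 is assumed to be the maximum number of results.
    let results = graph.search("why Postgres?", 10)?;
    Ok(())
}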

Project structure

crates/
├── ctxgraph-core/      types, storage, query, temporal
├── ctxgraph-extract/   tiered extraction (ONNX + LLM)
├── ctxgraph-embed/     local embeddings (384-dim)
├── ctxgraph-cli/       CLI binary
└── ctxgraph-mcp/       MCP server

Reproducing the benchmarks

export OPENROUTER_API_KEY=sk-or-...

# 1) Third-party accuracy on CoNLL04 (auto-fetches the dataset; strict directional+typed scorer)
python scripts/conll04_bench.py --model google/gemini-2.5-flash-lite --out conll04.json --limit 80
#   …or against a LOCAL model via ollama (no API cost):
python scripts/conll04_bench.py --model 'hf.co/<your>/gemma-gguf:Q4_K_M' \
  --base-url http://localhost:11434/v1/chat/completions --out conll04_local.json --limit 40

# 2) Cross-domain model bake-off (ctxgraph single-call prompt)
python scripts/openrouter_bench.py --model deepseek/deepseek-v3.2 --out bench.json \
  --skip-tech --cd-fixture crates/ctxgraph-extract/tests/fixtures/cross_domain_v2.json

# 3) ctxgraph-vs-Graphiti, same model, same scorer (needs Neo4j + graphiti venv)
docker run -d --name neo4j-bench -p 7687:7687 -e NEO4J_AUTH=neo4j/benchpass123 neo4j:5.26
python3 -m venv .venv-graphiti && .venv-graphiti/bin/pip install graphiti-core neo4j fastembed
.venv-graphiti/bin/python scripts/graphiti_openrouter_bench.py \
  --model google/gemini-2.5-flash-lite --out graphiti.json

# 4) Cost/efficiency: measure Graphiti's ACTUAL LLM calls/episode vs ctxgraph's 1
.venv-graphiti/bin/python scripts/cost_efficiency_bench.py --model google/gemini-2.5-flash-lite

Each model run costs ~$0.005–0.02 on OpenRouter; the CoNLL04 dataset is fetched from HuggingFace at run time (no third-party data committed to the repo).

Contributing

See CONTRIBUTING.md. For design discussions, docs/CLARITY.md is the working doc — propose changes against it.

License

MIT
