Typed knowledge graph for AI agents. Single Rust binary. Single SQLite file. One LLM call per write. Zero LLM calls for 90% of reads.
brew install rohansx/tap/ctxgraph
ctxgraph init
ctxgraph log "Migrated auth from Redis sessions to JWT. Chose JWT for stateless scaling."
ctxgraph query "why did we move away from Redis?"

- Working spec: docs/CLARITY.md (product, decisions, the 5 pieces to build, launch pitch)
- Architecture: docs/ARCHITECTURE.md (as-built in §1-4, v0.3 target in §5-14)
- Roadmap: docs/ROADMAP.md (5 pieces, 12-week schedule, this-weekend todo)
- Benchmarks: docs/BENCHMARKS.md (measured F1 numbers, hostile-reader audit)
Correction (2026-06): an earlier headline here claimed "+0.227 combined F1 over Graphiti." That was a measurement bug: an un-scoped Graphiti relation query (LIMIT 50, no group_id) scored Graphiti against the whole accumulating graph. Fixed. The honest picture: extraction quality is at parity with Graphiti and with cloud frontier models; the win is architectural (one LLM call, fully local, $0). Full detail and audit in docs/BENCHMARKS.md.
Third-party accuracy — CoNLL04 (standard RE dataset neither tool authored), strict directional + typed relation scorer, 80 test sentences, single call. Reproduce: scripts/conll04_bench.py.
| Model (single call) | entity F1 | relation F1 (directional + typed) |
|---|---|---|
| anthropic/claude-haiku-4.5 | 0.864 | 0.604 |
| z-ai/glm-5.2 | 0.867 | 0.589 |
| google/gemini-2.5-flash-lite | 0.846 | 0.560 |
| minimax/minimax-m3 | 0.840 | 0.541 |
| deepseek/deepseek-v4-flash | 0.844 | 0.525 |
| deepseek/deepseek-v3.2 | 0.861 | 0.514 |
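To make "strict directional + typed" concrete, here is a minimal sketch of such a scorer. It is not the code in scripts/conll04_bench.py (that script is Python); the `triple` helper and the example entities are made up for illustration. A predicted relation only counts when head, type, and tail all match the gold triple, so a reversed direction earns no credit.

```rust
use std::collections::HashSet;

/// A relation as (head entity, relation type, tail entity).
type Triple = (String, String, String);

/// Convenience constructor (hypothetical helper, for the example only).
fn triple(head: &str, rel: &str, tail: &str) -> Triple {
    (head.to_string(), rel.to_string(), tail.to_string())
}

/// Precision, recall and F1 under exact matching of (head, type, tail).
/// A reversed or mistyped relation is a false positive and leaves a false negative.
fn strict_relation_f1(gold: &[Triple], predicted: &[Triple]) -> (f64, f64, f64) {
    let gold_set: HashSet<&Triple> = gold.iter().collect();
    let pred_set: HashSet<&Triple> = predicted.iter().collect();
    let true_pos = gold_set.intersection(&pred_set).count() as f64;

    let precision = if pred_set.is_empty() { 0.0 } else { true_pos / pred_set.len() as f64 };
    let recall = if gold_set.is_empty() { 0.0 } else { true_pos / gold_set.len() as f64 };
    let f1 = if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    };
    (precision, recall, f1)
}

fn main() {
    let gold = vec![
        triple("Alice", "Work_For", "Acme"),
        triple("Acme", "OrgBased_In", "Paris"),
    ];
    let predicted = vec![
        triple("Alice", "Work_For", "Acme"),    // exact match
        triple("Paris", "OrgBased_In", "Acme"), // right type, wrong direction: no credit
    ];
    let (p, r, f1) = strict_relation_f1(&gold, &predicted);
    println!("precision={p:.3} recall={r:.3} f1={f1:.3}");
}
```

Under a looser, undirected scorer the second prediction would count as correct, which is why the table specifies the stricter rule.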
vs Graphiti — same model (gemini-2.5-flash-lite), same fixture, same scorer (after fixing the bug): combined F1 0.638 (ctxgraph) vs 0.636 (Graphiti) — a statistical tie on extraction. The real, measured advantage is efficiency:
| | LLM calls / episode (measured) | local Gemma-4-12B latency |
|---|---|---|
| ctxgraph | 1.0 | ~33 s/ep |
| Graphiti | 2.55 | ~84 s/ep |
→ equivalent extraction quality at ~2.6× fewer LLM calls, fully local, $0 marginal cost. That — not an accuracy edge — is the moat.
┌──────────────────────────────────────┐
│ WRITE PATH (one LLM call) │
│ Tier 1: GLiNER2 ONNX (CPU, ~30ms) │
│ Tier 2: NuExtract 2.0 (local Ollama) │
│ Tier 3: Cloud (only if needed) │
│ Mode B default: Cerebras free │
│ Paid: DeepInfra Gemma-4-26B-A4B │
└──────────────────────────────────────┘
│
▼
┌──────────────────────────────────────┐
│ SQLite + FTS5 + sqlite-vec │
│ bi-temporal edges, RRF search │
└──────────────────────────────────────┘
▲
│
┌──────────────────────────────────────┐
│ READ PATH (zero LLM in 90% cases) │
│ Simple (90%): │
│ verb → typed relation via cosine │
│ embedding match (~30 LOC) │
│ then deterministic SQL │
│ Complex (10%): │
│ local Qwen3-1.5B parses NL → │
│ graph op, then SQL │
│ NO cloud LLM ever in read path │
└──────────────────────────────────────┘
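The write path in the diagram amounts to an ordered fallback: try the cheapest tier first and escalate only on failure. The sketch below illustrates that control flow under stated assumptions; the `Tier` enum, the `extract` function, and the stand-in tier functions are hypothetical and are not ctxgraph's actual API. The `allow_cloud` flag mirrors the config option of the same name.

```rust
/// Which tier produced the extraction (names follow the diagram; the enum is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    LocalOnnx,
    LocalLlm,
    Cloud,
}

/// Try each tier in order and stop at the first one that returns a result.
/// Each tier is modelled as a function returning None when it cannot handle the episode.
fn extract(
    episode: &str,
    onnx: impl Fn(&str) -> Option<String>,
    local_llm: impl Fn(&str) -> Option<String>,
    cloud: impl Fn(&str) -> Option<String>,
    allow_cloud: bool,
) -> Option<(Tier, String)> {
    if let Some(out) = onnx(episode) {
        return Some((Tier::LocalOnnx, out));
    }
    if let Some(out) = local_llm(episode) {
        return Some((Tier::LocalLlm, out));
    }
    if allow_cloud {
        // Reached only when both local tiers failed and cloud calls are permitted.
        return cloud(episode).map(|out| (Tier::Cloud, out));
    }
    None
}

// Stand-in tiers for the demo; real tiers would run a model.
fn tier_fails(_episode: &str) -> Option<String> {
    None
}

fn tier_succeeds(episode: &str) -> Option<String> {
    Some(format!("extracted: {episode}"))
}

fn main() {
    let ep = "Migrated auth from Redis sessions to JWT.";
    // Tier 1 handles it: no LLM call at all.
    println!("{:?}", extract(ep, tier_succeeds, tier_fails, tier_fails, true));
    // Both local tiers fail, cloud is allowed.
    println!("{:?}", extract(ep, tier_fails, tier_fails, tier_succeeds, true));
    // Both local tiers fail, cloud is forbidden: nothing leaves the machine.
    println!("{:?}", extract(ep, tier_fails, tier_fails, tier_succeeds, false));
}
```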
Two architectural bets:
- One LLM call per write. Tiered escalation: local ONNX handles ~70% of episodes, local LLM another 25%, cloud only when both fail. Compare to Graphiti's measured 2.55 calls per episode.
- Zero LLM calls in the read path for 90% of queries. The universal schema's 10 typed relations are a closed set — your user verb cosine-matches to one of them, then SQL runs deterministically. Only multi-hop / time-filter / conjunction queries (~10%) call a tiny local Qwen3-1.5B. No cloud LLM ever sees a read.
This is the bit competitors can't match. Graphiti, Mem0, Letta all need an LLM at read time because their relation types are free-form text the SQL engine can't reason about.
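The simple read path described above can be sketched in a few lines: embed the user's verb, pick the nearest relation type by cosine similarity, then run a fixed SQL query with that relation bound as a parameter. This is an illustration under assumptions, not the shipped code: the 3-dimensional vectors stand in for real 384-dimensional embeddings, and the `edges` table and its columns are hypothetical names.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Pick the relation type whose embedding is closest to the verb embedding.
fn best_relation<'a>(verb: &[f32], relations: &[(&'a str, Vec<f32>)]) -> Option<(&'a str, f32)> {
    relations
        .iter()
        .map(|(name, emb)| (*name, cosine(verb, emb)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}

/// Hypothetical table and column names; the relation is bound as a parameter,
/// so the query text itself never changes.
const EDGE_QUERY: &str = "SELECT source, target FROM edges WHERE relation = ?1";

fn main() {
    // Toy embeddings for three of the ten relation types.
    let relations = vec![
        ("caused", vec![1.0, 0.0, 0.0]),
        ("depends_on", vec![0.0, 1.0, 0.0]),
        ("owned_by", vec![0.0, 0.0, 1.0]),
    ];
    let verb = [0.9, 0.1, 0.0]; // pretend embedding of the verb in the user's question

    if let Some((relation, score)) = best_relation(&verb, &relations) {
        println!("relation={relation} score={score:.3}");
        println!("run: {} with ?1 = {}", EDGE_QUERY, relation);
    }
}
```

Because the relation set is closed, the match is a lookup over ten candidates, which is what lets the rest of the query stay deterministic.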
9 entity types, 10 relations, hardcoded. Users never write a schema.
| Entity types | Relation types |
|---|---|
| Person, Place, Organization, Concept, Artifact, Event, Time, Idea, Fact | mentions, located_at, related_to, caused, preceded, references, owned_by, part_of, depends_on, participated_in |
Broad enough to handle personal wikis, work notes, research, recipes, code, journal entries: anything text-shaped. Edge-case domains (recipes need "Ingredient", scientific datasets need "Measurement") are handled by an automatic schema-improvement loop: the LLM logs suggested types to a side table, and a nightly cron job promotes a suggestion once it has appeared in at least 5 distinct episodes and its cosine similarity to every existing type is below 0.85. Users see each promotion as a one-line notice the next time they invoke the CLI.
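The promotion rule can be read as a two-part predicate: enough distinct episodes, and not too close to anything already in the schema. A minimal sketch, assuming the thresholds quoted above; the `Suggestion` struct, the function names, and the 2-dimensional toy embeddings are hypothetical.

```rust
const MIN_DISTINCT_EPISODES: usize = 5;
const MAX_SIMILARITY_TO_EXISTING: f32 = 0.85;

/// A type suggested by the LLM (hypothetical shape of a side-table row).
struct Suggestion {
    name: String,
    distinct_episodes: usize,
    embedding: Vec<f32>,
}

/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Promote only if the suggestion is frequent enough AND distinct from every existing type.
fn should_promote(s: &Suggestion, existing: &[Vec<f32>]) -> bool {
    s.distinct_episodes >= MIN_DISTINCT_EPISODES
        && existing
            .iter()
            .all(|e| cosine(&s.embedding, e) < MAX_SIMILARITY_TO_EXISTING)
}

fn main() {
    // Toy embeddings of two existing types.
    let existing = vec![vec![1.0, 0.0], vec![0.0, 1.0]];

    let candidates = vec![
        // Frequent and distinct: promoted.
        Suggestion { name: "Ingredient".into(), distinct_episodes: 7, embedding: vec![0.6, 0.6] },
        // Frequent but nearly identical to an existing type: rejected.
        Suggestion { name: "Human".into(), distinct_episodes: 9, embedding: vec![0.99, 0.05] },
        // Distinct but too rare: rejected.
        Suggestion { name: "Measurement".into(), distinct_episodes: 3, embedding: vec![0.6, 0.6] },
    ];
    for s in &candidates {
        println!("{}: promote={}", s.name, should_promote(s, &existing));
    }
}
```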
Full schema rationale → docs/CLARITY.md § 3
You pick one of three modes at ctxgraph init. All three keep reads local.
| Mode | Writes | Cost / 1k eps | Best for |
|---|---|---|---|
| local-only | GLiNER2 → NuExtract 2.0 → Qwen3-8B (all local) | $0 | Privacy / offline / sensitive data |
| cloud-fallback (default) | Local first; Cerebras free tier when local is stuck | $0 in practice* | Most users |
| cloud-quality | Skip local; every episode goes to Cerebras Qwen3-32B or DeepInfra Gemma-4-26B-A4B | $0–$0.11 | Long-form text, research papers |
* Cerebras free tier = 1M tokens/day, 30 RPM. Enough for ~1 250 episodes/day. DeepInfra Gemma-4-26B-A4B ($0.07 in / $0.34 out, ~$0.11/1k eps) is the paid fallback when Cerebras rate-limits.
Setting allow_cloud = false in ~/.ctxgraph/config.toml forces local-only (Mode A) regardless of the selected mode; this is the privacy override.
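The free-tier and paid-fallback figures in the footnote can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that the footnote does not state: that the DeepInfra prices are per 1M tokens, and that an episode splits into roughly 600 input and 200 output tokens. That split is chosen only because it matches the 800 tokens per episode implied by the free-tier numbers.

```rust
// Figures quoted in the footnote.
const FREE_TOKENS_PER_DAY: f64 = 1_000_000.0;
const FREE_REQUESTS_PER_MINUTE: f64 = 30.0;
const EPISODES_PER_DAY: f64 = 1_250.0;
const PRICE_IN: f64 = 0.07; // USD, assumed to be per 1M input tokens
const PRICE_OUT: f64 = 0.34; // USD, assumed to be per 1M output tokens

/// Token budget per episode implied by "1M tokens/day is enough for ~1 250 episodes".
fn implied_tokens_per_episode() -> f64 {
    FREE_TOKENS_PER_DAY / EPISODES_PER_DAY
}

/// Requests per day allowed by the RPM limit alone.
fn requests_per_day() -> f64 {
    FREE_REQUESTS_PER_MINUTE * 60.0 * 24.0
}

/// Paid cost in USD for 1 000 episodes at a given per-episode token split.
fn cost_per_1k_episodes(input_tokens: f64, output_tokens: f64) -> f64 {
    let per_episode = (input_tokens * PRICE_IN + output_tokens * PRICE_OUT) / 1_000_000.0;
    per_episode * 1_000.0
}

fn main() {
    println!("tokens per episode: {}", implied_tokens_per_episode());
    println!("requests per day at 30 RPM: {}", requests_per_day());
    // Assumed split: 600 input + 200 output tokens per episode.
    println!("paid cost per 1k episodes: ${:.2}", cost_per_1k_episodes(600.0, 200.0));
}
```

Under these assumptions the daily token cap, not the 30 RPM limit, is the binding constraint: the RPM limit alone would allow far more than 1 250 requests per day.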
| | ctxgraph | Graphiti / Zep | Mem0 | Letta | Cognee |
|---|---|---|---|---|---|
| Distribution | single Rust binary | Python + Neo4j + Docker | Python SDK | Python | Python + Neo4j |
| Local-only mode | yes | no | no | yes (Apache 2.0) | no |
| LLM calls per write | 1 | 2.55 (measured) | N | varies | varies |
| LLM in read path | no (90% of queries) | yes | yes | yes | yes |
| Schema-typed extraction | yes (universal 9/10) | free-form verbs | free-form | typed but manual | typed but manual |
| Bi-temporal edges | yes | yes | no | no | no |
| Verified $/1k eps (Gemma 4 26B) | $0.11 | ~$0.66 (6×) | N/A | N/A | N/A |
| Apples-to-apples combined F1 (same model, same scorer) | 0.638 | 0.636 | not measured | not measured | not measured |
| Stars (rough, May 2026) | early | ~20K | ~50K | ~30K | ~15K |
More competitor analysis → docs/ROADMAP.md § "Competitive landscape"
| Component | Status | Lines |
|---|---|---|
| ctxgraph-core: SQLite + FTS5 + bi-temporal graph | shipped | ~2 000 |
| ctxgraph-extract: tiered extraction (current: GLiNER + GLiREL + LLM gate) | shipped | ~8 500 |
| ctxgraph-embed: fastembed wrapper, all-MiniLM-L6-v2 (384-dim) | shipped | ~70 |
| ctxgraph-cli: init, log, query, entities, stats, models, mcp start | shipped | ~600 |
| ctxgraph-mcp: MCP server, 6 tools | shipped | ~870 |
v0.3 is the next launch — see docs/ROADMAP.md. It swaps GLiNER + GLiREL for GLiNER2 (single forward pass), adopts the universal schema, adds the no-LLM read path, defaults to Cerebras free tier, and re-runs the 29-episode benchmark to confirm the headline lands at ≥ 0.745 combined F1 with a fully local stack.
# macOS + Linux (prebuilt binaries via Homebrew)
brew install rohansx/tap/ctxgraph
# or from source (Rust 1.85+)
cargo install ctxgraph-cli

ctxgraph init
ctxgraph log "Alice chose PostgreSQL over MySQL for the new billing service."
ctxgraph log "PostgreSQL replaced MySQL in prod on 2026-04-12."
ctxgraph query "what did Alice choose?"
ctxgraph query "what was replaced?"

{
"mcpServers": {
"ctxgraph": { "command": "ctxgraph-mcp" }
}
}

| Tool | Description |
|---|---|
| ctxgraph_add_episode | Record an event or decision |
| ctxgraph_search | Fused FTS5 + semantic + graph search |
| ctxgraph_traverse | Walk the graph from an entity |
| ctxgraph_find_precedents | Find similar past events |
| ctxgraph_list_entities | List entities with filters |
| ctxgraph_export_graph | Export entities and edges |
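The "fused" search combines several ranked result lists with reciprocal rank fusion (RRF): each list contributes 1 / (k + rank) to an item's score, and items are sorted by the sum. The sketch below shows the general technique, not ctxgraph's implementation; the `rrf` function, the episode ids, and the constant k = 60 (a commonly used default) are assumptions.

```rust
use std::collections::HashMap;

/// Smoothing constant; 60 is a common default, assumed here.
const RRF_K: f64 = 60.0;

/// Fuse ranked lists of ids (best first) into one list sorted by RRF score.
fn rrf(rankings: &[Vec<&str>]) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (index, id) in ranking.iter().enumerate() {
            let rank = index as f64 + 1.0; // ranks are 1-based
            *scores.entry(*id).or_insert(0.0) += 1.0 / (RRF_K + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let full_text = vec!["e3", "e1", "e2"]; // e.g. an FTS5 ranking
    let semantic = vec!["e1", "e4", "e3"]; // e.g. a vector-similarity ranking
    for (id, score) in rrf(&[full_text, semantic]) {
        println!("{id}: {score:.5}");
    }
}
```

RRF uses only rank positions, so it needs no score normalization between the full-text and vector backends.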
use ctxgraph::{Graph, Episode};
let mut graph = Graph::init(".ctxgraph")?;
graph.add_episode(
Episode::builder("Chose Postgres over Mongo for the billing rewrite").build()
)?;
let results = graph.search("why Postgres?", 10)?;

crates/
├── ctxgraph-core/ types, storage, query, temporal
├── ctxgraph-extract/ tiered extraction (ONNX + LLM)
├── ctxgraph-embed/ local embeddings (384-dim)
├── ctxgraph-cli/ CLI binary
└── ctxgraph-mcp/ MCP server
export OPENROUTER_API_KEY=sk-or-...
# 1) Third-party accuracy on CoNLL04 (auto-fetches the dataset; strict directional+typed scorer)
python scripts/conll04_bench.py --model google/gemini-2.5-flash-lite --out conll04.json --limit 80
# …or against a LOCAL model via ollama (no API cost):
python scripts/conll04_bench.py --model 'hf.co/<your>/gemma-gguf:Q4_K_M' \
--base-url http://localhost:11434/v1/chat/completions --out conll04_local.json --limit 40
# 2) Cross-domain model bake-off (ctxgraph single-call prompt)
python scripts/openrouter_bench.py --model deepseek/deepseek-v3.2 --out bench.json \
--skip-tech --cd-fixture crates/ctxgraph-extract/tests/fixtures/cross_domain_v2.json
# 3) ctxgraph-vs-Graphiti, same model, same scorer (needs Neo4j + graphiti venv)
docker run -d --name neo4j-bench -p 7687:7687 -e NEO4J_AUTH=neo4j/benchpass123 neo4j:5.26
python3 -m venv .venv-graphiti && .venv-graphiti/bin/pip install graphiti-core neo4j fastembed
.venv-graphiti/bin/python scripts/graphiti_openrouter_bench.py \
--model google/gemini-2.5-flash-lite --out graphiti.json
# 4) Cost/efficiency: measure Graphiti's ACTUAL LLM calls/episode vs ctxgraph's 1
.venv-graphiti/bin/python scripts/cost_efficiency_bench.py --model google/gemini-2.5-flash-lite

Each model run costs ~$0.005–0.02 on OpenRouter; the CoNLL04 dataset is fetched from HuggingFace at run time (no third-party data committed to the repo).
See CONTRIBUTING.md. For design discussions, docs/CLARITY.md is the working doc — propose changes against it.
MIT