
fix: high/pre-mainnet issues — query scoping (#184/#675), curator gate (#757), skill ACL (#462), async honesty (#1013) #1132

Merged
branarakic merged 7 commits into main from fix/high-pre-mainnet-issues on Jun 18, 2026

Conversation

@Bojan131

Contributor

Summary

Fixes for high / pre-mainnet issues, branched from a fresh main. Every fix in this PR was verified on a live 6-node devnet (real chain, real data) and/or a unit test — no mocks for the behaviour under test.

This is the first batch of the 25-high effort. The other 9 highs are fixed on PR #1107 (fix-in-flight); this PR adds 5 more. The remaining deep contract/consensus/storage/P2P issues are tracked separately (see "Not in this PR" below) — they need focused, reviewed PRs rather than being rushed in here.

Fixes (5) — all verified

#184 + #675 — sub-graph scoping under view-based routing (packages/query)

  • view: working-memory (and SWM/VM) now includes data in registered sub-graphs (was silently excluded), and view + subGraphName now scopes to that sub-graph instead of throwing a "deferred to V10.x" error.
  • resolveViewGraphs threads a /{sub} segment into every per-layer prefix; queryWithView fans out across registered sub-graphs (from the _meta registry) when none is named.
  • Devnet-verified: WM view returns both root + sub-graph entities; view+subGraphName returns only the sub-graph's. Unit: sub-graph-query.test.ts. Query suite 262/262.
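
The scoping rule above can be pictured with a small sketch. It is not the actual `resolveViewGraphs` / `queryWithView` code: `layerPrefix` and `prefixesForView` are hypothetical names, the URI shapes are approximated from the graph names quoted elsewhere in this PR, the per-agent address segment of working memory is left out, and the list of registered sub-graphs is passed in rather than read from the `_meta` registry.

```typescript
// Hypothetical sketch of sub-graph scoping under view-based routing.
// None of these helpers are the real packages/query API.
type View = 'working-memory' | 'verifiable-memory';

const LAYER_SEGMENT: Record<View, string> = {
  'working-memory': '_working_memory',
  'verifiable-memory': '_verifiable_memory',
};

// Thread an optional `/{sub}` segment into the per-layer prefix.
function layerPrefix(contextGraphId: string, view: View, subGraphName?: string): string {
  const sub = subGraphName ? `/${subGraphName}` : '';
  return `did:dkg:context-graph:${contextGraphId}${sub}/${LAYER_SEGMENT[view]}/`;
}

// Named sub-graph: scope to it alone.
// No sub-graph named: root prefix plus one prefix per registered sub-graph.
function prefixesForView(
  contextGraphId: string,
  view: View,
  registeredSubGraphs: readonly string[],
  subGraphName?: string,
): string[] {
  if (subGraphName !== undefined) {
    return [layerPrefix(contextGraphId, view, subGraphName)];
  }
  return [
    layerPrefix(contextGraphId, view),
    ...registeredSubGraphs.map((sg) => layerPrefix(contextGraphId, view, sg)),
  ];
}
```

Under these assumptions, a view without `subGraphName` searches the root and every registered sub-graph, while a view with `subGraphName` searches exactly one prefix.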

#757 — curator-gate the join-request endpoints (packages/agent, packages/cli)

  • listPendingJoinRequests now calls assertContextGraphOwner (the same check approve/reject already use), and the GET route maps the owner failure to 403. Closes "any valid token can read another curator's pending-moderation data".
  • Devnet-verified: a freshly-registered non-curator token → 403; the curator → 200 (not broken).
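
A minimal sketch of the gate-then-map pattern described above. The function names `assertContextGraphOwner` and `listPendingJoinRequests` come from this PR, but the signatures, the in-memory `owners` map, the error code string, and the `handleGetJoinRequests` route wrapper are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real agent and CLI types are not shown in this PR.
interface JoinRequest { contextGraphId: string; requester: string; }
interface HttpResult { status: number; body: unknown; }

class ContextGraphOwnerError extends Error {
  readonly code = 'NOT_CONTEXT_GRAPH_OWNER';
}

function isOwnerError(err: unknown): boolean {
  return typeof err === 'object' && err !== null &&
    (err as { code?: unknown }).code === 'NOT_CONTEXT_GRAPH_OWNER';
}

// Same check that approve/reject would use: only the curator passes.
function assertContextGraphOwner(owners: Map<string, string>, cgId: string, caller: string): void {
  if (owners.get(cgId) !== caller) {
    throw new ContextGraphOwnerError(`caller is not the curator of ${cgId}`);
  }
}

function listPendingJoinRequests(
  owners: Map<string, string>,
  pending: readonly JoinRequest[],
  cgId: string,
  caller: string,
): JoinRequest[] {
  assertContextGraphOwner(owners, cgId, caller); // gate before reading any data
  return pending.filter((r) => r.contextGraphId === cgId);
}

// Route layer: an owner failure becomes 403; anything else is rethrown.
function handleGetJoinRequests(
  owners: Map<string, string>,
  pending: readonly JoinRequest[],
  cgId: string,
  caller: string,
): HttpResult {
  try {
    return { status: 200, body: listPendingJoinRequests(owners, pending, cgId, caller) };
  } catch (err) {
    if (isOwnerError(err)) {
      return { status: 403, body: { error: (err as Error).message } };
    }
    throw err;
  }
}
```

Putting the assertion inside the listing function, rather than only in the route, keeps any other caller of the function behind the same gate.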

#462 — skill_request authorization (packages/agent, packages/cli)

  • PROTOCOL_MESSAGE authenticated the caller but did no authorization — any connected peer could invoke any registered skill. Added a SkillAclCheck hook to MessageHandler + agent.setSkillAcl(...); the daemon installs a default-deny-for-remote-peers policy. Operators opt back in with messaging.openSkills: true or messaging.skillAllowedPeers: [...].
  • Devnet-verified: a remote skill_request is default-denied with a clear reason. Agent e2e-network 11/11 (chat unaffected).
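
As a rough illustration of a default-deny policy with the two opt-ins named above, the following sketch may help. Only the names `SkillAclCheck`, `openSkills`, and `skillAllowedPeers` appear in this PR; the hook signature, the decision shape, and `dispatchSkillRequest` are hypothetical and do not reflect the real `MessageHandler` API.

```typescript
// Hypothetical shapes for illustration only.
interface MessagingConfig { openSkills?: boolean; skillAllowedPeers?: string[]; }
interface SkillAclDecision { allowed: boolean; reason?: string; }
type SkillAclCheck = (peerId: string, skillName: string) => SkillAclDecision;
type SkillHandler = (input: string) => string;

// Default-deny for remote peers; operators opt back in through config.
function defaultDenyRemotePolicy(config: MessagingConfig): SkillAclCheck {
  const allowedPeers = new Set(config.skillAllowedPeers ?? []);
  return (peerId) => {
    if (config.openSkills === true || allowedPeers.has(peerId)) {
      return { allowed: true };
    }
    return { allowed: false, reason: `peer ${peerId} is not authorized to invoke skills` };
  };
}

function dispatchSkillRequest(
  acl: SkillAclCheck,
  skills: Map<string, SkillHandler>,
  peerId: string,
  skillName: string,
  input: string,
): { ok: boolean; output?: string; error?: string } {
  let decision: SkillAclDecision;
  try {
    decision = acl(peerId, skillName);
  } catch {
    // A throwing ACL check fails closed.
    decision = { allowed: false, reason: 'skill ACL check failed' };
  }
  // Authorization runs before skill lookup, so a denied peer cannot probe
  // which skills exist.
  if (!decision.allowed) {
    return { ok: false, error: decision.reason ?? 'denied' };
  }
  const handler = skills.get(skillName);
  if (!handler) {
    return { ok: false, error: `unknown skill: ${skillName}` };
  }
  return { ok: true, output: handler(input) };
}
```

With an empty config every remote peer is denied; adding a peer ID to `skillAllowedPeers` or setting `openSkills: true` restores access in this sketch.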

#1013 — async publish on-chain honesty (packages/publisher)

  • A private publishAsync that couldn't reach its chain-registered CG (no collectable storage ACKs) no longer reports finalized with a provisional t… UAL. It now fails honestly (the data is still staged locally under the provisional UAL, surfaced in the error). Threaded localChainSkipReason through PublishResult so a genuine no-chain publish still finalizes(local).
  • Unit: async-lift-local-finalization-honesty.test.ts (4/4). Publisher suite 1169/1169. (Actually reaching chain for private CGs is tracked in #1121, "publishAsync should support encrypted VM publishing for curated/private context graphs".)
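
The honesty rule can be sketched as a small result mapper. This is an assumption-heavy illustration: `localChainSkipReason` and `PublishResult` are named in this PR, but the field layout, the two reason values, the `onChain` flag, and `toLiftOutcome` are invented here to show the decision, not taken from packages/publisher.

```typescript
// Hypothetical reason values; the real set is not listed in this PR.
type LocalChainSkipReason = 'chainless-context-graph' | 'private-no-acks';

// Hypothetical result shape.
interface PublishResult {
  ual: string; // provisional unless the publish reached chain
  onChain: boolean;
  localChainSkipReason?: LocalChainSkipReason;
}

type LiftOutcome =
  | { status: 'finalized'; mode: 'chain' | 'local'; ual: string }
  | { status: 'failed'; error: string };

function toLiftOutcome(result: PublishResult): LiftOutcome {
  if (result.onChain) {
    return { status: 'finalized', mode: 'chain', ual: result.ual };
  }
  // A context graph that was never chain-registered finalizes locally:
  // that is an honest terminal state.
  if (result.localChainSkipReason === 'chainless-context-graph') {
    return { status: 'finalized', mode: 'local', ual: result.ual };
  }
  // A chain-registered CG that could not be reached must not look finalized.
  // The provisional UAL is surfaced so the staged data can still be found.
  const reason = result.localChainSkipReason ?? 'unknown reason';
  return {
    status: 'failed',
    error: `publish did not reach chain (${reason}); data is staged locally under provisional UAL ${result.ual}`,
  };
}
```

The point of threading the skip reason through is that "never meant to reach chain" and "meant to but could not" stop collapsing into the same finalized state.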

Not in this PR (honest status)

These highs need a focused, reviewed PR — rushing them risks exactly the pre-mainnet breakage we're avoiding, and I won't claim verification I can't stand behind:

🤖 Generated with Claude Code

…L, async honesty)

All verified on a live 6-node devnet (real chain, real data) and/or unit tests.

#184 + #675 (packages/query) — view-based routing now scopes to / includes
  sub-graphs. resolveViewGraphs threads a `subGraphName` segment into the
  per-layer prefixes; queryWithView fans out over registered sub-graphs when
  none is named. Removed the "deferred to V10.x" throw.
  Devnet: WM view returns root+subgraph; view+subGraphName scopes to the subgraph.

#757 (packages/agent, packages/cli) — GET /join-requests is now curator-gated
  server-side (listPendingJoinRequests calls assertContextGraphOwner; the route
  maps the owner failure to 403, mirroring approve/reject).
  Devnet: non-curator token -> 403, curator -> 200.

#462 (packages/agent, packages/cli) — skill_request authorization. Added a
  SkillAclCheck hook to MessageHandler + setSkillAcl on DKGAgent; the daemon
  installs a default-deny-for-remote-peers policy (opt back in via
  messaging.openSkills / messaging.skillAllowedPeers). Closes "any connected
  peer could invoke any registered skill".
  Devnet: remote skill_request default-denied.

#1013 (packages/publisher) — async publish honesty. A private publish that
  couldn't reach its chain-registered CG (no collectable storage ACKs) no longer
  reports `finalized` with a provisional UAL — it fails honestly (data still
  staged locally under the provisional UAL, surfaced in the error). Threaded
  `localChainSkipReason` through PublishResult so a genuine no-chain publish
  still finalizes(local). (Reaching chain for private CGs is #1121.)
  Unit: async-lift-local-finalization-honesty.test.ts (4/4).

Regression: query 262/262, publisher 1169/1169, agent e2e-network 11/11.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread packages/agent/src/dkg-agent-join.ts
Comment thread packages/query/src/dkg-query-engine.ts
The async lift rewrote caller-provided root IRIs to a generated
`dkg:<cg>:<ns>:<scope>/<name>-<hash>` form, while the synchronous publish
path (`canonicalPublishPayload` → `skolemizeByEntity`) keeps the caller's
`rootEntity` IRIs verbatim. The divergence broke stable IRI linking: the
same domain payload produced different RDF subjects depending on sync vs
async, so VM graphs rendered disconnected and cross-entity references
couldn't be followed.

Fix: `canonicalRootIri` is now identity — the async lift preserves the
caller root IRI exactly like sync. Every downstream consumer reads
`validation.canonicalRootMap` symmetrically, so an identity map propagates
cleanly: quad rewriting is a no-op, private data is stored at the caller
root, and the canonical-vs-source `privateDataAnchor` bridge (an async-only
artifact of the old rewrite that sync never created) is correctly skipped
(`stampCanonicalAnchorsInWorkspace` becomes a no-op via its
`canonical === sourceRoot` guard).
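
To make the "identity map propagates cleanly" claim concrete, here is a small sketch. `canonicalRootIri` is the name used above; `buildCanonicalRootMap`, `rewriteQuads`, `rootsNeedingAnchorBridge`, and the `Quad` shape are hypothetical stand-ins for the real consumers of `validation.canonicalRootMap`.

```typescript
// Hypothetical quad shape for illustration.
interface Quad { subject: string; predicate: string; object: string; }

// After the fix: canonicalization is identity, same as the sync path.
const canonicalRootIri = (callerRoot: string): string => callerRoot;

function buildCanonicalRootMap(callerRoots: readonly string[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const root of callerRoots) {
    map.set(root, canonicalRootIri(root));
  }
  return map;
}

// Consumers rewrite subjects and objects through the map.
// With an identity map this returns quads equal to the input.
function rewriteQuads(quads: readonly Quad[], rootMap: ReadonlyMap<string, string>): Quad[] {
  return quads.map((q) => ({
    subject: rootMap.get(q.subject) ?? q.subject,
    predicate: q.predicate,
    object: rootMap.get(q.object) ?? q.object,
  }));
}

// The anchor bridge is only needed where canonical differs from the source
// root, so an identity map yields no roots to bridge.
function rootsNeedingAnchorBridge(rootMap: ReadonlyMap<string, string>): string[] {
  const out: string[] = [];
  rootMap.forEach((canonical, sourceRoot) => {
    if (canonical !== sourceRoot) out.push(sourceRoot);
  });
  return out;
}
```

Because cross-entity references are rewritten through the same map, a link from one caller root to another keeps pointing at the IRI the caller wrote.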

Tests:
- New `async-lift-canonicalization-parity.test.ts` asserts identity
  canonicalization and that cross-entity IRI links survive validation.
- The two CREATE-remainder subtraction tests now seed authoritative VM
  state through a SEPARATE publisher instance. Rule-4 entity exclusivity
  is tracked per-process in memory (never hydrated from the store), so this
  models the real cross-node idempotency case the subtraction guards (node A
  finalized R; node B shares R and lifts a CREATE, and subtraction drops the
  already-finalized quads) — instead of relying on the old root rewrite to
  dodge the same-instance collision.

Verified on a live devnet node: async publish (publishFromFinalizedAssertion,
vm-confirmed, real on-chain tx) writes the verbatim caller IRI
`urn:dmaast:device:async-1122` into `_verifiable_memory`, with `_meta`
`entity` and `canonicalRootMap` both the caller IRI and zero `dkg:` rewrite.
Publisher suite green (1171 passed).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread packages/agent/src/dkg-agent-join.ts
Comment thread packages/query/src/dkg-query-engine.ts Outdated
Comment thread packages/query/src/dkg-query-engine.ts
Codex red 1 — listPendingJoinRequests callers (GH #757 follow-up):
- notifications route now passes the token-verified caller into the
  curator-gated listPendingJoinRequests; without it the gate resolved
  against the node default agent and silently emptied the pending-join
  set for any non-default curator.
- isCallerOrNodeOwner compares EVM-address DIDs case-insensitively
  (EIP-55 checksums are display-only; owner DIDs are stored as written
  while HTTP callers pass lowercased addresses). Peer IDs stay exact.
- Tests: notifications route asserts the caller is threaded (non-default
  curator keeps its pending joins); agent test proves lowercased curator
  is accepted and non-curators/default-agent stay rejected.
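
The comparison rule in the second bullet can be sketched as follows. This is not the real `isCallerOrNodeOwner`: `sameIdentity` is a hypothetical helper, and the `did:dkg:agent:` prefix used in the usage example is an assumed DID shape.

```typescript
// Matches any identifier that ends in a 20-byte hex EVM address.
const EVM_SUFFIX = /^(.*)(0x[0-9a-fA-F]{40})$/;

// EVM-address identifiers compare case-insensitively on the address part,
// since EIP-55 casing is only a checksum. Everything else, including peer
// IDs, compares exactly.
function sameIdentity(a: string, b: string): boolean {
  const ma = EVM_SUFFIX.exec(a);
  const mb = EVM_SUFFIX.exec(b);
  if (ma && mb) {
    const [, prefixA = '', addrA = ''] = ma;
    const [, prefixB = '', addrB = ''] = mb;
    return prefixA === prefixB && addrA.toLowerCase() === addrB.toLowerCase();
  }
  return a === b;
}
```

For example, `sameIdentity('did:dkg:agent:0x52908400098527886E0F7030069857D2E4169EE7', 'did:dkg:agent:0x52908400098527886e0f7030069857d2e4169ee7')` is true in this sketch, while two peer IDs that differ only in letter case are treated as different.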

Codex red 2 — by-name WM read is now sub-graph aware (GH #184 follow-up):
- resolveViewGraphs threads opts.subGraphName into the single-graph
  contextGraphLayerUri/contextGraphAssertionUri (mirrors the writer,
  DKGPublisher.wmGraphUri).
- resolveWorkingMemoryKaNumber keys the dkg:kaId lookup by the
  sub-graph-aware lifecycle URN (root _meta, like assertionFinalize).
- Tests: per-KA sub-graph read via kaId stamp, legacy name-keyed
  fallback, and no root-assertion leak into sub-graph reads.

Also:
- messaging-chat-acl.test.ts: 6 new GH #462 skill_request ACL tests
  (real Ed25519/X25519 round-trips): deny blocks handler, throw fails
  closed, denial precedes unknown-skill resolution (no existence
  oracle), accept/null restore paths.
- publisher-route-snapshot.test.ts: align with GH #1122 caller-IRI
  parity (payload carries verbatim caller subject, no dkg:<cg>: rewrite).

All verified on a live 6-node devnet: #184/#675/by-name (3-way PASS),
#757 four-way PASS incl. non-default-curator notifications, #1122
sync/async VM parity (adjacent KAs, verbatim caller IRIs), #1013 EPCIS
private capture fails honestly with the #1013 error instead of fake
finalization.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread packages/publisher/src/async-lift-publish-result.ts
Comment thread packages/publisher/src/async-lift-validation.ts
CI on the #1122 commit surfaced three more test sites that encoded the
OLD async-lift behavior:

- agent/swm-snapshot-sync: prepared lift payload now carries the verbatim
  caller subjects (root + skolem child), not the dkg:<cg>:… rewrite.
- agent/publish-jsonld (async subtraction observe): seed the confirmed
  authoritative state at the CALLER root — under identity canonicalization
  that IS the canonical root; the seal short-circuit math is unchanged.
- kafka-plugin e2e: create the test CGs local-only (register: false). The
  plugin registers streams as fully-private KAs and the harness is one
  isolated daemon — a chain-registered CG can never reach the private-ACK
  quorum there, so the old run only passed through the fake local
  finalization #1013 removed. A chainless CG finalizes locally as an
  honest terminal state, preserving the register→list→get coverage.

Local runs: agent swm-snapshot-sync + publish-jsonld 36/36,
kafka-plugin e2e 11/11.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread packages/query/src/dkg-query-engine.ts Outdated
Comment thread packages/query/src/dkg-query-engine.ts Outdated
Comment thread packages/publisher/src/async-lift-validation.ts
Comment thread packages/publisher/src/async-lift-publish-result.ts
Bojan131 added a commit that referenced this pull request Jun 12, 2026
… red-while-live convention

Adds single-process / single-Hardhat-node reproducing tests (run in the normal
CI lanes) for five high/pre-mainnet issues that were previously only documented
`it.skip` stubs, and switches the whole liveness suite to the standard
"red while the bug is live, green when fixed" convention (plain failing `it()`
instead of the inverted `it.fails`, which was green-while-broken).

New CI tests (each authored against a known-fixed build and confirmed to flip
green there, so a red is a genuine live bug, not a broken test):
- #462  agent/issue-462-skill-acl.test.ts — an unauthorized (but signed) peer's
  skill_request is rejected and the handler does not run. Today there is no ACL
  → handler runs → RED.
- #936  agent/issue-936-tokenid-determinism.test.ts — two replicas reconciling
  the same multi-root KC from chain (divergent oxigraph insertion orders) agree
  on the rootEntity→tokenId mapping. Today positional assignment over a
  store-dependent order makes them disagree → RED.
- #1013 publisher/issue-1013-async-finalization-honesty.test.ts — a private
  publish that never reached chain (no storage ACKs) must NOT map to a finalized
  lift job. Today the mapper returns finalized/local → RED.
- #1078 storage/issue-1078-private-layer-scope.test.ts — a root hydrates only the
  authoritative private slice, not a superseded commitment for the same root.
  Today the CG-level _private graph commingles both → RED.
- #1091 random-sampling/e2e-hardhat-chain.test.ts — a node cannot predict its own
  RS challenge from public block data. Today the seed is reconstructed from
  block.difficulty/blockhash/sender and previewChallengeForSeed predicts the
  exact on-chain draw → RED.

Convention flip (it.fails → it()) for the existing high-issue repros
(#11, #184, #675, #757, #1121, #1122 + the devnet multi-node tier) so the suite
is uniformly RED while bugs are live and GREEN once fixed — matching how the fix
PRs (#1107, #1132) turn individual tests green as they merge.

Doc rewritten (docs/testing/ISSUE_LIVENESS_TESTS.md): all 25 high issues mapped
to a test across three tiers — 11 CI unit/integration, 8 devnet multi-node, and
6 honest pending-fixture/emergent stubs (#614 #1099 #1124 fixture-needed; #723
#999 #1008 emergent/load — a deterministic CI assertion there would be a false
positive).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Bojan131 and others added 3 commits June 18, 2026 09:38
…issues

# Conflicts:
#	packages/cli/test/notifications-route.test.ts
#	packages/query/src/dkg-query-engine.ts
…test

The merge of main into this branch produced 2 conflicts (resolved in the
merge commit) plus one auto-merge artifact this commit fixes.

Conflict resolutions (in the merge commit):
- packages/query/src/dkg-query-engine.ts (working-memory view): combined
  main's same-identity alias span (PR #1107 review 🟡 — one prefix per
  agentAddressAlias) with #1132's sub-graph scoping (#184/#675 — the `${sg}`
  suffix), so the WM prefixes are `…${sg}/_working_memory/${addr}/` per alias.
- packages/cli/test/notifications-route.test.ts: took main's version. main
  rewrote it from a mock-based unit test into a real-daemon integration test
  (sign-join→request-join→curator reject-join) that fully exercises the #757
  curator gate; the PR's #757 route change (thread callerAddress into
  listPendingJoinRequests) auto-merged into notifications.ts and is covered.

This commit:
- packages/agent/test/messaging-chat-acl.test.ts: add `vi` to the vitest
  import. main's copy of this file (chat-ACL tests only) and the PR's net-new
  `skill_request ACL (GH #462)` describe block (which uses `vi.fn` via
  echoSkill) auto-merged, but the surviving import line was main's
  `{ describe, it, expect }` — so the #462 skill-ACL tests threw
  `ReferenceError: vi is not defined`. The #462 feature itself
  (MessageHandler.setSkillAcl + default-deny enforcement) is intact in the
  merged messaging.ts; only the test import needed reconciling. 15/15 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…on tests)

Addresses the unresolved 🔴 Codex review findings, re-assessed against the
current code (post 225-commit main merge). Four were real and present; three
were already-fixed or not-applicable (resolved on the PR with explanation).

REAL BUGS FIXED:

- query VM view drops the sub-graph ROOT graph (dkg-query-engine.ts): the
  verifiable-memory case returned `graphs: []` when subGraphName was set, so
  it only searched `…/{sub}/_verifiable_memory/*` and missed confirmed /
  intentional-local sub-graph data written to `did:dkg:context-graph:{cg}/{sub}`.
  Now includes the sub-graph root graph, mirroring the root-CG branch.

- query sub-graph fan-out ignored `verifiedGraph` (dkg-query-engine.ts): a
  single-graph `view:'verifiable-memory' + verifiedGraph` read still fanned out
  across every registered sub-graph's VM partition and returned unrelated rows.
  Skip the fan-out when verifiedGraph is set (it is already pinned to one graph).

- async-lift `private-no-acks` retried forever (lift-job-failures.ts,
  async-lift-publish-result.ts): a deterministic "private payload had no
  collectable storage ACKs" broadcast failure was classified as the default
  retryable `rpc_unavailable`, so the queue reset/retried a job that can never
  finalize until #1121. Added a terminal failure code `private_unanchorable`
  (broadcast / terminal / fail_job) and classify the message to it.

- mixed public+private async root lost its `dkg:privateDataAnchor`
  (dkg-agent-helpers.ts): `partitionPublishAsyncQuads` only anchored
  private-ONLY roots, and after the #1122 canonicalRootIri→identity flip
  `stampCanonicalAnchorsInWorkspace` self-disabled — so mixed roots' private
  data disappeared from EPCIS/Kafka partition readers (which bridge
  public→private via the anchor). Now anchors every privately-staged root
  (idempotent).
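
The third fix above (terminal classification of the no-ACKs failure) can be sketched like this. The codes `private_unanchorable` and `rpc_unavailable` and the `fail_job` action come from this commit message; the `FailureClass` shape, the `reset_job` action name, and the message pattern being matched are assumptions, not the contents of lift-job-failures.ts.

```typescript
// Hypothetical classification record.
interface FailureClass {
  code: 'private_unanchorable' | 'rpc_unavailable';
  phase: 'broadcast';
  retryable: boolean;
  action: 'fail_job' | 'reset_job';
}

// Assumed message fragment; the real matcher may key on something else.
const NO_ACKS_PATTERN = /no collectable storage ACKs/i;

function classifyBroadcastFailure(message: string): FailureClass {
  if (NO_ACKS_PATTERN.test(message)) {
    // Deterministic failure: retrying cannot succeed, so end the job.
    return { code: 'private_unanchorable', phase: 'broadcast', retryable: false, action: 'fail_job' };
  }
  // Default: treat as a transient RPC problem and let the queue retry.
  return { code: 'rpc_unavailable', phase: 'broadcast', retryable: true, action: 'reset_job' };
}
```

The design choice is that a failure which is certain to repeat gets its own terminal code instead of falling through to the retryable default.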

ALREADY-FIXED / NOT-APPLICABLE (resolved on the PR, no code change):
- WM by-name path sub-graph awareness — already fixed in d65e09f
  (resolveViewGraphs + resolveWorkingMemoryKaNumber thread subGraphName).
- listPendingJoinRequests caller threading — every production call site
  (notifications + context-graph routes) passes callerAgentAddress; omitting it
  throws loudly (intended secure gate), never silently empties the set.
- canonicalRootIri identity legacy-migration — N/A pre-mainnet: the async-lift
  publisher + the old generated-root rewrite are both unreleased, so no store
  persisted the legacy `dkg:<cg>:…` root format to mismatch on upgrade.

Regression tests added: query sub-graph VM scoping (A,C), async-lift terminal
classification (D), partitionPublishAsyncQuads anchor coverage (F).
query 277, publisher 1177, agent 1612 — all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2 participants