feat: add generic imported artifact byte reads by lupuszr · Pull Request #1058 · OriginTrail/dkg

lupuszr · 2026-06-08T14:08:04Z

Summary

Implements a generic imported assertion artifact byte API that supports source, markdown, and original artifact kinds without introducing a separate authorization system.

The artifact byte path now reuses the existing context graph sync authorization stack:

buildSyncRequestEnvelope
parseSyncRequest
authorizeSyncRequest
existing private sync identity, delegation, participant, allowed-peer, and curator-refresh logic

Artifact-specific code only validates request shape, auth-envelope binding, metadata linkage, hash consistency, and bounded byte serving.

What Changed

Added POST /api/assertion/artifact/read
Added generic peer protocol /dkg/10.0.2/get-assertion-artifact
Added signed sync-envelope fields:
- authPurpose
- authSelector
Added generic artifact selector binding for imported artifact byte requests
Added bounded FileStore.stat() and FileStore.readRange()
Added requester-side paged fetch and hash verification before cache promotion
Reused existing imported artifact metadata resolution for:
- sourceFileHash
- mdIntermediateHash
- Markdown fallback for native text/markdown
- durable _meta
- SWM/shared-memory fallback
Preserved existing /api/assertion/import-artifact/read-markdown behavior

Authorization Model

No new source-blob-specific authorization flow was added.

The responder:

Parses the signed sync request envelope.
Verifies the envelope is bound to the artifact request.
Calls existing authorizeSyncRequest.
Only after authorization, resolves artifact metadata linkage and serves bounded bytes.

Transport

Retains the useful transport ideas from PR #1043:

offset
maxBytes
bounded page reads
totalBytes
nextOffset
truncated
hash verification
cache promotion only after full artifact verification

Tests

pnpm --filter @origintrail-official/dkg-agent exec vitest run --config vitest.unit.config.ts test/imported-artifact.test.ts test/sync-envelope-cursor.test.ts
pnpm --filter @origintrail-official/dkg exec vitest run test/import-artifact-routes.test.ts

Notes

sourcePeerId is currently required for remote peer fetches; source-agent-to-peer discovery can be added as a follow-up.
original currently resolves to the linked source bytes unless distinct original-artifact metadata is added later.

+      const info = await handle.stat();
+      if (offset >= info.size) return Buffer.alloc(0);
+      const cappedLength = Math.min(length, info.size - offset);
+      const buffer = Buffer.alloc(cappedLength);


github-actions · 2026-06-08T14:10:54Z

+        ? (requestedHash ? { fileHash: requestedHash } : {})
+        : {}),
+    },
+    undefined,


🔴 Bug: resolveAssertionArtifact() drops the existing owner guard by passing undefined here. That means the new /api/assertion/artifact/read route can read another agent's imported bytes locally just by providing its assertionUri, even though the older resolve/read-markdown paths still bind reads to requestAgentAddress (with the public-open relaxation). Thread the same owner-guard policy through this path before resolving the artifact.

github-actions · 2026-06-08T14:10:54Z

+      return denied(req);
+    }
+
+    const authorized = await this.authorizeSyncRequest(syncReq, fromPeerId);


🔴 Bug: authorizeSyncRequest() only proves the caller may sync this context graph; it does not enforce the imported-artifact owner rule that the HTTP routes already rely on. As written, any peer that can sync the CG can call this protocol directly and fetch another agent's source/markdown bytes. Resolve the assertion owner here and apply the same owner/public-open policy before serving the blob.

github-actions · 2026-06-08T14:10:54Z

+      includeSharedMemory: false,
+      targetPeerId: params.sourcePeerId,
+      requesterPeerId: this.peerId,
+      needsAuth: true,


🔴 Bug: forcing needsAuth: true makes public artifact reads require a local signing key. On nodes without a chain identity or claimed-agent private key, buildSyncRequestEnvelope() will throw before the request is sent, even though handleGetImportedArtifact() currently accepts public-CG requests via authorizeSyncRequest(). Compute whether the target CG actually needs auth and only require a signed envelope for private/curated fetches, or make the responder reject unsigned public requests consistently.

github-actions · 2026-06-08T16:04:45Z

+      return denied(req);
+    }
+
+    const authorized = await this.authorizeSyncRequest(syncReq, fromPeerId);


🔴 Bug: authorizeSyncRequest() only gates private context graphs. On public-but-not-open graphs this will return true for any peer-bound envelope, so the new P2P artifact route can bypass the owner/public-open checks that /api/assertion/artifact/read still enforces. Add the same read-policy check here before resolving metadata/bytes, otherwise imported artifacts from non-open public CGs become remotely readable.

github-actions · 2026-06-08T16:04:45Z

+    }
+    const chunks = [Buffer.from(first.bytesB64, 'base64')];
+    let nextOffset = first.nextOffset;
+    while (first.truncated && nextOffset != null && nextOffset < total) {


🔴 Bug: this loop never verifies that the responder made forward progress. A peer that keeps replying with truncated: true and the same nextOffset (or zero-byte pages) will keep this caller in an endless fetch loop. Validate that each page advances the offset and abort on repeated/non-increasing nextOffset or inconsistent page metadata.

github-actions · 2026-06-08T16:04:45Z

+    ? mdIntermediateHash ?? (contentType === 'text/markdown' ? fileHash : undefined)
+    : fileHash;
+  if (!linkedHash) return null;
+  if (args.hash && args.hash !== linkedHash) return 'hash_mismatch';


🟡 Issue: validateContentHash() accepts bare 64-hex hashes, but this comparison requires the exact canonical string. A caller that sends a bare hash will be told hash_mismatch even when it refers to the same blob. Either normalize hashes before comparing or reject bare hashes at the API boundary so validation matches runtime behavior.

github-actions · 2026-06-08T16:19:05Z

  largeLiteralStorage?: LargeLiteralStorageConfig;
  /** Out-of-Oxigraph immutable public SWM operation snapshots. Defaults on when dataDir is set. */
  sharedMemoryPublicSnapshotStorage?: SharedMemoryPublicSnapshotStorageConfig;
+  importedArtifactByteStore?: ImportedArtifactByteStore;


🔴 Bug: exposing importedArtifactByteStore on DKGAgentConfig is misleading right now because DKGAgent.create() never consumes it or registers the peer handler. Non-CLI embedders can pass this config and still fail every get-assertion-artifact request until they manually call registerImportedArtifactByteStore(). Either wire the store during agent construction or keep it off the public config surface.

github-actions · 2026-06-08T16:19:05Z

+    ctx,
+    {
+      ...raw,
+      ...(kind === 'source' || kind === 'original'


🔴 Bug: threading requestedHash into fileHash here preserves the old 400 fileHash does not match import metadata path for source/original, while markdown mismatches return the new 200 { status: "hash_mismatch" } shape. That makes local artifact reads inconsistent by kind and with the peer protocol. Resolve first, then compare requestedHash against sourceFileHash so all kinds report hash_mismatch uniformly.

github-actions · 2026-06-08T16:19:05Z

+      return { response: first };
+    }
+    const total = first.totalBytes;
+    if (!params.cache || total == null || total > IMPORTED_ARTIFACT_MAX_CACHE_BYTES) {


🔴 Bug: this returns remote bytes without hashing them whenever cache is false, the request starts at a nonzero offset, or the artifact is over the cache cap. The HTTP route then serves those unverified bytes back to callers as a normal success path. Verify full-file responses even when you are not caching, and avoid returning partial remote pages as trusted bytes unless you can prove their integrity.

github-actions · 2026-06-08T16:19:05Z

+        return { response: page };
+      }
+      chunks.push(Buffer.from(page.bytesB64, 'base64'));
+      nextOffset = page.nextOffset;


🔴 Bug: there is no forward-progress check in this pagination loop. A buggy or malicious peer can keep replying with truncated: true and the same nextOffset, which makes this spin forever and hang the caller. Validate that each page starts at the requested offset and that nextOffset increases strictly before continuing.

github-actions · 2026-06-09T07:04:24Z

+      return denied(req);
+    }
+
+    const authorized = await this.authorizeSyncRequest(syncReq, fromPeerId);


🔴 Bug: authorizeSyncRequest() only verifies the envelope for private context graphs. On public-but-not-open graphs it returns true without validating requesterAgentAddress, but the next line still uses that unverified field for the owner check. A peer can forge requesterAgentAddress here and read another agent's imported artifact. Either verify the signed binding whenever owner-gating depends on requesterAgentAddress, or reject owner-gated peer reads unless the envelope was actually authenticated.

github-actions · 2026-06-09T07:04:24Z

+  const durable = await agent.store.query(`
+    SELECT ?fileHash ?contentType ?extractionStatus ?structuralTripleCount ?mdIntermediateHash WHERE {
+      GRAPH <${contextGraphMetaUri(args.contextGraphId)}> {
+        <${args.assertionUri}> <${DKG_ONTOLOGY}sourceFileHash> ?fileHash .


🔴 Bug: this lookup uses args.assertionUri verbatim. The rest of the import-artifact flow still accepts legacy ownerless assertion URIs and derives subGraphName from the canonical URI, but this protocol never normalizes before querying _meta or SWM. Requests with a supported legacy URI, or a sub-graph assertion where the caller omitted subGraphName, will miss existing metadata and incorrectly return unavailable. Parse/canonicalize the assertion URI once and reuse the derived URI + sub-graph for the lookups and selector binding.

Jurij89

I found blocking gaps against the stated goal of allowing joined/allowlisted Context Graph participants to open imported Markdown/source bytes. I also avoided duplicating the older unresolved auth/CodeQL review threads.

Jurij89 · 2026-06-10T11:12:53Z

+    throw new ImportArtifactRouteError(400, 'Invalid hash');
+  }
+
+  const artifact = await resolveImportedArtifactForRead(


Blocking: this route still resolves the artifact through the owner/public-open read guard. For a curated CG member in the allowList, ctx.requestAgentAddress is allowed to read/sync the graph but is not the assertion owner, and isPublicOpenContextGraph() is false, so this throws 403 before the SWM fallback can resolve the promoted Markdown/source linkage. That is the exact scenario this PR is supposed to enable. Please make the read policy accept authorized curated participants while keeping write/enrichment paths owner-only, and add a regression for an allowlisted non-owner reading kind: "markdown"/source on a private CG.

Jurij89 · 2026-06-10T11:12:53Z

+
+    const authorized = await this.authorizeSyncRequest(syncReq, fromPeerId);
+    if (!authorized) return denied(req);
+    if (!await canReadImportedArtifactByOwnerPolicy(this, req, syncReq)) return denied(req);


Blocking: the same access-policy issue exists on the P2P responder. authorizeSyncRequest() can return true for a private/curated CG when the requester is a valid participant/delegate, but canReadImportedArtifactByOwnerPolicy() then reduces the policy back to owner-or-public-open only. A joined allowlisted agent can sync the triples but still gets denied for the actual bytes. The artifact byte gate should preserve the signed selector-bound auth check and allow non-owner private CG participants that passed sync authorization, instead of requiring assertion ownership.

Jurij89 · 2026-06-10T11:12:53Z

+        }
+      }
+
+      const sourcePeerId = typeof raw.sourcePeerId === 'string' && raw.sourcePeerId.trim()


Blocking for end-to-end hydration: the route requires the caller to already know sourcePeerId, but when it is missing the response only returns source.agentAddress. A normal SWM reader can resolve the assertion owner and artifact hash from replicated triples, but not the source libp2p peer id, so this returns fetchable without enough information to make the next request. PR #1043 had durable/SWM publisher-peer plus agent-registry candidate resolution and retry; PR #1058 drops that discovery path. Please either resolve/retry candidate source peers server-side or surface persisted sourcePeerIds in artifact metadata so joined readers can actually open the MD file.

Jurij89 · 2026-06-10T11:15:11Z

+        });
+      }
+
+      const statResult = await fileStore.stat(resolved.hash);


Blocking: this local hit is not scoped to the assertion or to a verified remote fetch. For owner self-reads that is fine, but for cross-agent/SWM fallback reads the file store is a global content-addressed cache: if this node happens to already have resolved.hash from another import/context, the route returns those bytes without proving they came from this artifact or from the source peer. PR #1043 avoided that by requiring a verified cache marker after full-hash remote hydration before reusing local bytes for owner-relaxed SWM reads. Please carry over that protection for non-owner/shared-memory artifact reads, otherwise this becomes a byte oracle for any linked hash already present on the node.

github-actions · 2026-06-16T11:20:37Z

+  if (!resolvedHash) {
+    throw new ImportArtifactRouteError(404, `No ${kind} artifact is linked to assertionUri`);
+  }
+  if (requestedHash && requestedHash !== resolvedHash) return 'hash_mismatch';


🔴 Bug: validateContentHash() accepts bare 64-hex hashes, but this exact-string comparison only matches the prefixed form stored in metadata. A caller sending a valid bare hash will get a false hash_mismatch here (and the same pattern exists in the peer read path). Canonicalize both hashes before comparing, or tighten validation to reject bare hashes.

github-actions · 2026-06-16T11:20:37Z

+    }
+
+    return jsonResponse(res, 200, {
+      status: remote.verifiedBytes ? 'fetched' : 'fetchable',


🟡 Issue: remote.verifiedBytes is set even when cache === false, so this reports status: "fetched" for responses that were verified but never written to the local file store. That makes the status lie about local availability. Base the status on whether the cache promotion actually ran, or introduce a separate status for "verified-but-not-cached".

github-actions · 2026-06-16T11:20:37Z

+      assertionUri: resolved.assertionUri,
+      kind: resolved.kind,
+      hash: resolved.hash,
+      contentType: page.contentType ?? resolved.contentType,


🟡 Issue: page.contentType comes from the remote peer and is not covered by the hash verification you do above. Returning it preferentially lets a buggy or malicious peer change the MIME type the HTTP client sees even when local metadata says otherwise. Prefer the locally-resolved contentType, or reject mismatches before responding.

github-advanced-security AI found potential problems Jun 8, 2026

View reviewed changes

Comment thread packages/cli/src/file-store.ts

const info = await handle.stat();

if (offset >= info.size) return Buffer.alloc(0);

const cappedLength = Math.min(length, info.size - offset);

const buffer = Buffer.alloc(cappedLength);

github-actions Bot reviewed Jun 8, 2026

View reviewed changes

lupuszr force-pushed the codex/generic-imported-artifact-bytes branch from 2c7f710 to fea30a5 Compare June 8, 2026 16:15

github-actions Bot reviewed Jun 8, 2026

View reviewed changes

lupuszr force-pushed the codex/generic-imported-artifact-bytes branch from fea30a5 to d1a4e88 Compare June 9, 2026 06:58

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Jurij89 requested changes Jun 10, 2026

View reviewed changes

Jurij89 reviewed Jun 10, 2026

View reviewed changes

lupuszr added 3 commits June 16, 2026 12:19

feat: add generic imported artifact byte reads

5d8dd1c

fix: tighten imported artifact selector binding

da66bde

fix: harden imported artifact byte reads

bbba0fe

lupuszr force-pushed the codex/generic-imported-artifact-bytes branch from d1a4e88 to bbba0fe Compare June 16, 2026 11:16

github-actions Bot reviewed Jun 16, 2026

View reviewed changes

Conversation

lupuszr commented Jun 8, 2026

Summary

What Changed

Authorization Model

Transport

Tests

Notes

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Jurij89 left a comment

Choose a reason for hiding this comment

Uh oh!

Jurij89 Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Jurij89 Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Jurij89 Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Jurij89 Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants