feat: add generic imported artifact byte reads#1058
Conversation
| const info = await handle.stat(); | ||
| if (offset >= info.size) return Buffer.alloc(0); | ||
| const cappedLength = Math.min(length, info.size - offset); | ||
| const buffer = Buffer.alloc(cappedLength); |
| ? (requestedHash ? { fileHash: requestedHash } : {}) | ||
| : {}), | ||
| }, | ||
| undefined, |
There was a problem hiding this comment.
🔴 Bug: resolveAssertionArtifact() drops the existing owner guard by passing undefined here. That means the new /api/assertion/artifact/read route can read another agent's imported bytes locally just by providing its assertionUri, even though the older resolve/read-markdown paths still bind reads to requestAgentAddress (with the public-open relaxation). Thread the same owner-guard policy through this path before resolving the artifact.
| return denied(req); | ||
| } | ||
|
|
||
| const authorized = await this.authorizeSyncRequest(syncReq, fromPeerId); |
There was a problem hiding this comment.
🔴 Bug: authorizeSyncRequest() only proves the caller may sync this context graph; it does not enforce the imported-artifact owner rule that the HTTP routes already rely on. As written, any peer that can sync the CG can call this protocol directly and fetch another agent's source/markdown bytes. Resolve the assertion owner here and apply the same owner/public-open policy before serving the blob.
| includeSharedMemory: false, | ||
| targetPeerId: params.sourcePeerId, | ||
| requesterPeerId: this.peerId, | ||
| needsAuth: true, |
There was a problem hiding this comment.
🔴 Bug: forcing needsAuth: true makes public artifact reads require a local signing key. On nodes without a chain identity or claimed-agent private key, buildSyncRequestEnvelope() will throw before the request is sent, even though handleGetImportedArtifact() currently accepts public-CG requests via authorizeSyncRequest(). Compute whether the target CG actually needs auth and only require a signed envelope for private/curated fetches, or make the responder reject unsigned public requests consistently.
| return denied(req); | ||
| } | ||
|
|
||
| const authorized = await this.authorizeSyncRequest(syncReq, fromPeerId); |
There was a problem hiding this comment.
🔴 Bug: authorizeSyncRequest() only gates private context graphs. On public-but-not-open graphs this will return true for any peer-bound envelope, so the new P2P artifact route can bypass the owner/public-open checks that /api/assertion/artifact/read still enforces. Add the same read-policy check here before resolving metadata/bytes, otherwise imported artifacts from non-open public CGs become remotely readable.
| } | ||
| const chunks = [Buffer.from(first.bytesB64, 'base64')]; | ||
| let nextOffset = first.nextOffset; | ||
| while (first.truncated && nextOffset != null && nextOffset < total) { |
There was a problem hiding this comment.
🔴 Bug: this loop never verifies that the responder made forward progress. A peer that keeps replying with truncated: true and the same nextOffset (or zero-byte pages) will keep this caller in an endless fetch loop. Validate that each page advances the offset and abort on repeated/non-increasing nextOffset or inconsistent page metadata.
| ? mdIntermediateHash ?? (contentType === 'text/markdown' ? fileHash : undefined) | ||
| : fileHash; | ||
| if (!linkedHash) return null; | ||
| if (args.hash && args.hash !== linkedHash) return 'hash_mismatch'; |
There was a problem hiding this comment.
🟡 Issue: validateContentHash() accepts bare 64-hex hashes, but this comparison requires the exact canonical string. A caller that sends a bare hash will be told hash_mismatch even when it refers to the same blob. Either normalize hashes before comparing or reject bare hashes at the API boundary so validation matches runtime behavior.
2c7f710 to
fea30a5
Compare
| largeLiteralStorage?: LargeLiteralStorageConfig; | ||
| /** Out-of-Oxigraph immutable public SWM operation snapshots. Defaults on when dataDir is set. */ | ||
| sharedMemoryPublicSnapshotStorage?: SharedMemoryPublicSnapshotStorageConfig; | ||
| importedArtifactByteStore?: ImportedArtifactByteStore; |
There was a problem hiding this comment.
🔴 Bug: exposing importedArtifactByteStore on DKGAgentConfig is misleading right now because DKGAgent.create() never consumes it or registers the peer handler. Non-CLI embedders can pass this config and still fail every get-assertion-artifact request until they manually call registerImportedArtifactByteStore(). Either wire the store during agent construction or keep it off the public config surface.
| ctx, | ||
| { | ||
| ...raw, | ||
| ...(kind === 'source' || kind === 'original' |
There was a problem hiding this comment.
🔴 Bug: threading requestedHash into fileHash here preserves the old 400 fileHash does not match import metadata path for source/original, while markdown mismatches return the new 200 { status: "hash_mismatch" } shape. That makes local artifact reads inconsistent by kind and with the peer protocol. Resolve first, then compare requestedHash against sourceFileHash so all kinds report hash_mismatch uniformly.
| return { response: first }; | ||
| } | ||
| const total = first.totalBytes; | ||
| if (!params.cache || total == null || total > IMPORTED_ARTIFACT_MAX_CACHE_BYTES) { |
There was a problem hiding this comment.
🔴 Bug: this returns remote bytes without hashing them whenever cache is false, the request starts at a nonzero offset, or the artifact is over the cache cap. The HTTP route then serves those unverified bytes back to callers as a normal success path. Verify full-file responses even when you are not caching, and avoid returning partial remote pages as trusted bytes unless you can prove their integrity.
| return { response: page }; | ||
| } | ||
| chunks.push(Buffer.from(page.bytesB64, 'base64')); | ||
| nextOffset = page.nextOffset; |
There was a problem hiding this comment.
🔴 Bug: there is no forward-progress check in this pagination loop. A buggy or malicious peer can keep replying with truncated: true and the same nextOffset, which makes this spin forever and hang the caller. Validate that each page starts at the requested offset and that nextOffset increases strictly before continuing.
fea30a5 to
d1a4e88
Compare
| return denied(req); | ||
| } | ||
|
|
||
| const authorized = await this.authorizeSyncRequest(syncReq, fromPeerId); |
There was a problem hiding this comment.
🔴 Bug: authorizeSyncRequest() only verifies the envelope for private context graphs. On public-but-not-open graphs it returns true without validating requesterAgentAddress, but the next line still uses that unverified field for the owner check. A peer can forge requesterAgentAddress here and read another agent's imported artifact. Either verify the signed binding whenever owner-gating depends on requesterAgentAddress, or reject owner-gated peer reads unless the envelope was actually authenticated.
| const durable = await agent.store.query(` | ||
| SELECT ?fileHash ?contentType ?extractionStatus ?structuralTripleCount ?mdIntermediateHash WHERE { | ||
| GRAPH <${contextGraphMetaUri(args.contextGraphId)}> { | ||
| <${args.assertionUri}> <${DKG_ONTOLOGY}sourceFileHash> ?fileHash . |
There was a problem hiding this comment.
🔴 Bug: this lookup uses args.assertionUri verbatim. The rest of the import-artifact flow still accepts legacy ownerless assertion URIs and derives subGraphName from the canonical URI, but this protocol never normalizes before querying _meta or SWM. Requests with a supported legacy URI, or a sub-graph assertion where the caller omitted subGraphName, will miss existing metadata and incorrectly return unavailable. Parse/canonicalize the assertion URI once and reuse the derived URI + sub-graph for the lookups and selector binding.
Jurij89
left a comment
There was a problem hiding this comment.
I found blocking gaps against the stated goal of allowing joined/allowlisted Context Graph participants to open imported Markdown/source bytes. I also avoided duplicating the older unresolved auth/CodeQL review threads.
| throw new ImportArtifactRouteError(400, 'Invalid hash'); | ||
| } | ||
|
|
||
| const artifact = await resolveImportedArtifactForRead( |
There was a problem hiding this comment.
Blocking: this route still resolves the artifact through the owner/public-open read guard. For a curated CG member in the allowList, ctx.requestAgentAddress is allowed to read/sync the graph but is not the assertion owner, and isPublicOpenContextGraph() is false, so this throws 403 before the SWM fallback can resolve the promoted Markdown/source linkage. That is the exact scenario this PR is supposed to enable. Please make the read policy accept authorized curated participants while keeping write/enrichment paths owner-only, and add a regression for an allowlisted non-owner reading kind: "markdown"/source on a private CG.
|
|
||
| const authorized = await this.authorizeSyncRequest(syncReq, fromPeerId); | ||
| if (!authorized) return denied(req); | ||
| if (!await canReadImportedArtifactByOwnerPolicy(this, req, syncReq)) return denied(req); |
There was a problem hiding this comment.
Blocking: the same access-policy issue exists on the P2P responder. authorizeSyncRequest() can return true for a private/curated CG when the requester is a valid participant/delegate, but canReadImportedArtifactByOwnerPolicy() then reduces the policy back to owner-or-public-open only. A joined allowlisted agent can sync the triples but still gets denied for the actual bytes. The artifact byte gate should preserve the signed selector-bound auth check and allow non-owner private CG participants that passed sync authorization, instead of requiring assertion ownership.
| } | ||
| } | ||
|
|
||
| const sourcePeerId = typeof raw.sourcePeerId === 'string' && raw.sourcePeerId.trim() |
There was a problem hiding this comment.
Blocking for end-to-end hydration: the route requires the caller to already know sourcePeerId, but when it is missing the response only returns source.agentAddress. A normal SWM reader can resolve the assertion owner and artifact hash from replicated triples, but not the source libp2p peer id, so this returns fetchable without enough information to make the next request. PR #1043 had durable/SWM publisher-peer plus agent-registry candidate resolution and retry; PR #1058 drops that discovery path. Please either resolve/retry candidate source peers server-side or surface persisted sourcePeerIds in artifact metadata so joined readers can actually open the MD file.
| }); | ||
| } | ||
|
|
||
| const statResult = await fileStore.stat(resolved.hash); |
There was a problem hiding this comment.
Blocking: this local hit is not scoped to the assertion or to a verified remote fetch. For owner self-reads that is fine, but for cross-agent/SWM fallback reads the file store is a global content-addressed cache: if this node happens to already have resolved.hash from another import/context, the route returns those bytes without proving they came from this artifact or from the source peer. PR #1043 avoided that by requiring a verified cache marker after full-hash remote hydration before reusing local bytes for owner-relaxed SWM reads. Please carry over that protection for non-owner/shared-memory artifact reads, otherwise this becomes a byte oracle for any linked hash already present on the node.
d1a4e88 to
bbba0fe
Compare
| if (!resolvedHash) { | ||
| throw new ImportArtifactRouteError(404, `No ${kind} artifact is linked to assertionUri`); | ||
| } | ||
| if (requestedHash && requestedHash !== resolvedHash) return 'hash_mismatch'; |
There was a problem hiding this comment.
🔴 Bug: validateContentHash() accepts bare 64-hex hashes, but this exact-string comparison only matches the prefixed form stored in metadata. A caller sending a valid bare hash will get a false hash_mismatch here (and the same pattern exists in the peer read path). Canonicalize both hashes before comparing, or tighten validation to reject bare hashes.
| } | ||
|
|
||
| return jsonResponse(res, 200, { | ||
| status: remote.verifiedBytes ? 'fetched' : 'fetchable', |
There was a problem hiding this comment.
🟡 Issue: remote.verifiedBytes is set even when cache === false, so this reports status: "fetched" for responses that were verified but never written to the local file store. That makes the status lie about local availability. Base the status on whether the cache promotion actually ran, or introduce a separate status for "verified-but-not-cached".
| assertionUri: resolved.assertionUri, | ||
| kind: resolved.kind, | ||
| hash: resolved.hash, | ||
| contentType: page.contentType ?? resolved.contentType, |
There was a problem hiding this comment.
🟡 Issue: page.contentType comes from the remote peer and is not covered by the hash verification you do above. Returning it preferentially lets a buggy or malicious peer change the MIME type the HTTP client sees even when local metadata says otherwise. Prefer the locally-resolved contentType, or reject mismatches before responding.
Summary
Implements a generic imported assertion artifact byte API that supports
source,markdown, andoriginalartifact kinds without introducing a separate authorization system.The artifact byte path now reuses the existing context graph sync authorization stack:
buildSyncRequestEnvelopeparseSyncRequestauthorizeSyncRequestArtifact-specific code only validates request shape, auth-envelope binding, metadata linkage, hash consistency, and bounded byte serving.
What Changed
POST /api/assertion/artifact/read/dkg/10.0.2/get-assertion-artifactauthPurposeauthSelectorFileStore.stat()andFileStore.readRange()sourceFileHashmdIntermediateHashtext/markdown_meta/api/assertion/import-artifact/read-markdownbehaviorAuthorization Model
No new source-blob-specific authorization flow was added.
The responder:
authorizeSyncRequest.Transport
Retains the useful transport ideas from PR #1043:
offsetmaxBytestotalBytesnextOffsettruncatedTests
pnpm --filter @origintrail-official/dkg-agent exec vitest run --config vitest.unit.config.ts test/imported-artifact.test.ts test/sync-envelope-cursor.test.tspnpm --filter @origintrail-official/dkg exec vitest run test/import-artifact-routes.test.tsNotes
sourcePeerIdis currently required for remote peer fetches; source-agent-to-peer discovery can be added as a follow-up.originalcurrently resolves to the linked source bytes unless distinct original-artifact metadata is added later.