This page documents the current HTTP surface for Pallium.
The examples below assume the current agent_conversation_memory package,
which is the main product focus today.
The API has three main operations:
- send selected evidence with POST /items
- ask for continuity context with POST /query or POST /query/debug
- do both in one call with POST /item-and-query or POST /item-and-query/debug
There are also two operational endpoints:
- inspect processing for one item with GET /items/{source_item_id}/processing
- inspect queue and background-worker state with GET /debug/queue/health
Use POST /items to store source items. It always accepts an array and always returns an array, with a maximum of 50 items per request.
[
{
"source_type": "chat_message",
"source_id": "msg-001",
"content_type": "text/plain",
"content": "We decided to use event timestamps for ordering."
}
]
For multiple items in one call (e.g. assistant reply + tool summary + todo snapshot after an agent turn):
[
{ "source_type": "...", "source_id": "reply-1", "artifact_kind": "assistant_output", ... },
{ "source_type": "...", "source_id": "tools-1", "artifact_kind": "tool_use_summary", ... },
{ "source_type": "...", "source_id": "todo-1", "artifact_kind": "todo_snapshot", ... }
]
Required fields (per item):
- source_type: name of the upstream system (e.g. "chat_message", "ticket_update")
- source_id: stable unique ID from the upstream system, used for idempotency
- content_type: format of the content (use "text/plain" unless you have a specific reason not to)
- content: the text to store and reason over
Recommended fields for agent_conversation_memory:
- container_ref: which container this item belongs to (e.g. a channel ID or room ID). Used for scoping, thread grouping, and visibility enforcement
- visibility: who can see this item: "public", "container", "private", or "global". Default: "private". See Common Shapes
- thread_ref: which conversation thread within the container
- work_refs: optional list of external work identifiers for cross-thread work continuity (e.g. ticket IDs, PR numbers)
- role: who produced this: "user" or "assistant"
- artifact_kind: optional hint about the evidence shape (see below)
Additional context fields:
- actor_ref: who said it (the human user, e.g. a user ID)
- agent_ref: which agent instance produced it (e.g. an agent deployment ID)
- source_ref: a link or pointer back to the original source
- occurred_at: when the upstream event happened (ISO 8601)
- metadata: arbitrary key-value pairs for your own use
Minimal example:
[{
"source_type": "chat_message",
"source_id": "msg-001",
"content_type": "text/plain",
"content": "We decided to use event timestamps for ordering."
}]
Recommended example for agent_conversation_memory:
[{
"source_type": "chat_message",
"source_id": "msg-001",
"content_type": "text/plain",
"content": "We decided to use event timestamps for ordering.",
"artifact_kind": "assistant_output",
"role": "assistant",
"container_ref": "channel:C04ABC123",
"visibility": "container",
"thread_ref": "thread:1700000001"
}]
Response (always an array). Each element contains:
- source_item_id: internal ID assigned by Pallium
- memory_object_ids: IDs of any memory objects promoted immediately
- relation_ids: IDs of evidence relations created
- index_entry_ids: IDs of retrieval index entries created
- processing_status: "pending", "processing", "completed", "skipped", or "failed"
- processing_attempts: number of processing attempts so far
- processing_error: error message if processing failed (null otherwise)
Note: most items return with processing_status: "pending" because semantic
extraction runs asynchronously in the background. Use
GET /items/{source_item_id}/processing to inspect the result.
Notes:
- keep source_id stable if you want upstream idempotency
- for the current conversation package, always send container_ref
- artifact_kind helps Pallium route faster but is not required. Accepted values:
  - message: a user question or statement
  - assistant_output: a final assistant answer or decision
  - tool_use_summary: a compact summary of a tool run
  - todo_snapshot: an explicit next-step or progress note
  - notification: an external notification or alert
  - note: explicit "remember this" from the user. Uses a dedicated extraction prompt that preserves content verbatim and extracts only a title for retrieval. Use this when the user explicitly asks to save something.
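The required fields and the 50-item limit described above can be checked on the client before anything is sent. The following is a minimal sketch, not part of Pallium: the helper names `validate_item` and `batch_items` are hypothetical, and only the field names and the limit come from this page.

```python
from typing import Any, Dict, Iterator, List

# Field names and the per-request limit are taken from this page.
REQUIRED_FIELDS = ("source_type", "source_id", "content_type", "content")
MAX_ITEMS_PER_REQUEST = 50


def validate_item(item: Dict[str, Any]) -> None:
    """Raise ValueError if a documented required field is missing or empty."""
    missing = [name for name in REQUIRED_FIELDS if not item.get(name)]
    if missing:
        raise ValueError(f"item {item.get('source_id')!r} is missing: {missing}")


def batch_items(items: List[Dict[str, Any]]) -> Iterator[List[Dict[str, Any]]]:
    """Yield POST /items request bodies (arrays) of at most 50 items each."""
    for item in items:
        validate_item(item)
    for start in range(0, len(items), MAX_ITEMS_PER_REQUEST):
        yield items[start:start + MAX_ITEMS_PER_REQUEST]
```

Each yielded list can be serialized with `json.dumps` and sent as one request body, so a turn that produces more than 50 items is split into several calls.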
Use GET /items/{source_item_id}/processing when you want to inspect what happened to one ingested item.
The response includes:
- processing state and attempts
- any processing error and failure category
- produced memory, relation, and index ids
- produced memory types
- whether thread rebuild was requested and completed
- compact provenance for produced memory
This is useful when ingest succeeds but the follow-up query does not return what you expected.
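Because extraction runs in the background, a caller may want to wait for a terminal state before querying. Below is a rough polling sketch. It assumes the processing response exposes the same `processing_status` values as the POST /items response, which this page does not state explicitly. The `fetch` callable is a hypothetical stand-in for whatever HTTP client performs GET /items/{source_item_id}/processing and returns the decoded JSON body.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which further polling is assumed to be pointless.
TERMINAL_STATUSES = {"completed", "skipped", "failed"}


def wait_for_processing(
    fetch: Callable[[str], Dict[str, Any]],
    source_item_id: str,
    attempts: int = 10,
    delay_seconds: float = 0.5,
) -> Dict[str, Any]:
    """Poll until the item reaches a terminal status or attempts run out.

    Returns the last response seen, which may still be non-terminal.
    """
    last: Dict[str, Any] = {}
    for attempt in range(attempts):
        last = fetch(source_item_id)
        if last.get("processing_status") in TERMINAL_STATUSES:
            return last
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return last
```

Passing the fetcher in as an argument keeps the loop independent of any particular HTTP library and makes it easy to exercise with a fake in tests.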
Use POST /query when the runtime needs continuity context before answering.
Required fields:
- text: the current user question or prompt
Recommended fields:
- container_ref: scope the query to this container
- visibility: visibility boundary for the query
- thread_ref: current thread within the container
Additional filters:
- limit: max results (default: 5, range: 1–50)
- source_type: filter by upstream system
- role: filter by "user" or "assistant"
- artifact_kind: filter by evidence shape
- work_refs: optional list of external work identifiers. When provided, memories with matching work_refs are prioritized. Format: normalized strings (casefold, separator-canonical). "PROJ-123", "PROJ 123", and "proj_123" all match.
- actor_ref: filter by actor identity. When provided, only returns memories whose actor_ref matches or is null (shared). When omitted, no actor filtering is applied. See privacy-and-visibility.md for details.
Minimal example:
{
"text": "Why did we choose event timestamps?"
}
Recommended example:
{
"text": "Why did we choose event timestamps?",
"container_ref": "channel:C04ABC123",
"visibility": "container",
"thread_ref": "thread:1700000001"
}
Current request rules:
- limit defaults to 5
- limit must be between 1 and 50
- for the current scoped package, missing container_ref causes fail-closed behavior rather than a broad fallback
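The request rules above can be mirrored on the client so that invalid queries fail early. This is a hedged sketch only: `build_query` and `normalize_work_ref` are hypothetical helpers, and the normalizer is one plausible reading of "casefold, separator-canonical" that happens to make the three documented spellings compare equal. The server's actual canonical form is not specified here.

```python
import re
from typing import Any, Dict, Optional


def normalize_work_ref(ref: str) -> str:
    """Casefold and collapse runs of whitespace, '-' or '_' into one '-'.

    Assumption: this approximates the server-side normalization.
    """
    return re.sub(r"[\s_\-]+", "-", ref.strip().casefold())


def build_query(
    text: str,
    container_ref: str,
    visibility: Optional[str] = None,
    thread_ref: Optional[str] = None,
    limit: int = 5,
) -> Dict[str, Any]:
    """Build a POST /query body, rejecting values the rules above disallow."""
    if not text:
        raise ValueError("text is required")
    if not container_ref:
        # The scoped package fails closed without it, so refuse locally too.
        raise ValueError("container_ref is required for the scoped package")
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    body: Dict[str, Any] = {
        "text": text,
        "container_ref": container_ref,
        "limit": limit,
    }
    if visibility is not None:
        body["visibility"] = visibility
    if thread_ref is not None:
        body["thread_ref"] = thread_ref
    return body
```

Rejecting a missing container_ref locally is a design choice for this sketch: it surfaces the problem in the caller rather than as an empty result from a fail-closed query.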
Response:
{
"should_inject": true,
"decision_reason": "carry_forward_available",
"injectable_blocks": [
{
"result_id": "mem-abc123",
"block_type": "memory_hit",
"title": "decision",
"text": "Use event timestamps for ordering — avoids timezone drift.",
"memory_type": "decision",
"memory_object_id": "mo-001",
"evidence": [
{
"source_item_id": "si-001",
"source_type": "chat_message",
"source_id": "msg-001",
"role": "assistant"
}
]
}
],
"results": [
{
"result_id": "mem-abc123",
"result_kind": "memory_hit",
"score": 850,
"type": "decision",
"memory_object_id": "mo-001",
"excerpt": null,
"container_ref": "channel:C04ABC123",
"thread_ref": "thread:1700000001",
"visibility": "container",
"retrieval_source": "lexical",
"evidence": [
{
"source_item_id": "si-001",
"source_type": "chat_message",
"source_id": "msg-001",
"role": "assistant",
"container_ref": "channel:C04ABC123",
"visibility": "container"
}
]
}
]
}
Response fields:
- should_inject: whether Pallium recommends injecting memory into the agent's prompt
- decision_reason: why injection was approved or declined:
  - "carry_forward_available": relevant memory found and approved for injection
  - "constraint_supplement": a user-stated constraint was found and injected
  - "same_thread_context_sufficient": the agent already has this context in the current thread
  - "no_relevant_memory": retrieval ran but nothing matched well enough
  - "only_low_value_candidates": matches found but too low-value to inject
  - "low_injection_confidence": candidates exist but confidence is below the injection threshold
  - "no_candidates_above_floor": all candidates scored below the minimum retrieval floor
  - "low_value_query": the query is a greeting, acknowledgement, or meta-conversation that won't benefit from memory
  - "lane_ambiguity": the query didn't clearly map to a retrieval strategy; Pallium chose silence over a guess
  - "no_lane_eligible": no structural retrieval lane (work resumption, evidence trace, residual recall) matched the query shape
- injectable_blocks: ready-to-use blocks for prompt injection, each with block_type, title, text, optional memory_type, optional memory_object_id, and evidence refs. memory_object_id can be used with GET /memory/{id}/expand to retrieve the structured payload and source conversation items that the memory was derived from. Injection blocks also carry an expand_available flag indicating whether the expand endpoint will return useful content
- results: ranked result list (see below)
Each result in results[]:
- result_id: unique result identifier
- result_kind: "memory_hit" (derived memory) or "source_hit" (stored evidence)
- score: retrieval score (integer, higher is better)
- type: memory type for memory_hit results: "decision", "investigation_outcome", "thread_summary", "task_checkpoint", "atomic_fact", "fact_summary", "constraint_memory", "pattern_memory", "continuity_memory", or "note"
- memory_object_id: ID of the memory object (for memory_hit)
- source_item_id: ID of the source item (for source_hit)
- excerpt: text excerpt (for source_hit)
- container_ref, thread_ref, visibility: context refs
- retrieval_source: "lexical", "vector", or "fused" (when hybrid retrieval is enabled)
- evidence: supporting evidence refs (for memory_hit results)
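On the consuming side, an agent runtime would typically check should_inject and then turn injectable_blocks into prompt text. The sketch below shows one way to do that. The function name and the line format are illustrative choices, not something Pallium prescribes; only the response field names come from this page.

```python
from typing import Any, Dict, List


def render_memory_context(response: Dict[str, Any]) -> str:
    """Return prompt-ready text, or an empty string when injection is declined."""
    if not response.get("should_inject"):
        return ""
    lines: List[str] = []
    for block in response.get("injectable_blocks", []):
        # memory_type is optional, so fall back to the block type.
        label = block.get("memory_type") or block.get("block_type", "memory")
        lines.append(f"[{label}] {block['title']}: {block['text']}")
    return "\n".join(lines)
```

When should_inject is false the function returns an empty string regardless of decision_reason, which follows Pallium's recommendation to stay silent rather than inject low-value context.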
POST /query/debug has the same request shape as POST /query and returns the same
normal result fields.
It also returns trace, which currently includes:
- query_text
- query_tokens
- limit
- optional filters
- retrieval stages
- package routing information under routing
- visibility information under visibility
- a compact result_summary
Use this endpoint when you need to understand:
- why a result is missing
- why memory beat source evidence or vice versa
- which candidates were excluded by visibility rules
- what lexical matches were considered
POST /item-and-query combines item ingest and memory query in a single call. This is the recommended endpoint for the common pattern: store the user message, then immediately query for relevant prior memory.
The request body is the same as POST /items, plus optional query fields:
- query_text: override query text (defaults to content)
- query_limit: max results (default: 5, range: 1–50)
- query_actor_ref: actor identity for query-time visibility filtering. When provided, enables retrieval of global memories belonging to this actor. Defaults to the item's actor_ref if not set explicitly.
- work_refs: optional list of external work identifiers (same behavior as in POST /query)
Example:
{
"source_type": "chat_message",
"source_id": "slack:C04ABC123:1700000001.000100",
"content_type": "text/plain",
"content": "Why did we choose event timestamps for ordering?",
"role": "user",
"artifact_kind": "message",
"container_ref": "slack:channel:C04ABC123",
"thread_ref": "slack:thread:C04ABC123:1700000001.000100",
"visibility": "container",
"actor_ref": "slack:user:U01XYZ789"
}
Response (same as POST /query, plus source_item_id):
{
"source_item_id": "si-abc123",
"should_inject": true,
"decision_reason": "carry_forward_available",
"injectable_blocks": [ ... ],
"results": [ ... ]
}
The ingest runs first. Because processing is asynchronous, the just-ingested message won't
appear in query results. The query then retrieves previously derived memory
relevant to the content (or query_text if provided).
POST /item-and-query/debug returns the same plus trace (same as
POST /query/debug).
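The defaults for the optional query fields on POST /item-and-query can be summarized in a few lines. This is only a client-side model of the documented behavior, written to make the fallbacks explicit. `effective_query` is a hypothetical name, and the server remains the authority on how these defaults are resolved.

```python
from typing import Any, Dict


def effective_query(body: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve the documented defaults for the optional query fields."""
    return {
        # query_text overrides the item content when present.
        "text": body.get("query_text") or body["content"],
        # query_limit defaults to 5.
        "limit": body.get("query_limit", 5),
        # query_actor_ref falls back to the item's actor_ref.
        "actor_ref": body.get("query_actor_ref") or body.get("actor_ref"),
    }
```

For the example request above, which sets none of the optional fields, the resolved query text is the message content and the actor is "slack:user:U01XYZ789".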
Use GET /memory/{id}/expand to retrieve the structured payload fields and source conversation items that a memory object was derived from. This enables drill-down from a compact memory card to the complete structured data and full original context.
An injection block sets expand_available: true when the memory has payload fields
worth expanding (evidence text, key findings, conclusions, etc.).
Required query parameter:
- container_ref: the container to validate access against. The endpoint returns 404 if the memory object doesn't belong to this container.
Example:
GET /memory/63119911-e39d-4b51-b804-8cbecd0522b3/expand?container_ref=slack:channel:C123
Response:
{
"memory_object_id": "63119911-e39d-4b51-b804-8cbecd0522b3",
"payload": {
"decision": "Use event timestamps for ordering",
"rationale_text": "Avoids timezone drift issues seen in prior incident",
"decision_evidence_text": "We decided to use event timestamps..."
},
"items": [
{
"source_item_id": "si-001",
"source_type": "chat_message",
"source_id": "slack-message:C123:1700000001.000100",
"content": "We decided to use event timestamps for ordering because of timezone drift.",
"role": "user",
"actor_ref": "slack:user:U01XYZ789",
"occurred_at": "2026-04-09T10:30:00Z",
"thread_ref": "slack:thread:C123:1700000001"
}
]
}
Security:
- The memory object must belong to the requested container_ref, or have visibility: "global" with a matching actor_ref (404 otherwise)
- Each evidence item is filtered through visibility rules; private items from other containers are excluded
- Returns 404 (not 403) for both missing and unauthorized access to avoid confirming ID existence
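A small sketch of building the expand URL and interpreting its status code, using only the Python standard library. The base URL is a placeholder and both helper names are hypothetical. Note that `urlencode` percent-encodes the colons in container_ref, which is an equivalent encoding of the unescaped form shown in the example above.

```python
from urllib.parse import quote, urlencode


def expand_url(base_url: str, memory_object_id: str, container_ref: str) -> str:
    """Build the GET /memory/{id}/expand URL with the required query parameter."""
    path = f"/memory/{quote(memory_object_id, safe='')}/expand"
    query = urlencode({"container_ref": container_ref})
    return f"{base_url.rstrip('/')}{path}?{query}"


def expand_outcome(status_code: int) -> str:
    """Map a status code to an outcome label.

    A 404 deliberately covers both "missing" and "not allowed", so callers
    should not try to tell the two apart.
    """
    if status_code == 200:
        return "ok"
    if status_code == 404:
        return "missing_or_unauthorized"
    return "unexpected"
```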
Flag a memory as incorrect, outdated, or low quality. After enough independent flags from different sources, the memory is suppressed and excluded from retrieval.
Request body:
{
"reason": "Outdated: PR was merged hours ago",
"source_ref": "agent-session:f890d298",
"immediate": false
}

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| reason | string | yes | - | Why the memory is bad |
| source_ref | string | yes | - | Identifies who flagged it (used for dedup) |
| immediate | bool | no | false | When true, suppress immediately without waiting for threshold |
Response (200):
{
"memory_object_id": "a8efd630-2a64-497f-a80d-c238825981d3",
"flag_count": 2,
"unique_sources": 2,
"suppressed": true
}

| Field | Type | Description |
|---|---|---|
| flag_count | int | Total flags on this memory (all time) |
| unique_sources | int | Distinct source_ref values within the 30-day window |
| suppressed | bool | Whether the memory is now suppressed |
Suppression rules:
- Threshold mode (default): 2 independent sources (distinct source_ref values) within 30 days trigger suppression. Multiple flags from the same source count as one voice.
- Immediate mode (immediate: true): suppress without threshold. Use for confirmed-bad memories from human review.
- Flagging an already-suppressed or superseded memory is accepted and recorded for audit but doesn't change lifecycle.
- The endpoint is idempotent — repeated calls return current state.
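The suppression rules above can be modeled in a few lines to make the threshold logic concrete. This is an illustrative model, not Pallium's implementation: the `Flag` type and `is_suppressed` function are hypothetical, and the way the 30-day window is measured (relative to a caller-supplied `now`) is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple

WINDOW = timedelta(days=30)  # documented flag window
THRESHOLD = 2                # documented number of independent sources


class Flag(NamedTuple):
    source_ref: str
    flagged_at: datetime
    immediate: bool = False


def is_suppressed(flags: Iterable[Flag], now: datetime) -> bool:
    """Apply immediate mode first, then the distinct-source threshold."""
    flags = list(flags)
    if any(flag.immediate for flag in flags):
        return True
    # A set collapses repeated flags from one source into a single voice.
    recent_sources = {
        flag.source_ref for flag in flags if now - flag.flagged_at <= WINDOW
    }
    return len(recent_sources) >= THRESHOLD
```

Two flags from the same source_ref leave the memory active, while two flags from different sources inside the window suppress it.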
Error responses:
| Status | Condition |
|---|---|
| 404 | Unknown memory_object_id |
| 422 | Missing required fields |
GET /debug/queue/health is the operational endpoint for the background pipeline.
The response includes:
- status counts for queued items
- oldest pending age
- pending items without a use case
- unclaimable pending reasons
- leased source items
- leased thread scopes
- recent failures
- retention-run state
This endpoint is mainly for local debugging, worker troubleshooting, and test or benchmark setup checks.
Operational readiness check. Returns whether the service is ready to handle requests.
{
"status": "ok",
"vector_index_ready": true
}
- status: "ok" when ready, "initializing" during startup
- vector_index_ready: whether the vector index reconciliation has completed
Returns HTTP 200 when ready, 503 when still initializing. Use this for container health probes and startup checks.
The visibility field is a simple string:
- "public": visible to queries from any container
- "container": visible only within the same container_ref (group context)
- "private": visible only within the same container_ref (personal context)
- "global": visible to queries from any container where the query's actor_ref matches the item's actor_ref (actor-scoped cross-container memory)
Default: "private".
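The four visibility values can be read as a simple predicate over the item and the query. The sketch below encodes only the rules listed above. `is_visible` is a hypothetical helper, it ignores the separate actor_ref query filter, and privacy-and-visibility.md remains the authoritative description.

```python
from typing import Optional


def is_visible(
    item_visibility: str,
    item_container_ref: Optional[str],
    item_actor_ref: Optional[str],
    query_container_ref: Optional[str],
    query_actor_ref: Optional[str] = None,
) -> bool:
    """Return True if a query may see an item under the four listed rules."""
    if item_visibility == "public":
        return True
    if item_visibility in ("container", "private"):
        # Both values are scoped to the same container_ref.
        return (
            item_container_ref is not None
            and item_container_ref == query_container_ref
        )
    if item_visibility == "global":
        # Cross-container, but only for the matching actor.
        return item_actor_ref is not None and item_actor_ref == query_actor_ref
    return False  # unknown values fail closed in this sketch
```

Treating unknown values and missing refs as not visible mirrors the fail-closed behavior described for queries without container_ref.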
- the semantic package is selected by the server-side default_use_case configuration; callers do not normally need to send use_case
- agent_conversation_memory is the main package described by the current docs
- keep source content compact and explicit; the current semantic layer is text-oriented
- use GET /items/{source_item_id}/processing and POST /query/debug before changing prompts or retrieval heuristics blindly
- integration flow: agent-integration.md
- privacy rules: privacy-and-visibility.md