bug: ghost trace in episode.trace_ids_json causes infinite rescore loop (590 calls, 14.5 RMB wasted) #1966

## Summary

A ghost trace ID (`tr_xhbp6c9p450r`) was added to an episode's `trace_ids_json` but never existed in the `traces` table. This caused `episodeRewardIsDirty` to return `true` indefinitely, triggering 590 reward rescore calls over 5 days and wasting ~14.5 RMB on LLM calls.

## Root Cause

### 1. Ghost trace creation

The episode was reopened, and during the follow-up merge `appendTrace` was called. It directly overwrites `trace_ids_json` without validating that the trace IDs actually exist in the `traces` table.

```javascript
// episodes.js line 80-82
appendTrace(id, traceIds) {
    appendTrace.run({ id, trace_ids_json: toJsonText(traceIds) });
    // No validation that traceIds exist in traces table!
}
```
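One way to close this gap is to check the incoming IDs against the `traces` table before writing. The following is a minimal sketch, not the actual MemOS code. It assumes `getManyByIds` silently skips missing IDs (as the reward code suggests) and that each returned row has an `id` field. `partitionTraceIds`, `appendTraceChecked`, `writeTraceIds`, and `makeFakeTracesRepo` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: split candidate IDs into those present in the traces
// table and those that are not. Assumes getManyByIds() silently skips
// missing IDs and that each returned row carries an `id` field.
function partitionTraceIds(tracesRepo, traceIds) {
    const found = new Set(tracesRepo.getManyByIds(traceIds).map((t) => t.id));
    const existing = [];
    const missing = [];
    for (const id of traceIds) {
        (found.has(id) ? existing : missing).push(id);
    }
    return { existing, missing };
}

// Hypothetical wrapper around the write: persist only validated IDs and
// report the rest instead of storing them.
function appendTraceChecked(tracesRepo, writeTraceIds, episodeId, traceIds) {
    const { existing, missing } = partitionTraceIds(tracesRepo, traceIds);
    if (missing.length > 0) {
        console.warn(`episode ${episodeId}: dropping unknown trace IDs`, missing);
    }
    writeTraceIds(episodeId, existing);
    return { existing, missing };
}

// In-memory stand-in for the traces table, for illustration only.
function makeFakeTracesRepo(ids) {
    const rows = new Map(ids.map((id) => [id, { id }]));
    return {
        getManyByIds: (wanted) =>
            wanted.filter((id) => rows.has(id)).map((id) => rows.get(id)),
    };
}
```

Whether unknown IDs should be dropped with a warning (as here) or should fail the merge outright is a design choice. Dropping keeps the merge going, while throwing would surface the upstream bug that produced the ghost ID sooner.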

### 2. Reward traceCount mismatch

The reward scorer loads traces via `deps.tracesRepo.getManyByIds`, which only returns existing traces:

```javascript
// reward.js line 48-52
const traces = traceIds.length > 0
    ? deps.tracesRepo.getManyByIds(traceIds).sort(...)
    : [];
```

This returns 11 traces (ghost excluded). After scoring: `reward.traceCount = 11`.

### 3. Dirty check compares against episode.traceIds.length

```javascript
// memory-core.js line 1003-1006
const traceCount = reward.traceCount;
if (typeof traceCount === "number") {
    return traceCount !== (ep.traceIds?.length ?? 0);
    // 11 !== 12 → true → dirty forever!
}
```
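A more resilient check could compare the scored count against the number of traces that actually exist, rather than against the raw length of the ID list. The sketch below is one possible shape, under the assumption that `getManyByIds` skips missing IDs. `episodeRewardIsDirtyResilient` is a hypothetical name, and treating an unscored episode as dirty is an assumption, since the rest of the original function is not shown.

```javascript
// Hypothetical variant of the dirty check: compare the scored trace count
// with the number of episode traces that actually exist, so a dangling ID
// cannot keep the episode dirty. Assumes getManyByIds() skips missing IDs.
function episodeRewardIsDirtyResilient(ep, reward, tracesRepo) {
    const traceCount = reward?.traceCount;
    if (typeof traceCount !== "number") {
        return true; // assumption: an episode without a scored count is dirty
    }
    const traceIds = ep.traceIds ?? [];
    const existingCount = traceIds.length > 0
        ? tracesRepo.getManyByIds(traceIds).length
        : 0;
    return traceCount !== existingCount;
}
```

With the numbers from this issue (12 IDs, 11 existing rows, `reward.traceCount = 11`), this check returns `false`, so the episode would stop being rescored. The extra lookup costs one repository call per check, which seems small next to an LLM rescore.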

### 4. Infinite loop

The rescore check runs every 10 minutes, finds the episode dirty, rescores it, gets the same 11 vs 12 mismatch, and leaves the episode dirty. This repeated 590 times.
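Independently of the root cause, a cap on rescore attempts would bound the cost of any loop like this one. Below is a rough sketch of such a guard. `makeRescoreGuard`, `simulateTicks`, and `MAX_RESCORE_ATTEMPTS` are hypothetical, and the limit of 3 is an arbitrary illustrative value, not something taken from MemOS.

```javascript
// Hypothetical guard: cap how many times an episode may be rescored without
// its dirty state clearing. MAX_RESCORE_ATTEMPTS is an illustrative value.
const MAX_RESCORE_ATTEMPTS = 3;

function makeRescoreGuard(maxAttempts = MAX_RESCORE_ATTEMPTS) {
    const attempts = new Map(); // episodeId -> rescore attempts so far
    return {
        // Returns true if a rescore may run, and records the attempt.
        tryAcquire(episodeId) {
            const used = attempts.get(episodeId) ?? 0;
            if (used >= maxAttempts) return false;
            attempts.set(episodeId, used + 1);
            return true;
        },
        // Call when the episode comes back clean or its traces change.
        reset(episodeId) {
            attempts.delete(episodeId);
        },
    };
}

// Simulates scheduler ticks for an episode that never becomes clean and
// counts how many rescores (LLM calls) would actually be issued.
function simulateTicks(guard, episodeId, ticks) {
    let llmCalls = 0;
    for (let i = 0; i < ticks; i++) {
        if (guard.tryAcquire(episodeId)) llmCalls++;
    }
    return llmCalls;
}
```

Under this sketch, `simulateTicks(makeRescoreGuard(), "ep_95n61b3jzycd", 590)` returns `3` instead of 590. The counter here lives in memory only. A real implementation would probably need to persist it so the limit survives restarts.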

## Evidence

```
Episode: ep_95n61b3jzycd
trace_ids_json count: 12 (including ghost tr_xhbp6c9p450r)
tr_xhbp6c9p450r in traces table: 0 (does not exist)
reward.traceCount: 11
episodeRewardIsDirty: true (traceCount mismatch)
Reward calls: 590 over 5 days (06-18 to 06-22)
```

## Suggested Fixes

1. **appendTrace validation**: Validate trace IDs exist before appending
2. **Dirty check resilience**: Compare against actual existing trace count, not episode.traceIds.length
3. **Rescore retry limit**: Add max retry count per episode to prevent infinite loops
4. **Ghost trace cleanup**: Add startup scan to remove orphaned trace IDs from episodes
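For fix 4, a startup scan might look roughly like the sketch below. It assumes episodes can be listed as `{ id, traceIds }` objects and that `getManyByIds` returns only existing rows with an `id` field. `removeGhostTraceIds` and `saveTraceIds` are hypothetical names standing in for whatever the repository layer actually provides.

```javascript
// Hypothetical startup scan: drop trace IDs that have no row in the traces
// table. `episodes` is an array of { id, traceIds }; `saveTraceIds` stands in
// for whatever persists the cleaned list.
function removeGhostTraceIds(episodes, tracesRepo, saveTraceIds) {
    const report = [];
    for (const ep of episodes) {
        const traceIds = ep.traceIds ?? [];
        if (traceIds.length === 0) continue;
        const found = new Set(tracesRepo.getManyByIds(traceIds).map((t) => t.id));
        const kept = traceIds.filter((id) => found.has(id));
        if (kept.length !== traceIds.length) {
            const removed = traceIds.filter((id) => !found.has(id));
            saveTraceIds(ep.id, kept); // only touch episodes that changed
            report.push({ episodeId: ep.id, removed });
        }
    }
    return report;
}
```

Returning a report of what was removed makes the scan auditable, which helps when tracing how a ghost ID got into an episode in the first place.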

## Related Issues

- #1755 — large merged episodes can trigger L2/L3/skill-evolution storm (same infinite rescore pattern)

## Environment

- MemOS: v2.0.20
- Agent: hermes
- OS: Windows 11
