Residuals from abandoned #124: watcher eager high-water advance + standalone flake-repro harness #185

@rbuergi

PR #124 chased the CodeEditRecompile CI flake. The flake population has since been root-caused and eliminated by #173 (single compile driver via EnsureCompileDispatched + complete terminal write, including CompiledFrameworkVersion). #124's 144-line watcher refinement was authored against the pre-#173 architecture and has not been shown to change any outcome, so the PR is being closed rather than rebased. Three residuals are worth tracking:

1. dispatchHighWater advances before the Update commit (potential lost trigger)

On current main, InstallReleaseRequestWatcher (src/MeshWeaver.Graph/Configuration/NodeTypeCompilationHelpers.cs:654) advances the process-local high-water mark in the Subscribe lambda before workspace.GetMeshNodeStream().Update(...) runs. The Update lambda has two bail paths (triggerAt <= handled, and status already flipped to Pending/Compiling) that return curr without stamping LastReleaseRequestHandledAt. Because the high-water mark is already past the trigger by then, the settled re-emission fails the req > dispatchHighWater check and the trigger can be lost. The in-code comment claims the in-flight compile carries the request to a terminal status; verify that this holds under #173's EnsureCompileDispatched semantics, and if not, advance the high-water mark only on the Update commit path (the #124 refinement, re-derived against the current code). Per repo standards: pin with a deterministic repro before changing the watcher.
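To make the suspected interleaving concrete, here is a minimal, language-neutral model in Python. It is not the C# watcher code: `Node`, `Watcher`, `on_emission`, and `run` are hypothetical names, timestamps are plain integers, and the model deliberately assumes the in-flight compile does not stamp the request, which is exactly the point this issue asks to verify. Under that assumption it shows how the eager advance drops the trigger while a commit-path advance does not.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for the mesh node fields the watcher reads."""
    status: str                        # "Settled", "Pending" or "Compiling"
    requested_at: Optional[int]        # models RequestedReleaseAt
    handled_at: Optional[int] = None   # models LastReleaseRequestHandledAt


class Watcher:
    """Model of the watcher; `eager` selects where the high-water mark advances."""

    def __init__(self, eager: bool) -> None:
        self.eager = eager
        self.high_water = 0
        self.dispatched: List[int] = []

    def on_emission(self, seen: Node, current: Node) -> Node:
        """`seen` is the snapshot delivered to Subscribe; `current` is what Update reads."""
        req = seen.requested_at
        if req is None or req <= self.high_water:
            return current                      # fails the req > high-water check
        if self.eager:
            self.high_water = req               # advanced BEFORE the Update commit
        # --- body of the modelled Update lambda ---
        if current.handled_at is not None and req <= current.handled_at:
            return current                      # bail path 1: already handled
        if current.status in ("Pending", "Compiling"):
            return current                      # bail path 2: nothing is stamped
        if not self.eager:
            self.high_water = req               # advanced only on the commit path
        self.dispatched.append(req)
        return replace(current, status="Pending", handled_at=req)


def run(eager: bool) -> List[int]:
    watcher = Watcher(eager)
    seen = Node("Settled", requested_at=5)
    racing = Node("Compiling", requested_at=5)  # status flipped before Update ran
    after = watcher.on_emission(seen, racing)   # takes bail path 2
    # ASSUMPTION: the in-flight compile finishes without stamping handled_at.
    settled = replace(after, status="Settled")
    watcher.on_emission(settled, settled)       # the settled re-emission
    return watcher.dispatched
```

In this model `run(True)` dispatches nothing, because the re-emission is filtered by the already-advanced mark, while `run(False)` dispatches the request at 5. A real repro would need to force the same ordering against the actual stream, and the result only matters if the in-code comment's claim turns out not to hold.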

2. Reusable 2-vCPU flake-repro harness

.github/workflows/flake-repro.yml on branch ci/flake-repro-workflow is a standalone workflow_dispatch job that loops one suspect test on the real ubuntu-latest 2-vCPU runner until it fails, then uploads the failing trx, traces, and the blame hang-dump. It reliably reproduced the CodeEditRecompile flake (iter 4/19/80) and can target any project/test. Zero src/ risk, so it is worth landing in its own PR. Heisenbug note baked into the file: raising CompileWatcher log categories to Debug masks timing flakes.
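The loop-until-failure idea can be sketched outside the workflow file as well. The Python below is an illustration only, not the contents of flake-repro.yml: `build_command` and `loop_until_failure` are hypothetical helpers, and the project path, filter expression, and 5-minute hang timeout are placeholder assumptions. The `dotnet test` options used (`--filter`, `--logger trx`, `--results-directory`, `--blame-hang`, `--blame-hang-timeout`) are standard CLI options.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def build_command(project: str, test_filter: str, results_dir: str) -> List[str]:
    """Build a `dotnet test` invocation for one suspect test (values are placeholders)."""
    return [
        "dotnet", "test", project,
        "--filter", test_filter,
        "--logger", "trx",
        "--results-directory", results_dir,
        "--blame-hang",
        "--blame-hang-timeout", "5m",   # assumed value, not taken from the workflow
    ]


def loop_until_failure(
    command: Sequence[str],
    max_iterations: int,
    run: Callable[[List[str]], int] = subprocess.call,
) -> Optional[int]:
    """Run `command` repeatedly; return the 1-based iteration that failed, or None."""
    for iteration in range(1, max_iterations + 1):
        if run(list(command)) != 0:
            return iteration
    return None


# Example invocation (not executed here; path and filter are hypothetical):
# failed_at = loop_until_failure(
#     build_command("path/to/Tests.csproj",
#                   "FullyQualifiedName~CodeEditRecompile",
#                   "TestResults"),
#     max_iterations=200,
# )
```

Injecting `run` keeps the loop testable without invoking the CLI. Running such a loop on a developer machine is unlikely to match the 2-vCPU runner's timing, which is the reason the workflow exists.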

3. Stuck-state timeout diagnostic

CodeEditRecompileTest.WaitForLatestRelease on the branch dumps Status / LatestReleasePath / RequestedReleaseAt / LastReleaseRequestHandledAt on the 50s timeout (timing-neutral, fires only on timeout). Cheap observability for any future recurrence.
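The shape of such a diagnostic, as a minimal Python sketch rather than the test's actual C# code: `wait_for` and its parameters are hypothetical, and the state is modelled as a plain dictionary. The dump is built only on the timeout branch, so the success path does no extra work.

```python
import time
from typing import Any, Callable, Dict

# Field names taken from the issue text; the dictionary shape is an assumption.
DIAGNOSTIC_FIELDS = (
    "Status",
    "LatestReleasePath",
    "RequestedReleaseAt",
    "LastReleaseRequestHandledAt",
)


def wait_for(
    predicate: Callable[[Dict[str, Any]], bool],
    read_state: Callable[[], Dict[str, Any]],
    timeout_s: float,
    poll_s: float = 0.05,
) -> Dict[str, Any]:
    """Poll `read_state` until `predicate` holds; on timeout, raise with a state dump."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = read_state()
        if predicate(state):
            return state
        if time.monotonic() >= deadline:
            # Only reached on timeout: format the last observed state.
            dump = ", ".join(f"{name}={state.get(name)!r}" for name in DIAGNOSTIC_FIELDS)
            raise TimeoutError(f"release not observed within {timeout_s}s: {dump}")
        time.sleep(poll_s)
```

A caller would pass a predicate such as "LatestReleasePath is set" and a reader that snapshots the node, so a stuck run fails with the four fields in the exception message instead of a bare timeout.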

Refs: #124 (closed), #173 (merged fix).
