Skip to content

feat(core): self-hosted (non-BrowserStack) Maestro relay support#2248

Closed
Sriram567 wants to merge 4 commits into
masterfrom
feat/self-hosted-maestro-percy
Closed

feat(core): self-hosted (non-BrowserStack) Maestro relay support#2248
Sriram567 wants to merge 4 commits into
masterfrom
feat/self-hosted-maestro-percy

Conversation

@Sriram567
Copy link
Copy Markdown
Contributor

Summary

Adds non-BrowserStack ("self-hosted") support to the Maestro screenshot relay. Customers running percy app:exec -- maestro test ... on their own devices, CI, or device farms can now produce Percy builds without BS host injection.

Architecture, in one sentence: sessionId absent on /percy/maestro-screenshot is the self-hosted detection signal (it's a BrowserStack host-injected marker that lives nowhere else in the CLI); when absent, the relay resolves a file-find scope root from PERCY_MAESTRO_SCREENSHOT_DIR (process env, required, absolute existing directory — typically the customer's maestro test --test-output-dir <DIR> path), globs <root>/**/<name>.png, and applies the same realpath + trailing-separator prefix check that protects the BS path today. The BrowserStack path is byte-identical — the existing /tmp/{sessionId}{_test_suite} glob, manual-walker fallback, and security check all run exactly as before for any request with sessionId.

What changed

  • packages/core/src/api.jssessionId is optional; selfHosted = !sessionId derives the mode. New scope-root resolution block validates PERCY_MAESTRO_SCREENSHOT_DIR (absolute + existing dir, 400 with actionable message otherwise, emitted before realpath so it isn't masked as a 404). File-find branches on selfHosted for the glob pattern; the realpath/prefix check is now rooted at the resolved scopeRoot. filePath is rejected in self-hosted mode (the SDK never emits it; honoring it against a caller-influenceable root would re-open arbitrary in-root reads).

  • packages/core/test/api.test.js — new self-hosted (sessionId absent) describe locks the new branches: happy path (recursive glob find + payload omits sessionId + PNG-fill works), env-var validation (unset/relative/non-existent/file-not-dir → 400), filePath rejected, file-missing → 404, name-traversal rejected, coordinate region pass-through. The existing "rejects missing sessionId" test is updated to reflect the new behavior (missing sessionId + missing env var → 400 with the new actionable message).

Security invariants (relocated, not removed)

  • SAFE_ID on name (^[a-zA-Z0-9_-]+$) is load-bearing for the recursive glob — separators and traversal characters rejected.
  • Trailing-separator guard preserved on the prefix check (prevents sibling-prefix bypass like /x/.maestro vs /x/.maestro-secrets).
  • PERCY_MAESTRO_SCREENSHOT_DIR read from process.env only, never the request body — root cannot be moved per-call.
  • filePath rejected self-hosted.

What this PR does not include yet

This is the first of two cli commits for the feature; the iOS element-region resolver work (auto-discover port 7001 → BS-proven HTTP /viewHierarchy attach + lsof self-discovery fallback + graceful warn-skip, no cold-start tier) lands as a follow-up commit on this same branch. Marking the PR draft until it's in. SDK changes (@percy/maestro-app server default + session-id gate relaxation) and docs land in a companion PR against percy/percy-maestro — also pending.

Testing

  • Existing BS-path test suite (45+ specs in the /percy/maestro-screenshot describe) stays green — locks R7 (no BrowserStack regression).
  • 8 new self-hosted specs cover the new code branches.
  • Local: full suite couldn't run cleanly on Mac_Arm (the pinned chromium revision 1536376 returns 404 from chromium-browser-snapshots, breaking unrelated Server + browser tests). Buildkite CI runs in a clean environment.

Post-Deploy Monitoring & Validation

  • What to monitor: the BrowserStack Maestro+Percy success path on the live BS hosts after the next CLI bump in bs-nixpkgs.
  • Logs: Honeycomb — service.name=percy-api with build.client=maestro (host-side BS sessions) — snapshot_count should stay at baseline; "Snapshot command was not called" should not appear.
  • Healthy signal: existing BS Maestro builds continue uploading snapshots at pre-deploy rates; no new 400 from /percy/maestro-screenshot on BS sessions.
  • Failure signal / rollback: if BS Maestro builds start 400'ing or zero-snapshot rate climbs > 2σ over baseline in the 48h after deploy → revert the bs-nixpkgs derivation bump. The change is gated so the BS path is byte-identical — any regression points at this PR.
  • Validation window: 48h after the first bs-nixpkgs cli bump that includes this commit.
  • Owner: @Sriram567 (Percy SDK on-call backup).
  • Self-hosted side has no production runtime to monitor yet — opt-in, customer-driven (percy app:exec -- maestro test off-BS). Acceptance is the real-device gate in the plan (Android + iOS device builds with coordinate + element regions).

Plan reference

Plan: percy-maestro/docs/plans/2026-05-27-001-feat-self-hosted-maestro-percy-plan.md (cross-repo plan, not in this repo — Unit 1 of 4). Jira: PER-8599.


Compound Engineering v2.54.0
🤖 Generated with Claude Opus 4.7 (200K context, extended thinking) via Claude Code

Sriram567 added 2 commits May 28, 2026 12:57
…ESTRO_SCREENSHOT_DIR scope

`/percy/maestro-screenshot` now supports non-BrowserStack ("self-hosted")
Maestro runs by making `sessionId` optional. When the request omits it
(the BrowserStack host's exclusive marker), the relay enters self-hosted
mode and resolves the file-find scope root from a new required env var:

  - PERCY_MAESTRO_SCREENSHOT_DIR — process.env only, never request body.
    Must be an absolute, existing directory (typically the customer's
    `maestro test --test-output-dir <DIR>` path). Missing/invalid → 400
    with actionable guidance, emitted before realpath so it isn't masked
    as a 404.

Self-hosted file-find uses a recursive `<root>/**/<name>.png` glob and
goes through the same realpath + trailing-separator prefix check as the
BS path, just rooted at the env-supplied dir. The security boundary is
relocated, not removed:

  - filePath is rejected outright self-hosted (SDK never emits it; an
    absolute filePath against a caller-influenceable root would re-open
    arbitrary in-root reads).
  - SAFE_ID on `name` stays load-bearing for the recursive glob.
  - Trailing-separator guard preserved (prevents /x/.maestro vs
    /x/.maestro-secrets bypass at the relocated root).

BS path (sessionId present) is byte-identical — the existing
`/tmp/{sessionId}{_test_suite}` glob, walker fallback, and realpath
check all run exactly as before. R7 from the plan.

Tests: BS-path tests pass unchanged. New self-hosted describe locks the
new branches: happy path (recursive find, payload omits sessionId), env
var validation (unset/relative/non-existent/file-not-dir → 400),
filePath rejected, file missing → 404, name traversal rejected,
coordinate region pass-through.

Plan: docs/plans/2026-05-27-001-feat-self-hosted-maestro-percy-plan.md
(Unit 1 of 4; iOS port resolver + SDK + docs follow).
…o/` default

Maestro's default output directory is `.maestro/` (dot-prefixed). When a
self-hosted customer runs `maestro test` without `--test-output-dir`,
screenshots land under `<cwd>/.maestro/...`. fast-glob's default is
`dot: false`, so the recursive `<root>/**/<name>.png` glob silently
skipped dot-prefixed segments and 404'd the file.

CI surfaced this on two self-hosted tests (happy path + coordinate
regions pass-through), whose fixtures correctly mirror Maestro's
`.maestro/run-x/screenshots/` real-world layout.

`dot: true` is applied only on the self-hosted glob; the BS glob is
unchanged (BS layouts have no dot-prefixed segments — byte-identical R7).
…of + warn-skip

Adds non-BrowserStack support to the iOS element-region resolver. Before
this change, `dump({ platform: 'ios' })` required BOTH
`PERCY_IOS_DEVICE_UDID` AND `PERCY_IOS_DRIVER_HOST_PORT` to be set
(host-injected by realmobile) — if either was absent (the normal
self-hosted state) the resolver bailed with `unavailable/env-missing`
and element regions silently dropped.

The dispatch now splits into two branches by `PERCY_IOS_DRIVER_HOST_PORT`
presence:

EXPLICIT (port set) — BS and customer-supplied-port self-hosted both
take this path. Existing HTTP-primary → CLI-fallback cascade runs
unchanged when UDID is also set. New: when UDID is absent (a
self-hosted-with-explicit-port scenario), HTTP is the only path; on
HTTP failure the resolver warn-skips with the new `self-hosted-no-udid`
reason rather than running maestro CLI without a target. `parseIos
DriverHostPort` is relaxed from the BS-specific 11100-11110 range to
any valid TCP port (1-65535) — BS values remain a strict subset.

IMPLICIT (port unset) — self-hosted discovery cascade, all HTTP-based:
  1. Cache hit (iosPortCache, per-Percy-instance, mirroring grpcClientCache).
  2. Probe 127.0.0.1:7001 — source-verified deterministic single-
     simulator port on Maestro ≤2.4.0 (cli-2.4.0 TestCommand.kt#selectPort).
  3. `lsof -nP -iTCP -sTCP:LISTEN | grep maestro-driver-ios…xctrunner` —
     exactly-one-match guard (zero or multiple → warn-skip, no guess).
  4. Warn-skip with actionable log (`self-hosted-no-driver` reason).

"Probe" reuses runIosHttpDump as both liveness check AND dump — a
`hierarchy` or `no-aut-tree` result confirms the port is a valid
Maestro driver (the existing axElement-root schema check rejects wrong
services); `dump-error`/`connection-fail` advances to the next
candidate (no drift-bit flip in cascade — drift is reserved for the
explicit-port BS path).

No cold-start `maestro hierarchy` tier — cut per plan after the spike
verified 7001 is deterministic on current GA Maestro and lsof covers
the future ephemeral-port case (Maestro 2.6+ `ServerSocket(0)`).

BS path R7: the EXPLICIT branch with both UDID + valid PORT runs
byte-identical to today. Existing iOS-HTTP, schema-drift, kill-switch,
and healthcheck-drift tests pass unchanged. Three previously-expected
`env-missing` tests are updated to the new branching:
  - PORT set + UDID unset → enters EXPLICIT, HTTP fails → `self-hosted-no-udid`.
  - PORT unset (regardless of UDID) → enters IMPLICIT, cascade →
    `self-hosted-no-driver` on no resolution.

New describe `iOS self-hosted port cascade` (7 tests) covers:
probe-7001 hit + caching, lsof single-match discovery, lsof zero/multi-
match warn-skip, explicit out-of-legacy-range port (e.g., 6001 for real
device) bypassing the cascade, wrong-service response (no axElement)
falling through without caching, and cache-hit reuse on subsequent
snapshots.

Files:
  - packages/core/src/maestro-hierarchy.js: new constants
    (IOS_SELF_HOSTED_PROBE_PORT), three new helpers (execLsofDefault,
    lsofXctrunnerPort, resolveSelfHostedIosPort), iOS dispatch refactor,
    dump() signature extended with execLsof + iosPortCache injectables.
  - packages/core/src/api.js: thread `iosPortCache: percy.iosPortCache`
    through the maestroDump call (one line; mirrors grpcClientCache).
  - packages/core/src/percy.js: initialize `this.iosPortCache = { port: null }`
    next to grpcClientCache (per-Percy-instance scope, D9 of the maestro
    4-PR plan).
  - packages/core/test/unit/maestro-hierarchy.test.js: 3 existing tests
    updated for the new branching, 7 new cascade tests added.

Plan: docs/plans/2026-05-27-001-feat-self-hosted-maestro-percy-plan.md
(Unit 2 of 4; Unit 1 already in ef66ccc/8ac60b87; companion SDK + docs
in percy/percy-maestro-app#7).
@Sriram567 Sriram567 marked this pull request as ready for review May 28, 2026 09:05
@Sriram567 Sriram567 requested a review from a team as a code owner May 28, 2026 09:05
…hosted-no-driver

Caught by CI on PR #2248: the cross-platform parity test asserted the
old `env-missing` reason tag, which Unit 2 replaced with
`self-hosted-no-driver` (PERCY_IOS_DRIVER_HOST_PORT absent enters the
new self-hosted IMPLICIT cascade and warn-skips with the new reason
when no driver is found).

Test now injects fakes for httpRequest + execLsof so the cascade
deterministically exhausts and returns `self-hosted-no-driver`. The
envelope-shape parity invariant (kind = 'unavailable') is preserved.

Fix-up to commit ccb9bc0 (Unit 2).
Sriram567 added a commit to percy/percy-maestro-app that referenced this pull request May 28, 2026
…vice run

Marks status: validated-2026-05-28. The runbook stub anticipated this
backfill ("Backfill on first real run") — these are the verified
commands, build IDs, and the two gotchas discovered during the first
real-device acceptance run on a Pixel 10 / Android 16 / Maestro 2.4.0
host.

Validated:
- /percy/maestro-screenshot relay resolves PERCY_MAESTRO_SCREENSHOT_DIR-
  scoped screenshots with no sessionId (Unit 1).
- SDK uploads successfully self-hosted with PERCY_SESSION_ID omitted from
  the payload (Unit 3).
- Coordinate regions are forwarded SDK → relay → Percy comparison engine.
  Baseline build #50215463 (7×8=56) approved; comparison build #50215516
  (9×9=81) reported `02_CalcResult` as Unchanged — masked area's diff
  ignored, canonical regions-working demonstration.

Two gotchas captured for the next person validating self-hosted on a
fresh machine:

1. Maestro hangs silently on the driver-APK install step without
   JAVA_TOOL_OPTIONS="-Djava.net.preferIPv4Stack=true". The hang is in
   dadb trying to talk to adb over IPv6 loopback. Reproduces with both
   Maestro 2.6.0 and 2.4.0, with Play Protect disabled, on Pixel 10 +
   Android 16 + macOS 15.5 (Apple Silicon). Independent of Percy —
   `maestro test` alone hangs identically without the env var.

2. Maestro's GraalJS runScript context does NOT inherit the parent
   process's environment. PERCY_SERVER_ADDRESS exported by percy
   app:exec is invisible to the SDK. Customer must pass
   `-e PERCY_SERVER=http://localhost:5338` as a Maestro -e flag on every
   `maestro test` invocation. This is the one extra config item required
   for self-hosted vs. BrowserStack.

iOS validation pending (no iOS device at the time of this run); the
runbook captures the runtime-verify items for that next pass.

Plan: docs/plans/2026-05-27-001-feat-self-hosted-maestro-percy-plan.md
Companion CLI PR: percy/cli#2248
@Sriram567
Copy link
Copy Markdown
Contributor Author

Consolidated into #2261 (single self-hosted Maestro+Percy V1 PR — all 6 commits, same content). Closing this stack-base in favor of the unified review surface.

@Sriram567 Sriram567 closed this Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant