Skip to content

Preload trampoline binaries to avoid race conditions when lazily downloading during parallel builds#7438

Closed
dnagoda wants to merge 1 commit intomainfrom
04-30-preload_trampoline_binaries_to_avoid_race_conditions_when_lazily_downloading_during_parallel_builds
Closed

Preload trampoline binaries to avoid race conditions when lazily downloading during parallel builds#7438
dnagoda wants to merge 1 commit intomainfrom
04-30-preload_trampoline_binaries_to_avoid_race_conditions_when_lazily_downloading_during_parallel_builds

Conversation

@dnagoda
Copy link
Copy Markdown

@dnagoda dnagoda commented Apr 30, 2026

WHY are these changes introduced?

shopify app build fails intermittently on CI with ETXTBSY ("text file busy") when building apps with multiple Shopify Function extensions:

spawn ETXTBSY
Command failed with ETXTBSY: .../node_modules/@shopify/cli/bin/shopify-function-trampoline-1.0.2 -i <wasm> -o <wasm>

The trampoline binaries (shopify-function-trampoline-1.0.2, shopify-function-trampoline-2.0.1) are not shipped with the npm package — they are lazily downloaded from GitHub on first use by downloadBinary(). When renderConcurrent runs parallel extension builds, multiple concurrent tasks race to download the same binary. One task opens the file for writing (moveFile with {overwrite: true}) while another tries to execute it, and the Linux kernel returns ETXTBSY.

The in-process dedup map (downloadsInProgress) prevents redundant downloads within a single event loop tick, but the race window between the download completing and the file being executed by a concurrent build is still open.

This is the same class of bug that was previously fixed for Javy binaries in #2877 — which introduced installJavy() to pre-download Javy + plugin binaries before parallel builds start. The trampoline was simply missed because it was added later.

WHAT is this pull request doing?

Adds installTrampolines(app) — a new pre-download function that mirrors the existing installJavy(app) pattern. It eagerly downloads both trampoline versions (v1 and v2) before the parallel build fan-out, so that when individual runTrampoline() calls run concurrently, the binaries are already on disk and downloadBinary() is a no-op.

Key design choice: Unlike Javy (where the version depends on each extension's @shopify/shopify_function package version), the trampoline version is determined post-build from WASM imports (shopify_function_v1 vs shopify_function_v2). Since there are only two possible versions and the binaries are small, we simply pre-download both for any app with function extensions.

Changes:

  • packages/app/src/cli/services/function/build.ts — Added installTrampolines(app): skips if no function extensions, otherwise downloads both V1 and V2 trampoline binaries in parallel
  • packages/app/src/cli/services/build.ts — Calls installTrampolines alongside installJavy (via Promise.all) before renderConcurrent
  • packages/app/src/cli/services/deploy/bundle.ts — Same pre-download before parallel deploy builds
  • packages/app/src/cli/services/dev/processes/draftable-extension.ts — Same pre-download before dev draft updates
  • packages/app/src/cli/services/function/build.test.ts — Added 2 tests: verifies both versions are downloaded when function extensions exist, verifies no downloads when they don't

How to test your changes?

  1. Unit tests: pnpm vitest run packages/app/src/cli/services/function/build.test.ts (28 tests pass including 2 new)
  2. Manual repro (requires Linux CI or Docker):
  3. Regression check: existing Javy pre-download behavior unchanged — installJavy and installTrampolines run in parallel via Promise.all

Checklist

  • I've considered possible cross-platform impacts (Mac, Linux, Windows)
  • I've considered possible documentation changes
  • I've considered analytics changes to measure impact
  • The change is user-facing — I've identified the correct bump type (patch for bug fixes · minor for new features · major for breaking changes) and added a changeset with pnpm changeset add

Copy link
Copy Markdown
Author

dnagoda commented Apr 30, 2026

@github-actions github-actions Bot added the no-changelog This PR doesn't include a changeset entry. Is an internal only change not relevant to end users. label Apr 30, 2026
@dnagoda dnagoda force-pushed the 04-30-preload_trampoline_binaries_to_avoid_race_conditions_when_lazily_downloading_during_parallel_builds branch from b8be9e3 to af215f6 Compare April 30, 2026 22:51
@github-actions github-actions Bot added Area: @shopify/app @shopify/app package issues and removed no-changelog This PR doesn't include a changeset entry. Is an internal only change not relevant to end users. labels Apr 30, 2026
@dnagoda dnagoda force-pushed the 04-30-preload_trampoline_binaries_to_avoid_race_conditions_when_lazily_downloading_during_parallel_builds branch from af215f6 to 7ec0632 Compare April 30, 2026 23:11
@dnagoda dnagoda marked this pull request as ready for review April 30, 2026 23:40
@dnagoda dnagoda requested review from a team as code owners April 30, 2026 23:40
// as it might be done multiple times in parallel.
// javy -> https://github.com/Shopify/cli/issues/2877
// trampoline -> ETXTBSY race conditions during parallel Rust builds
await Promise.all([installJavy(app), installTrampolines(app)])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested dev with this change? I wonder if the dev-side preload is attached to the wrong lifecycle point, meaning this likely still misses app dev startup in some cases.

Right now the new preload runs in pushUpdatesForDraftableExtensions() in packages/app/src/cli/services/dev/processes/draftable-extension.ts, but the actual initial concurrent extension builds happen in AppEventWatcher.start() via buildExtensions(...).

Two consequences:

  • setupDevProcesses() uses setupDevSessionProcess(...) instead of setupDraftableExtensionsProcess(...) when developerPlatformClient.supportsDevSessions is true, so that path appears to get no trampoline preload at all.
  • More generally, the watcher startup is the place that owns the initial build fan-out, so putting the preload in a side process feels racy / easy to bypass.

Would it be safer to move this into the shared watcher startup path (e.g. before AppEventWatcher.start() begins buildExtensions(...)), so all app dev modes get the same protection?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very low context on CLI architecture, but I can definitely look into how we could do this at a higher level in the flow. Would we want to move the javy preload at the same time?

Copy link
Copy Markdown
Contributor

@dmerand dmerand May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also low-context on the functions side of things. Perhaps it works for Javy at this layer, it should work for Trampoline. Hopefully somebody with more Functions experience can weigh in.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to move both preloads out of the watcher. There's probably also an argument that we don't need the preload for dev contexts as these are likely to run in long lived environments. The issue with the build and bundle flows is that they would often be run in CI on fresh env where the resource has to be downloaded fresh.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to SetupDevProcesses.

await Promise.all(downloadPromises)
}

export async function installTrampolines(app: AppInterface) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One compatibility concern here that my 'bot found: installTrampolines() now eagerly downloads both trampoline versions, but trampolineBinary(version) only marks Windows-on-ARM support for versions >= 2.0.1.

That means on win32/arm64, a function app that only needs the v2 trampoline may have worked before (because runTrampoline() lazily selected/downloaded just v2), but could now fail up front trying to predownload the unused v1 binary.

Would it be safer to skip pre-downloading versions that are unsupported on the current platform/arch, or otherwise avoid failing the whole command on an unused trampoline version?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. Let me see how we check for the needed version and look into adding that logic here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't determine which trampolines we need during the preload because we don't have WASMs to introspect, but we can protect from download errors and log. This will protect us from having issues during the preload. Thanks for catching this. 👍

@dnagoda dnagoda force-pushed the 04-30-preload_trampoline_binaries_to_avoid_race_conditions_when_lazily_downloading_during_parallel_builds branch from 7ec0632 to 0cbea2b Compare May 1, 2026 15:32
@dmerand dmerand requested review from a team, achitojha and jeffcharles and removed request for a team May 1, 2026 17:54
@jeffcharles jeffcharles mentioned this pull request May 1, 2026
4 tasks
@jeffcharles
Copy link
Copy Markdown
Contributor

I've opened #7443 as an alternative fix. The underlying problem IMO is downloadBinary returning too early. That is, when it sees the file on the file system, not when the file is ready to be executed. What I prefer with my fix is it'll cover all Functions binaries, we don't have to download more than we need to, and we don't have to swallow errors during the download process.

If we do opt to go with this PR, I would ask we don't swallow all errors that happen with downloading but much more selective. The error messages would be misleading if someone tries building a Function for the first time and GitHub is having an outage or if they're using the v1 trampoline on Windows for ARM.

@jeffcharles
Copy link
Copy Markdown
Contributor

Thinking about this a little more, if the Shopify CLI, or the Functions binaries, were to drop support for Windows 10, we might be able to use Prism on Windows 11 to run the Intel binaries on ARM machines. I haven't tested it though as I don't have access to a Windows 11 machine.

Copy link
Copy Markdown
Author

dnagoda commented May 1, 2026

@jeffcharles Thanks for taking a look. Yes, your approach is much better. I completely missed the downloadsInProgress behavior. Closing this PR is favor of your approach.

@dnagoda dnagoda closed this May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: @shopify/app @shopify/app package issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants