ch4/ipc/gpu: revise the IPC caching strategy by hzhou · Pull Request #7862 · pmodels/mpich

hzhou · 2026-06-30T22:47:24Z

Pull Request Description

Before this PR, we have:

ch4 sender side handle cache
ch4 receiver side map cache
sender side specialized cache inside src/mpl/src/gpu/mpl_gpu.ze.c

The sender-side handle cache and the receiver-side mapping cache fundamentally need to be synchronized. With CUDA, a new mapping will fail if stale entries with overlapping addresses remain. With ZE, stale cache entries on either side prevent memory release and eventually lead to device memory exhaustion.

Working with three separate caching facilities and managing their synchronization is too complex. Instead, this new design uses a single sender-side cache and explicit control messages, so the sender caches both the handle and the remote mappings. This ensures consistency.
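The single sender-side cache with control messages can be sketched roughly as below. This is a minimal illustration, not MPICH code: `sender_entry`, `sender_choose_path`, `sender_on_mapaddr`, and `sender_evict` are hypothetical names, and the entry tracks a single peer to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this only illustrates the idea. */
enum send_path { SEND_PATH_HANDLE, SEND_PATH_DIRECT };

struct sender_entry {
    bool in_use;
    uintptr_t base;         /* sender-side base address of the GPU buffer */
    size_t len;
    int remote_rank;        /* single peer, to keep the sketch short */
    bool remote_mapped;     /* set once the receiver reports its mapping */
    uintptr_t remote_addr;  /* address of the mapping in the receiver */
};

/* Sender: pick the path for sending [base, base+len) to rank. */
static enum send_path sender_choose_path(struct sender_entry *e, uintptr_t base,
                                         size_t len, int rank)
{
    assert(e != NULL);
    if (e->in_use && e->base == base && e->len == len &&
        e->remote_rank == rank && e->remote_mapped) {
        return SEND_PATH_DIRECT;    /* cache hit with a known remote mapping */
    }
    /* First use, or no mapping reported yet: (re)initialize the entry and
     * fall back to exchanging the IPC handle. */
    e->in_use = true;
    e->base = base;
    e->len = len;
    e->remote_rank = rank;
    e->remote_mapped = false;
    e->remote_addr = 0;
    return SEND_PATH_HANDLE;
}

/* Sender: control message from the receiver reporting its mapped address. */
static void sender_on_mapaddr(struct sender_entry *e, int from_rank,
                              uintptr_t base, uintptr_t mapped)
{
    if (e->in_use && e->base == base && e->remote_rank == from_rank) {
        e->remote_mapped = true;
        e->remote_addr = mapped;
    }
}

/* Sender: evict the entry. Returns true if the receiver must be told to
 * unmap, which keeps both sides consistent. */
static bool sender_evict(struct sender_entry *e)
{
    bool need_unmap = e->in_use && e->remote_mapped;
    e->in_use = false;
    e->remote_mapped = false;
    return need_unmap;
}
```

Because the receiver only maps and unmaps in response to messages from the sender, the sender's entry is the single source of truth for both the handle and the remote mapping.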

MPIR_CVAR_CH4_IPC_GPU_CACHE_SIZE prevents the cache from hoarding device memory. Setting MPIR_CVAR_CH4_IPC_GPU_CACHE_SIZE=0 effectively disables caching. In principle, the cvar can be changed at runtime to control the caching behavior dynamically.

This PR is partially based on the work by @nmnobre in #7821

[skip warnings]

Author Checklist

Provide Description
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form: module: short description
Commit message explains what's in the commit.
Passes All Tests
Whitespace checker. Warnings test. Additional tests via comments.
Contribution Agreement
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.

Refactor the IPC GPU handle cache from uthash to a static array with LRU eviction (bounded by IPC_HANDLE_CACHE_MAX). Each cache entry now tracks remote mapped addresses, enabling a DIRECT IPC path that bypasses handle exchange and remote mapping on subsequent sends to the same rank.

Key changes:
- Replace uthash-based handle cache with a fixed-size array supporting LRU eviction and overlap detection for stale entries.
- Track per-rank mapped addresses in each cache entry; use them to switch to DIRECT ipc type on cache hits.
- Add MPIDI_IPC_send_mapaddr AM to notify senders of mapped addresses after receiver mapping, and MPIDI_IPC_send_unmap AM for cache eviction.
- Move handle validation into ipc_track_cache_search so callers only see valid entries.
- Split MPIDI_GPU_fill_ipc_handle into cached (p2p) and non-cached (win/coll) versions.
- Simplify handle_status enum to a bool handle_is_cached.
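A fixed-size array with LRU eviction and overlap detection might look roughly like the sketch below. It is illustrative only: `CACHE_MAX` stands in for IPC_HANDLE_CACHE_MAX with a small value, and `cache_search`, `cache_insert`, and the struct layout are hypothetical, not the PR's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_MAX 4     /* stand-in for IPC_HANDLE_CACHE_MAX; small on purpose */

struct cache_entry {
    bool valid;
    uintptr_t base;
    size_t len;
    uint64_t last_used;     /* logical timestamp used for LRU */
};

struct cache {
    struct cache_entry slots[CACHE_MAX];
    uint64_t clock;         /* incremented on every hit or insert */
    int evictions;          /* in a real cache each eviction would send an unmap */
};

static bool ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Search for an exact match of [base, base+len). Entries that overlap the
 * range without matching it exactly are treated as stale and dropped.
 * Returns the slot index, or -1 on a miss. */
static int cache_search(struct cache *c, uintptr_t base, size_t len)
{
    assert(c != NULL);
    int hit = -1;
    for (int i = 0; i < CACHE_MAX; i++) {
        struct cache_entry *e = &c->slots[i];
        if (!e->valid)
            continue;
        if (e->base == base && e->len == len) {
            e->last_used = ++c->clock;
            hit = i;
        } else if (ranges_overlap(e->base, e->len, base, len)) {
            e->valid = false;   /* stale: the buffer was freed and reallocated */
            c->evictions++;
        }
    }
    return hit;
}

/* Insert a new entry: use the first free slot, otherwise evict the least
 * recently used one. Returns the slot index. */
static int cache_insert(struct cache *c, uintptr_t base, size_t len)
{
    assert(c != NULL);
    int victim = 0;
    for (int i = 0; i < CACHE_MAX; i++) {
        if (!c->slots[i].valid) {
            victim = i;
            break;
        }
        if (c->slots[i].last_used < c->slots[victim].last_used)
            victim = i;
    }
    if (c->slots[victim].valid)
        c->evictions++;
    c->slots[victim] = (struct cache_entry) {
        .valid = true, .base = base, .len = len, .last_used = ++c->clock
    };
    return victim;
}
```

A linear scan is acceptable here because the array is small and bounded, and it makes the overlap check cheap to run on every lookup, which a hash keyed on the base address cannot do directly.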

Now that the old MPIR_CVAR_CH4_IPC_GPU_CACHE_SIZE is unused, rename MPIR_CVAR_CH4_IPC_GPU_MAX_CACHE_ENTRIES to MPIR_CVAR_CH4_IPC_GPU_CACHE_SIZE as the latter is more intuitive to recall. Set static IPC_HANDLE_CACHE_MAX to 1024 to allow more run time experiments.
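One way the runtime cvar and the static bound could interact is sketched below. The clamping rule and the helper name `effective_cache_size` are assumptions for illustration; the commit message only states that the static bound is 1024 and that a value of 0 disables the cache.

```c
#include <assert.h>

#define IPC_HANDLE_CACHE_MAX 1024   /* static bound named in the commit message */

/* Hypothetical helper: clamp the runtime cache size into
 * [0, IPC_HANDLE_CACHE_MAX]. Treating negative values as 0 is an assumption. */
static int effective_cache_size(int cvar_value)
{
    if (cvar_value <= 0)
        return 0;                       /* 0 disables caching */
    if (cvar_value > IPC_HANDLE_CACHE_MAX)
        return IPC_HANDLE_CACHE_MAX;    /* cannot exceed the static array */
    assert(cvar_value <= IPC_HANDLE_CACHE_MAX);
    return cvar_value;
}
```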

MPIDIU_get_grank is used in active message paths, and active messages don't really require a communicator. Consider usages during init, finalize, and potentially sessions. The comm is used in the shm active message path only to look up the lpid via MPIDIU_get_grank. Make it work when we already have the lpid but not comm_world.
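The idea of resolving a global rank with or without a communicator can be illustrated as follows. This is a toy sketch: `toy_comm` and `toy_get_grank` are hypothetical and do not reflect the real signature or data structures of MPIDIU_get_grank.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a communicator: maps comm ranks to global ranks. */
struct toy_comm {
    int size;
    const int *rank_to_grank;
};

/* Resolve a global rank. With a communicator, translate the comm rank.
 * Without one (NULL), assume the caller already holds the global id. */
static int toy_get_grank(int rank, const struct toy_comm *comm)
{
    if (comm == NULL)
        return rank;
    assert(rank >= 0 && rank < comm->size);
    return comm->rank_to_grank[rank];
}
```

With this shape, an active message sent during init or finalize can pass NULL for the communicator and supply the already known global id directly.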

hzhou · 2026-07-01T02:12:45Z

test:mpich/ch4/ofi
test:mpich/ch4/gpu/ofi

hzhou · 2026-07-01T22:48:23Z

test:mpich/ch4/ofi
test:mpich/ch4/gpu/ofi

hzhou added 6 commits June 30, 2026 17:35

ch4/ipc/gpu: remove unused MPIDI_GPU_ipc_fast_memcpy

649bdb6

Remove the dead code.

ch4/ipc/gpu: add notes on IPC GPU caching design

2def528

Add notes and design plans.

mpl/ze: remove specialized cache in mpl_gpu_ze.c

e3f9422

A separate caching system inside mpl_gpu_ze is difficult to maintain. Remove it and rely on MPIR layer caching.

mpl/gpu: remove attr param in MPL_gpu_ipc_handle_destroy

a9e0682

The extra attr parameter was used only by mpl_gpu_ze's special cache, which is now removed.

ch4/ipc/gpu: remove mapped cache

1e35026

We'll let the sender side cache the mapped addresses and synchronize via active messages. Also remove MPIR_CVAR_CH4_IPC_GPU_HANDLE_CACHE. We'll always use the IPC handle cache. To disable the cache, set MPIR_CVAR_CH4_IPC_GPU_MAX_CACHE_ENTRIES=0.

ch4/ipc/gpu: gather the ipc cache code

0b73b21

Move all code related to the ipc cache together in gpu_post.c to facilitate refactoring and maintenance.

hzhou force-pushed the 2606_ipc_gpu branch from bcac270 to 231c9b8 on June 30, 2026 23:13

hzhou added 3 commits June 30, 2026 21:12

hzhou force-pushed the 2606_ipc_gpu branch from 231c9b8 to fec33e4 on July 1, 2026 02:12

ch4/ipc/gpu: ignore AM unmap error in finalize

e8c6f58

When we clear the IPC handle cache at finalize, we send out unmap AM messages to notify remote processes to unmap. It is not an error if a remote process has already exited, since the unmap happens automatically at exit.
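The error handling described here could be sketched as below. The function `flush_remote_mappings`, the callback type, and the toy sender are all hypothetical; the sketch only shows the policy of ignoring send failures during finalize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback that sends an unmap notification; returns 0 on success. */
typedef int (*send_unmap_fn)(int rank);

/* Toy sender for the sketch: pretend that odd ranks have already exited. */
static int toy_send_fails_for_odd(int rank)
{
    return (rank % 2) ? -1 : 0;
}

/* Notify every peer that holds a mapping. Outside finalize, a failed send is
 * counted as an error. During finalize it is ignored, because a peer that has
 * already exited released its mappings automatically. */
static int flush_remote_mappings(const int *ranks, int n, send_unmap_fn send,
                                 bool in_finalize)
{
    assert(ranks != NULL && send != NULL);
    int errors = 0;
    for (int i = 0; i < n; i++) {
        int rc = send(ranks[i]);
        if (rc != 0 && !in_finalize)
            errors++;
    }
    return errors;
}
```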

hzhou wants to merge 10 commits into pmodels:main from hzhou:2606_ipc_gpu
