
feat: introduce memory space detection and apply for MPI Broadcast #39

Merged
Ziminli merged 2 commits into master from feat/host-device-pointer-check on Jun 22, 2026

@Ziminli Ziminli commented Jun 22, 2026

Collaborator

Summary

This PR introduces a MemorySpace abstraction that identifies the memory properties of a given pointer, together with platform-specific implementations for NVIDIA and MetaX. With it, backends can distinguish host memory from device memory at runtime and select the appropriate communication path.
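
The following is a minimal sketch of how such an abstraction could look, not the code merged in this PR. The enumerator names, the platform tag types `CpuPlatform` and `MockGpuPlatform`, and the `MockDeviceAllocations()` registry are all hypothetical. The mock registry stands in for a real driver query so that the sketch compiles without a GPU toolchain; an NVIDIA specialization would presumably query the runtime instead, for example through `cudaPointerGetAttributes()`.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical enumerators; the PR's `MemorySpace` may carry more detail.
enum class MemorySpace { kHost, kDevice };

// Hypothetical platform tags used to select a specialization.
struct CpuPlatform {};
struct MockGpuPlatform {};  // Stand-in for a real GPU platform.

// Primary template: declared only, so each platform must specialize it.
template <typename Platform>
MemorySpace GetMemorySpace(const void* ptr);

// On a CPU-only platform every pointer refers to host memory.
template <>
inline MemorySpace GetMemorySpace<CpuPlatform>(const void* ptr) {
  assert(ptr != nullptr && "`ptr` must not be null.");
  return MemorySpace::kHost;
}

// Assumption: a registry of "device" pointers replaces the driver query
// that a real GPU specialization would perform.
inline std::unordered_set<const void*>& MockDeviceAllocations() {
  static std::unordered_set<const void*> allocations;
  return allocations;
}

template <>
inline MemorySpace GetMemorySpace<MockGpuPlatform>(const void* ptr) {
  assert(ptr != nullptr && "`ptr` must not be null.");
  return MockDeviceAllocations().count(ptr) != 0 ? MemorySpace::kDevice
                                                 : MemorySpace::kHost;
}
```

Leaving the primary template undefined means that a platform without a specialization fails at link time rather than silently reporting host memory.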

As an initial application, the MPI implementation of Broadcast is updated to handle host and device buffers separately, allowing correct operation on both memory types.

Changes

  • Memory Space Abstraction

    • Add MemorySpace in src/device.h to represent the memory properties of a pointer;
    • Add the template interface GetMemorySpace() for platform-specific memory space detection;
    • Implement GetMemorySpace() for NVIDIA and MetaX in their respective device_.h headers.
  • Error Checking Utilities

    • Add CheckCudaImpl() and the convenience macro INFINI_CHECK_CUDA for CUDA API error checking in src/nvidia/checks.h;
    • Add CheckMacaImpl() and the convenience macro INFINI_CHECK_MACA for MACA API error checking in src/metax/checks.h.
  • MPI Broadcast Enhancement

    • Update the MPI implementation of Broadcast to query the memory space of the communication buffer;
    • Handle host and device memory cases through separate execution paths.
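
As an illustration of the error-checking pattern listed above, here is a hedged sketch of an "Impl function plus convenience macro" pair. It does not reproduce `CheckCudaImpl()` or `INFINI_CHECK_CUDA`: the names `MockStatus`, `MockStatusString()`, `CheckMockImpl()`, and `SKETCH_CHECK` are hypothetical, and a mock status type replaces `cudaError_t` so that the block compiles without the CUDA or MACA headers. The sketch returns a `bool` to stay testable; the real utilities presumably assert or abort on failure, in line with the project's rule that error paths report `__FILE__`, `__LINE__`, and `__func__`.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a vendor status type such as `cudaError_t`.
enum class MockStatus { kSuccess = 0, kInvalidValue = 1 };

inline const char* MockStatusString(MockStatus status) {
  return status == MockStatus::kSuccess ? "success" : "invalid value";
}

// Reports a failed call with its source location and returns whether the
// call succeeded. Assumption: a production version would abort instead.
inline bool CheckMockImpl(MockStatus status, const char* expression,
                          const char* file, int line, const char* function) {
  if (status == MockStatus::kSuccess) {
    return true;
  }
  std::fprintf(stderr, "%s:%d: %s: `%s` failed: %s\n", file, line, function,
               expression, MockStatusString(status));
  return false;
}

// The macro captures the call site and the expression text automatically.
#define SKETCH_CHECK(expression)                                \
  CheckMockImpl((expression), #expression, __FILE__, __LINE__, \
                __func__)
```

Wrapping a call as `SKETCH_CHECK(SomeRuntimeCall(...))` keeps call sites short while the message still points at the failing line.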

Platform and Backend Affected

Platform

  • CPU
  • NVIDIA GPU
  • Iluvatar GPU
  • MetaX GPU
  • Moore Threads GPU
  • Cambricon MLU

Backend

  • OpenMPI
  • MPICH

Performance Impact

  • No performance impact
  • Performance improved
  • Performance regression possible

The performance impact depends on where the buffer resides. Overall, no significant performance change is expected, and none was observed.

For host-resident buffers, performance can improve substantially. Previously, users had to copy data to device memory before invoking Broadcast, after which the MPI implementation would internally transfer the data back to host memory. This introduced redundant data movement and unnecessary overhead. With memory-space awareness, host buffers can now be handled directly.

For device-resident buffers, the new logic introduces a small amount of additional processing to determine the memory space and select the appropriate execution path. However, this overhead is negligible compared to the communication cost and is not expected to have a measurable impact on performance.
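
The two execution paths described above can be sketched as follows. This is an assumption-laden illustration rather than the PR's implementation: `BroadcastHooks` and `BroadcastSketch()` are hypothetical names, and the hooks stand in for the memory-space query, the device copies (for example `cudaMemcpy`), and the host-side collective (for example `MPI_Bcast`). The device path is assumed to stage data through a temporary host buffer, which matches the description of the previous behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

enum class MemorySpace { kHost, kDevice };

// Hypothetical hooks replacing the platform and MPI calls.
struct BroadcastHooks {
  std::function<MemorySpace(const void*)> get_memory_space;
  std::function<void(void*, const void*, std::size_t)> copy_device_to_host;
  std::function<void(void*, const void*, std::size_t)> copy_host_to_device;
  std::function<void(void*, std::size_t)> host_broadcast;
};

inline void BroadcastSketch(void* buffer, std::size_t bytes, bool is_root,
                            const BroadcastHooks& hooks) {
  assert((buffer != nullptr || bytes == 0) && "`buffer` must not be null.");
  // Host path: the buffer is handed to the host collective directly.
  if (hooks.get_memory_space(buffer) == MemorySpace::kHost) {
    hooks.host_broadcast(buffer, bytes);
    return;
  }
  // Device path: stage through a temporary host buffer.
  std::vector<unsigned char> staging(bytes);
  if (is_root) {
    hooks.copy_device_to_host(staging.data(), buffer, bytes);
  }
  hooks.host_broadcast(staging.data(), bytes);
  if (!is_root) {
    hooks.copy_host_to_device(buffer, staging.data(), bytes);
  }
}
```

In this sketch a host buffer incurs no copies at all, while a device buffer pays one extra memory-space query on top of the staging copies it needed before.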

Known Issues & Future Work

  • Device memory detection is currently implemented only for the NVIDIA and MetaX platforms;
  • Other collective operations should be migrated to use MemorySpace in future work;
  • Future enhancements may leverage device-aware MPI capabilities when available.

Test Results

Test Involved Platform

  • CPU
  • NVIDIA GPU
  • Iluvatar GPU
  • MetaX GPU
  • Moore Threads GPU
  • Cambricon MLU

Test Involved Backend

  • OpenMPI
  • MPICH

NV + MetaX:
all_gather.log
all_reduce.log
all_to_all.log
broadcast.log
gather.log
reduce.log
reduce_scatter.log
scatter.log
send_recv.log


Checklist

Every contributor must verify every item below before requesting
review. Tick each box only after the check has actually been performed —
do not tick speculatively. If an item truly does not apply, replace the
checkbox with N/A and briefly explain why in an inline comment.

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat: …, fix(nccl): …).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
  • No stray merge commits from master — the branch is rebased cleanly on top of the current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal — no unrelated modifications were introduced (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • N/A- Public API changes (if any) are intentional, documented, and reflected in affected callers/tests.

General Code Hygiene

  • The code is self-explanatory; comments were added only where the intent or rationale is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, inconsistent indentation, or mixed formatting styles remain.
  • Identifiers referenced in comments or error messages are wrapped in Markdown backticks (e.g. the `AllReduce` implementation) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

  • Code follows the Google C++ Style Guide strictly.
  • clang-format (version 16, per .github/workflows/clang-format.yml) has been run against all modified applicable files; the diff is clean.
  • No exceptions are thrown. Error paths use assert with messages that include at least __FILE__, __LINE__, and __func__ (CONTRIBUTING.md §C++).
  • Error and warning message wording follows the LLVM Coding Standards (CONTRIBUTING.md §C++).
  • Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
  • Exactly one blank line between classes, between classes and functions, and between functions (CONTRIBUTING.md §C++).
  • Exactly one blank line between members (functions and variables) within a class (CONTRIBUTING.md §C++).
  • Exactly one blank line before and after the contents of a namespace (CONTRIBUTING.md §C++).

Python Specific (if Python files changed)

  • N/A- Code is PEP 8 compliant; ruff check passes cleanly on CI (see .github/workflows/ruff.yml).
  • N/A- ruff format --check passes cleanly — if not, run ruff format and commit the result.
  • N/A- Comments are complete English sentences, starting with a capital letter and ending with punctuation; Markdown backticks are used for code references (CONTRIBUTING.md §Python).
  • N/A- Framework-specific conventions (e.g. lowercase pytest.skip messages without terminal period) are honored where applicable (CONTRIBUTING.md §Python).
  • N/A- No blank line between the function signature and the body when there is no docstring or comment (CONTRIBUTING.md §Python).
  • N/A- A blank line is present before and after if, for, and similar control-flow statements (CONTRIBUTING.md §Python).
  • N/A- A blank line appears before each return, except when it directly follows a control-flow statement (CONTRIBUTING.md §Python).
  • N/A- Docstrings (if any) follow PEP 257 (CONTRIBUTING.md §Python).
  • N/A- Type hints are added / kept consistent with the surrounding code.

Testing

  • All applicable example programs have been built and tested successfully on at least one supported heterogeneous cluster setup.

Build, CI, and Tooling

  • N/A- New backends or devices have been added to auto-detection in CMakeLists.txt under if(AUTO_DETECT_DEVICES) or to if(AUTO_DETECT_BACKENDS) if applicable.
  • Both CI workflows (clang-format.yml, ruff.yml) are green locally (or expected to be green on CI).

Documentation

  • N/A- README.md, CONTRIBUTING.md, or inline docs updated when behavior, build flags, or developer workflow changed.
  • N/A- Any user-visible breaking change is called out explicitly under "Summary" and in the commit/PR title with a ! or BREAKING CHANGE: footer.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • N/A- Third-party code is license-compatible and attributed.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

Ziminli added 2 commits June 22, 2026 10:03
…y of the given pointer

 - add `MemorySpace` and template `GetMemorySpace` in `src/device.h`
 - add the NVIDIA implementation of `GetMemorySpace` in `nvidia/device_.h`
 - add NVIDIA check function `CheckCudaImpl()` and the convenience macro `INFINI_CHECK_CUDA` in `nvidia/checks.h`
 - apply `MemorySpace` to the MPI implementation of `Broadcast` to handle host and device cases, respectively
 - add MetaX's `GetMemorySpace` specialization in `src/metax/device_.h`
 - add MetaX's check function and macro in `src/metax/checks.h`
@Ziminli Ziminli self-assigned this Jun 22, 2026
@Ziminli Ziminli merged commit fbaaa32 into master Jun 22, 2026
2 checks passed
@Ziminli Ziminli deleted the feat/host-device-pointer-check branch June 22, 2026 12:37