feat: provide CMake config package for find_package(InfiniCCL) #40

Merged: Ziminli merged 1 commit into InfiniTensor:master from GordonYang1:feat/support-find-package on Jun 26, 2026

@GordonYang1 (Collaborator) commented on Jun 24, 2026

Summary

InfiniCCL previously shipped no CMake config package, so downstream projects had to wire the integration by hand: set an INFINICCL_INSTALL environment variable, add ${INFINICCL_INSTALL}/include manually, link the absolute path to libinfiniccl.so, and hand-set INSTALL_RPATH.

This PR makes InfiniCCL installable as a first-class CMake package. After cmake --install, downstream projects consume it with find_package(InfiniCCL REQUIRED) and link the imported target InfiniCCL::infiniccl, which already carries the public include directory, the library location, and the runtime search path. All hardware/backend dependencies (CUDA, MPI, NCCL, ...) are linked PRIVATE into the shared library, so they are intentionally not propagated as usage requirements — consumers do not need to resolve them just to use InfiniCCL. The change is purely additive and non-breaking: the installed libinfiniccl.so and headers are unchanged; only export/packaging metadata is added, and the manual-linking path is retained in README.md as an alternative.
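
For illustration, a downstream `CMakeLists.txt` under the new flow could look like the sketch below. The project name, the `app` target, `main.cpp`, and the choice of C++ as the consumer language are hypothetical; only `find_package(InfiniCCL REQUIRED)` and the `InfiniCCL::infiniccl` target come from this PR.

```cmake
cmake_minimum_required(VERSION 3.16)
project(infiniccl_consumer LANGUAGES CXX)  # Hypothetical project name.

# Locate the installed package. If InfiniCCL is not installed in a default
# prefix, pass -DCMAKE_PREFIX_PATH=<install prefix> or
# -DInfiniCCL_ROOT=<install prefix> at configure time.
find_package(InfiniCCL REQUIRED)

# `app` and main.cpp are placeholders for the user's own program.
add_executable(app main.cpp)

# The imported target supplies the include directory and library location,
# so no manual include path or absolute library path is needed.
target_link_libraries(app PRIVATE InfiniCCL::infiniccl)
```

Compared with the manual method, the `INFINICCL_INSTALL` environment variable, the hand-written include path, and the absolute path to `libinfiniccl.so` all drop out of the consumer's build script.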

Changes

  • Packaging / export (CMakeLists.txt)
    • Add a project version (project(InfiniCCL VERSION 0.1.0 ...)) so a package version file can be generated.
    • Record the infiniccl target in an export set: install(TARGETS infiniccl EXPORT InfiniCCLTargets ...) with explicit LIBRARY / ARCHIVE / RUNTIME destinations.
    • Generate and install InfiniCCLConfig.cmake and InfiniCCLConfigVersion.cmake via CMakePackageConfigHelpers (configure_package_config_file + write_basic_package_version_file, COMPATIBILITY SameMajorVersion).
    • Install the export set as InfiniCCLTargets.cmake under ${CMAKE_INSTALL_LIBDIR}/cmake/InfiniCCL with namespace InfiniCCL::.
  • Config template (cmake/InfiniCCLConfig.cmake.in)
    • New minimal package config: @PACKAGE_INIT@, include the generated targets file, then check_required_components(InfiniCCL).
    • No find_dependency(...) calls — the backend libraries are PRIVATE to the shared library; the CMakeFindDependencyMacro include is kept as a guarded extension point for any future build that exposes such a dependency through the public interface.
  • Documentation (README.md)
    • Rewrite the "Run a Custom User Program" section to present find_package(InfiniCCL) as the recommended approach (with -DCMAKE_PREFIX_PATH / -DInfiniCCL_ROOT hints), keeping the existing manual-linking method as a documented alternative.
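
The packaging steps listed above can be sketched roughly as follows. This is not the merged diff: the source file name, the header layout under `include/`, the minimum CMake version, and the helper variable `INFINICCL_CMAKE_DIR` are assumptions made for the sake of a self-contained example.

```cmake
cmake_minimum_required(VERSION 3.16)
project(InfiniCCL VERSION 0.1.0 LANGUAGES CXX)

include(GNUInstallDirs)
include(CMakePackageConfigHelpers)

# Hypothetical source list and header layout.
add_library(infiniccl SHARED src/infiniccl.cc)
target_include_directories(infiniccl PUBLIC
  $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
  $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>)

# Hypothetical helper variable for the package directory.
set(INFINICCL_CMAKE_DIR ${CMAKE_INSTALL_LIBDIR}/cmake/InfiniCCL)

# Record the target in an export set with explicit destinations.
install(TARGETS infiniccl EXPORT InfiniCCLTargets
  LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
  ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
  RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
install(DIRECTORY include/ DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})

# Install the export set as InfiniCCLTargets.cmake with the InfiniCCL:: namespace.
install(EXPORT InfiniCCLTargets
  FILE InfiniCCLTargets.cmake
  NAMESPACE InfiniCCL::
  DESTINATION ${INFINICCL_CMAKE_DIR})

# Generate the config file and the version file, then install both.
configure_package_config_file(
  cmake/InfiniCCLConfig.cmake.in
  ${CMAKE_CURRENT_BINARY_DIR}/InfiniCCLConfig.cmake
  INSTALL_DESTINATION ${INFINICCL_CMAKE_DIR})
write_basic_package_version_file(
  ${CMAKE_CURRENT_BINARY_DIR}/InfiniCCLConfigVersion.cmake
  VERSION ${PROJECT_VERSION}
  COMPATIBILITY SameMajorVersion)
install(FILES
  ${CMAKE_CURRENT_BINARY_DIR}/InfiniCCLConfig.cmake
  ${CMAKE_CURRENT_BINARY_DIR}/InfiniCCLConfigVersion.cmake
  DESTINATION ${INFINICCL_CMAKE_DIR})

#[[ A plausible shape for cmake/InfiniCCLConfig.cmake.in (an assumption):

@PACKAGE_INIT@
include(CMakeFindDependencyMacro)
include("${CMAKE_CURRENT_LIST_DIR}/InfiniCCLTargets.cmake")
check_required_components(InfiniCCL)
]]
```

The `$<INSTALL_INTERFACE:...>` include directory is what would let consumers pick up the headers transitively from the imported target.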

Platform and Backend Affected

This is a build-system / packaging change. It touches no device-specific code (src/<device>/) and no backend implementation (src/ompi/, ...), and it does not change the compiled libinfiniccl.so — only how it is exported and installed. No platform- or backend-specific behavior is affected, so no box below is checked.

Platform

  • CPU
  • NVIDIA GPU
  • Iluvatar GPU
  • MetaX GPU
  • Moore Threads GPU
  • Cambricon MLU

Backend

  • OpenMPI
  • MPICH

Performance Impact

  • No performance impact
  • Performance improved
  • Performance regression possible

The change only adds install/export metadata; no compiled code or runtime path is altered, and the floating-point reduction paths are byte-for-byte unchanged. For reference, the heterogeneous run (8 ranks, 4 MB per rank, Float32 + Sum) measured AllReduce at 9.979 ms (0.69 GB/s bus BW) and in-place Broadcast at 2.131 ms, matching the pre-change baseline.
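
As a sanity check on the reported figure, the AllReduce bus bandwidth can be reproduced from the message size and latency. The PR does not state which formula it uses; the sketch below assumes the nccl-tests convention (algorithm bandwidth scaled by 2(n-1)/n) and that "GB" means 2^30 bytes, with S = 4 MiB per rank, t = 9.979 ms, and n = 8 ranks.

```latex
B_{\text{bus}}
  = \frac{S}{t} \cdot \frac{2(n-1)}{n}
  = \frac{4\,194\,304\ \text{B}}{9.979 \times 10^{-3}\ \text{s}} \cdot \frac{2 \cdot 7}{8}
  \approx 7.36 \times 10^{8}\ \text{B/s}
  \approx 0.685 \times 2^{30}\ \text{B/s}
```

Under those assumptions the result rounds to the quoted 0.69 GB/s.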

Known Issues & Future Work

  • The package version is pinned to 0.1.0 as a starting point; there was no prior versioning scheme. It should be bumped as the public surface evolves.
  • The config is intentionally minimal and assumes the shared-library build: it emits no find_dependency(...) because CUDA / MPI / NCCL are linked PRIVATE. A future static-library build (or a public interface that exposes those dependencies) would need to add the corresponding find_dependency(...) calls in cmake/InfiniCCLConfig.cmake.in.
  • For an installed consumer binary, CMake does not automatically add the InfiniCCL library directory to its install RPATH. Build-tree consumers (the usual icclrun flow) get it automatically; an installed consumer should set CMAKE_INSTALL_RPATH_USE_LINK_PATH or its own RPATH.
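
For the installed-consumer case in the last item, one possible arrangement is sketched below. The project name, the `app` target, and `main.cpp` are hypothetical. `INSTALL_RPATH_USE_LINK_PATH` is the per-target property that the `CMAKE_INSTALL_RPATH_USE_LINK_PATH` variable initializes.

```cmake
cmake_minimum_required(VERSION 3.16)
project(infiniccl_installed_consumer LANGUAGES CXX)  # Hypothetical name.

include(GNUInstallDirs)

find_package(InfiniCCL REQUIRED)

# `app` and main.cpp are placeholders for the user's own program.
add_executable(app main.cpp)
target_link_libraries(app PRIVATE InfiniCCL::infiniccl)

# Append the directories of linked libraries that live outside the build
# tree (here, the InfiniCCL library directory) to the installed RPATH.
set_target_properties(app PROPERTIES INSTALL_RPATH_USE_LINK_PATH TRUE)

install(TARGETS app RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
```

Setting the property per target keeps the change local. Setting `CMAKE_INSTALL_RPATH_USE_LINK_PATH` before the targets are created would apply the same behavior project-wide.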

Test Results

Validated on an NVIDIA–MetaX heterogeneous cluster over the OpenMPI backend via scripts/run_examples.py (run run_20260626_060706):

  • server: NVIDIA, 4 GPUs, ranks 0–3 (built with Devices [cpu, nvidia], Backends [ompi]).
  • test: MetaX, 4 GPUs, ranks 4–7 (built with Devices [cpu, metax], Backends [ompi]).
  • 8 ranks total; message size 1,048,576 float32 (4 MB) per rank; 2 warm-up + 20 profiled iterations.
  • All bundled example programs report Correct: YES (broadcast covers all 3 scenarios: out-of-place, in-place, legacy Bcast).

Because this is a build-system change that does not alter the compiled library or any collective's behavior, the full example regression above is the relevant signal: it confirms the rebuilt library and every collective still pass on real heterogeneous hardware. The new package itself was additionally verified by a downstream consumer: a clean-prefix cmake --install emits the four lib/cmake/InfiniCCL/*.cmake files, and a minimal program using only find_package(InfiniCCL REQUIRED) + target_link_libraries(app PRIVATE InfiniCCL::infiniccl) configures, builds (header resolved transitively, RPATH auto-set), and runs Correct: YES.
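
A downstream smoke test in the spirit of the check described above might also pin a version and print what was resolved. This is a sketch rather than the script used for the PR: the project name, `app`, and `main.cpp` are hypothetical. `InfiniCCL_VERSION` and `InfiniCCL_DIR` are the standard variables CMake sets when a config package is found.

```cmake
cmake_minimum_required(VERSION 3.16)
project(infiniccl_smoke_test LANGUAGES CXX)  # Hypothetical name.

# Request at least 0.1 in config mode. With COMPATIBILITY SameMajorVersion,
# an installed 0.1.0 satisfies this request, while a request for 1.0 would
# be rejected because the major version differs.
find_package(InfiniCCL 0.1 CONFIG REQUIRED)

message(STATUS "InfiniCCL ${InfiniCCL_VERSION} config found in: ${InfiniCCL_DIR}")

# Fail early if the namespaced imported target was not exported as expected.
if(NOT TARGET InfiniCCL::infiniccl)
  message(FATAL_ERROR "Imported target `InfiniCCL::infiniccl` is missing.")
endif()

# `app` and main.cpp are placeholders for a minimal test program.
add_executable(app main.cpp)
target_link_libraries(app PRIVATE InfiniCCL::infiniccl)
```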

Test Involved Platform

  • CPU
  • NVIDIA GPU
  • Iluvatar GPU
  • MetaX GPU
  • Moore Threads GPU
  • Cambricon MLU

Test Involved Backend

  • OpenMPI
  • MPICH

all_gather.log
all_reduce.log
all_to_all.log
broadcast.log
gather.log
reduce.log
reduce_scatter.log
scatter.log
send_recv.log


Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat: …, fix(nccl): …).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
  • No stray merge commits from master — the branch is rebased cleanly on top of the current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal — no unrelated modifications were introduced (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes (if any) are intentional, documented, and reflected in affected callers/tests.

General Code Hygiene

  • The code is self-explanatory; comments were added only where the intent or rationale is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, inconsistent indentation, or mixed formatting styles remain.
  • Identifiers referenced in comments or error messages are wrapped in Markdown backticks (e.g. the `AllReduce` implementation) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

  • N/A: Code follows the Google C++ Style Guide strictly.
  • N/A: clang-format (version 16, per .github/workflows/clang-format.yml) has been run against all modified applicable files; the diff is clean.
  • N/A: No exceptions are thrown. Error paths use assert with messages that include at least __FILE__, __LINE__, and __func__ (CONTRIBUTING.md §C++).
  • N/A: Error and warning message wording follows the LLVM Coding Standards (CONTRIBUTING.md §C++).
  • N/A: Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
  • N/A: Exactly one blank line between classes, between classes and functions, and between functions (CONTRIBUTING.md §C++).
  • N/A: Exactly one blank line between members (functions and variables) within a class (CONTRIBUTING.md §C++).
  • N/A: Exactly one blank line before and after the contents of a namespace (CONTRIBUTING.md §C++).

Python Specific (if Python files changed)

  • N/A: Code is PEP 8 compliant; ruff check passes cleanly on CI (see .github/workflows/ruff.yml).
  • N/A: ruff format --check passes cleanly; if not, run ruff format and commit the result.
  • N/A: Comments are complete English sentences, starting with a capital letter and ending with punctuation; Markdown backticks are used for code references (CONTRIBUTING.md §Python).
  • N/A: Framework-specific conventions (e.g. lowercase pytest.skip messages without terminal period) are honored where applicable (CONTRIBUTING.md §Python).
  • N/A: No blank line between the function signature and the body when there is no docstring or comment (CONTRIBUTING.md §Python).
  • N/A: A blank line is present before and after if, for, and similar control-flow statements (CONTRIBUTING.md §Python).
  • N/A: A blank line appears before each return, except when it directly follows a control-flow statement (CONTRIBUTING.md §Python).
  • N/A: Docstrings (if any) follow PEP 257 (CONTRIBUTING.md §Python).
  • N/A: Type hints are added / kept consistent with the surrounding code.

Testing

  • All applicable example programs have been built and tested successfully on at least one supported heterogeneous cluster setup.

Build, CI, and Tooling

  • N/A: New backends or devices have been added to auto-detection in CMakeLists.txt under if(AUTO_DETECT_DEVICES) or to if(AUTO_DETECT_BACKENDS) if applicable.
  • Both CI workflows (clang-format.yml, ruff.yml) are green locally (or expected to be green on CI).

Documentation

  • README.md, CONTRIBUTING.md, or inline docs updated when behavior, build flags, or developer workflow changed.
  • N/A: Any user-visible breaking change is called out explicitly under "Summary" and in the commit/PR title with a ! or BREAKING CHANGE: footer.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • N/A: Third-party code is license-compatible and attributed.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

@GordonYang1 requested a review from Ziminli on June 24, 2026 02:50
Comment thread CMakeLists.txt Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
@Ziminli (Collaborator) commented on Jun 25, 2026

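A few small issues with the PR itself: the two MPI backends in the Platform and Backend Affected section should not be checked, since that contradicts the description.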

@GordonYang1 force-pushed the feat/support-find-package branch from 7edf01a to 8d3fa0e on June 26, 2026 01:45
@GordonYang1 requested a review from Ziminli on June 26, 2026 01:46
@GordonYang1 force-pushed the feat/support-find-package branch from 8d3fa0e to f50f32e on June 26, 2026 05:41
@Ziminli Ziminli merged commit c86bad8 into InfiniTensor:master Jun 26, 2026
2 checks passed