feat(allocator): add optional rpmalloc abstraction layer and adapter #816
Vansh-kap-98 wants to merge 2 commits into
3 issues found across 11 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/allocator/allocator.cc">
<violation number="1" location="src/allocator/allocator.cc:19">
P2: rpmalloc_initialize() return value ignored, risking undefined behavior on init failure</violation>
</file>
<file name="benchmark/micro/allocator_profile.cc">
<violation number="1" location="benchmark/micro/allocator_profile.cc:154">
P2: Benchmark comment claims rpmalloc can be tested by setting an environment variable, but the code always uses a default Config with Backend::Standard and initialize() does not parse environment variables. This leads to invalid benchmark results.</violation>
</file>
<file name="src/allocator/include/sourcemeta/blaze/allocator_adapter.h">
<violation number="1" location="src/allocator/include/sourcemeta/blaze/allocator_adapter.h:40">
P2: RpmallocAdapter does not handle over-aligned types; `malloc`/`rpmalloc` only guarantee `max_align_t` alignment, violating the STL allocator contract for types with `alignof(T) > alignof(std::max_align_t)`</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
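For the benchmark issue, one possible fix is to build the Config from the environment before calling initialize(), so that the benchmark comment and the code agree. Below is a minimal sketch. The variable name BLAZE_BENCHMARK_ALLOCATOR is hypothetical, and Backend/Config are simplified stand-ins for the PR's types.

```cpp
#include <cstdlib>
#include <string_view>

// Simplified stand-ins for the PR's types (assumption: the real ones live
// in src/allocator and may carry more fields).
enum class Backend { Standard, RPMalloc };
struct Config {
  Backend backend = Backend::Standard;
};

// Reads the hypothetical BLAZE_BENCHMARK_ALLOCATOR variable. An unset
// variable, or any value other than "rpmalloc", keeps the standard backend.
Config config_from_environment() {
  Config config;
  const char *value = std::getenv("BLAZE_BENCHMARK_ALLOCATOR");
  if (value != nullptr && std::string_view{value} == "rpmalloc") {
    config.backend = Backend::RPMalloc;
  }
  return config;
}
```

The benchmark setup would then pass config_from_environment() to initialize() in place of a default-constructed Config.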
P2: rpmalloc_initialize() return value ignored, risking undefined behavior on init failure
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/allocator/allocator.cc, line 19:
<comment>rpmalloc_initialize() return value ignored, risking undefined behavior on init failure</comment>
<file context>
@@ -0,0 +1,69 @@
+ break;
+ case Backend::RPMalloc:
+#ifdef BLAZE_ALLOCATOR_RPMALLOC
+ rpmalloc_initialize();
+#else
+ throw std::runtime_error(
</file context>
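This sketch shows how the backend switch could surface an initialization failure instead of ignoring it. It assumes, per rpmalloc's header, that rpmalloc_initialize() returns 0 on success. Backend and Config are simplified stand-ins for the PR's types, and the error messages are illustrative.

```cpp
#include <stdexcept>

#ifdef BLAZE_ALLOCATOR_RPMALLOC
#include <rpmalloc.h>
#endif

// Simplified stand-ins for the PR's types (assumption).
enum class Backend { Standard, RPMalloc };
struct Config {
  Backend backend = Backend::Standard;
};

void initialize(const Config &config) {
  switch (config.backend) {
  case Backend::Standard:
    break;
  case Backend::RPMalloc:
#ifdef BLAZE_ALLOCATOR_RPMALLOC
    // rpmalloc_initialize() returns 0 on success; fail loudly on anything
    // else rather than continuing with an uninitialized allocator.
    if (rpmalloc_initialize() != 0) {
      throw std::runtime_error("rpmalloc_initialize() failed");
    }
#else
    // The backend was requested but not compiled in.
    throw std::runtime_error(
        "The rpmalloc backend requires BLAZE_ALLOCATOR_RPMALLOC=ON");
#endif
    break;
  }
}
```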
P2: RpmallocAdapter does not handle over-aligned types; malloc/rpmalloc only guarantee max_align_t alignment, violating the STL allocator contract for types with alignof(T) > alignof(std::max_align_t)
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/allocator/include/sourcemeta/blaze/allocator_adapter.h, line 40:
<comment>RpmallocAdapter does not handle over-aligned types; `malloc`/`rpmalloc` only guarantee `max_align_t` alignment, violating the STL allocator contract for types with `alignof(T) > alignof(std::max_align_t)`</comment>
<file context>
@@ -0,0 +1,63 @@
+ RpmallocAdapter(const RpmallocAdapter<U>&) {}
+
+ /// @brief Allocate memory for n elements
+ [[nodiscard]] pointer allocate(size_type n) {
+#ifdef BLAZE_ALLOCATOR_RPMALLOC
+ return static_cast<pointer>(rpmalloc(n * sizeof(T)));
</file context>
Force-pushed from 0a45cf2 to 6e9f670
Hey @Vansh-kap-98, interesting! To clarify, on those benchmark metrics you shared,
What might be interesting as a first step would be to try out several different allocators. If one clearly wins, we would even incorporate it into the build as the default?
Signed-off-by: Vansh <[email protected]>
Force-pushed from 6e9f670 to 15ada05
Hey @jviotti, I'm sorry for the rough wording earlier. For the PoC I enabled rpmalloc globally to validate behavior. That changed every allocation site at once and caused the regressions; I hadn't targeted specific hot containers. I'll fix the clang-format checks first, then run a short allocator matrix (e.g. mimalloc) and try targeted PMR adoption on the compiler's hot-path containers. I'll post results and progress updates under this draft. I do realise I still have a lot to figure out and learn.
Force-pushed from 22daaba to a8d0b13
Very much appreciated and looking forward to the results. Overall:
Let me know if I can help in any way. It is exciting research!
Force-pushed from 5bfabe8 to 31fd9ca
Signed-off-by: Vansh <[email protected]>
Force-pushed from 31fd9ca to 2ba8e04
Thank you so much for the detailed insight, @jviotti! I really appreciate your guidance. I would love to use this opportunity to dive deeper into various allocator behaviors and optimization patterns. I am still learning and getting comfortable with custom allocator integrations, so my progress might be a bit slower than if I asked for direct help, but I am eager to figure it out independently. That said, please let me know if there are any deadlines or milestones I should keep in mind! I plan to work on both core and blaze in parallel, because I want to see this initial baseline draft through to a clean completion rather than leave it unfinished. Once the benchmarking data is ready, I will post the results in each respective repository.
No hurry, and looking forward to anything you find!
Overview
Introduces an opt-in rpmalloc wrapper for the Blaze allocator layer. It targets high-concurrency allocation workloads without modifying core validation logic, and it is disabled by default.
Implementation Details
Build System: Enable via -DBLAZE_ALLOCATOR_RPMALLOC=ON. Fetches and pins [email protected] automatically.
Codebase Changes: Added a clean abstraction layer under src/allocator with process and thread lifecycle hooks, alongside a header-only RpmallocAdapter for STL containers.
Safety: Fully guarded via preprocessor directives. Selecting the rpmalloc backend in a build configured without BLAZE_ALLOCATOR_RPMALLOC throws a std::runtime_error.
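Per-thread lifecycle hooks are easy to leave unpaired on worker threads. Below is a minimal RAII sketch of pairing them. thread_initialize() and thread_finalize() are hypothetical placeholders for the PR's thread lifecycle hooks; with rpmalloc enabled, they would wrap its per-thread initialize/finalize calls. Here they only count calls, so the example runs without rpmalloc.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical hooks; this sketch only tracks how many threads are active.
std::atomic<int> active_threads{0};
void thread_initialize() { active_threads.fetch_add(1); }
void thread_finalize() { active_threads.fetch_sub(1); }

// RAII guard: calls thread_initialize() on construction and the matching
// thread_finalize() when the scope ends.
class ThreadScope {
public:
  ThreadScope() { thread_initialize(); }
  ~ThreadScope() { thread_finalize(); }
  ThreadScope(const ThreadScope &) = delete;
  ThreadScope &operator=(const ThreadScope &) = delete;
};

// Starts `count` workers, wraps each body in a ThreadScope, then joins them.
template <typename Body> void run_workers(unsigned count, Body body) {
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < count; ++i) {
    workers.emplace_back([body] {
      ThreadScope scope;
      body();
    });
  }
  for (auto &worker : workers) {
    worker.join();
  }
}
```

Tying the hooks to a scope keeps every thread that the allocator saw initialized from exiting without its finalize call.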
Phase 1 & Phase 2 Findings
Phase 1 (Measurement): Running baseline microbenchmarks on unmodified logic proved the concept, showing up to a 10x throughput improvement in isolated, highly targeted validation setups.
Phase 2 (Integration): The core plumbing is complete and reproducible. However, a broad process-wide proof-of-concept swap showed performance regressions across compile and validation metrics in quick Release runs.
Direct Benchmark Metrics (Baseline → rpmalloc PoC; all values are times, so lower is better)
Compile Time: 1.45 ms → 2.85 ms (+96.3%)
Single-Threaded Validate: 287 ns → 480 ns (+67.3%) | Throughput dropped ~46.7%
Concurrent Validate (4 Threads): 87.4 ns → 160 ns (+83.2%)
Concurrent Validate (8 Threads): 58.8 ns → 78.2 ns (+33.0%)
Suggestions
Next step: targeted PMR (polymorphic memory resource) adoption on specific compiler hot spots, to isolate where a custom allocator actually improves performance.
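Targeted PMR adoption on a hot path could look like the sketch below. It uses a short-lived std::pmr::monotonic_buffer_resource scoped to one call, backed by a stack buffer, that falls back to the default memory resource once the buffer runs out. count_long_keys() is a hypothetical stand-in for a compiler hot spot, not code from Blaze.

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Hypothetical hot-path routine: gathers matching keys into per-call scratch
// storage, then returns how many matched.
std::size_t count_long_keys(const std::vector<std::string> &keys,
                            std::size_t min_length) {
  // Scratch arena for this call only: allocations come from `buffer` first
  // and spill to the default memory resource once it is exhausted.
  std::byte buffer[1024];
  std::pmr::monotonic_buffer_resource arena{buffer, sizeof(buffer)};

  // The vector draws its storage from the arena; all of it is released
  // together when `arena` goes out of scope.
  std::pmr::vector<const std::string *> selected{&arena};
  for (const auto &key : keys) {
    if (key.size() >= min_length) {
      selected.push_back(&key);
    }
  }
  return selected.size();
}
```

A monotonic resource never reuses freed blocks, so it suits scratch data that dies with the call. Containers that outlive a single call would need a different resource, such as std::pmr::unsynchronized_pool_resource.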
First-Time Contributor Note
This is my first pull request on this repository, so I would love to get your constructive criticism, feedback on the styling, or any architectural suggestions you have. If there are specific compiler modules or high-churn data streams you think I should look at next for a targeted PMR implementation, let me know. I am really eager to explore the codebase further and see where else this allocator layer can add value.