[feat] Add Hexagon HMX backend support by Calaweh · Pull Request #2155 · tile-ai/tilelang

Calaweh · 2026-05-06T10:32:03Z

Title: Add Hexagon Backend with HMX Support for Matrix Multiplication

Summary

Introduce a dedicated Hexagon backend for TileLang with support for Qualcomm HMX (Hexagon Matrix eXtensions). This integration enables the generation of high-performance LLVM IR specifically targeting Hexagon DSP hardware instructions for matrix multiplication.

Why

Qualcomm's Hexagon DSPs power a vast number of edge devices. By leveraging the Hexagon Kernel Library (HexKL) interfaces through TileLang, we can achieve efficient CodeGen for specialized operators like sparse attention on mobile hardware. This keeps TileLang at the forefront of heterogeneous hardware support and enables localized, backend-specific optimizations for Qualcomm chips.

Key Changes

Hardware Lowering

New C++ pass LowerHexagonIntrinsics that translates TileLang MMA placeholders into the specialized @HexKL_mma_i8acc32 hardware instructions.

Memory Architecture

Native C++ registration of global.vtcm and global.hmx.acc memory scopes.
Provides the compiler with correct alignment (128‑byte) and capacity constraints for Hexagon HTP.

Runtime Support

Runtime power‑management wrapper hmx_kernel_launch utilizing the HexagonHtp RAII guard to enable the HMX hardware block during execution.

JIT Infrastructure

Enhancements to the kernel cache to support IR‑only backends.
Addition of execution guards to prevent cross‑compiled binaries from attempting to run on incompatible host CPUs.

Validation

# Style and Linting
pre-commit run --all-files

# Build Verification (requires LLVM 17/18)
cd build
cmake .. -DUSE_LLVM=ON
make -j$(nproc)

# Symbol Verification
nm -D lib/libtilelang.so | grep "LowerHexagonIntrinsics"

# Functional Testing (IR Generation & Logic)
pytest testing/python/hexagon/test_hmx_mma.py
pytest testing/python/hexagon/diagnose_hmx.py

Additional Checks

IR Validation: Verified correct target triple (hexagon), memory allocations (A_vtcm, C_acc), and hardware intrinsic calls (HexKL_mma_i8acc32).
Host Guard: Verified that calling Hexagon kernels on x86/ARM hosts raises a descriptive RuntimeError.

Compatibility

Requires LLVM 17 or 18.
Tested on Qualcomm HTP hardware (simulator and physical device).

Related Issues

Closes: #1293

Summary by CodeRabbit

New Features
- Hexagon DSP target support with HMX acceleration, intrinsics (MMA, DMA), architecture descriptors, and runtime kernel launch integration.
- Hexagon-aware JIT/adapters and target handling for cross-compilation workflows.
Tests
- New diagnostics and unit tests for Hexagon codegen, IR lowering, and HMX MMA behavior.
Chores
- Build-system enhancements for LLVM/Hexagon detection and IR-only caching; kernel-cache and target utilities updated for Hexagon.

github-actions · 2026-05-06T10:32:14Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-05-06T10:32:17Z

📝 Walkthrough

Walkthrough

This PR adds Hexagon HMX support: build/runtime wiring, intrinsics and lowering pass, architecture and target detection, JIT/libgen cross-compile guards, kernel cache adjustments, and tests/diagnostics for Hexagon codegen and lowering.

Changes

Hexagon Backend with HMX Support

Layer / File(s)	Summary
Build / Includes `CMakeLists.txt`, `src/runtime/hexagon_runtime.cc`	Adds LLVM/Hexagon detection and resolution, saves/restores `USE_LLVM` around TVM config, appends TVM include/src paths, and registers `tilelang.hexagon.hmx_kernel_launch` source.
Intrinsics & Lowering `tilelang/intrinsics/hexagon/__init__.py`, `src/transform/lower_hexagon_intrinsics.cc`, `tilelang/engine/phase.py`, `tilelang/engine/lower.py`	Adds HMX intrinsics API (mma, mma_fp16, vtcm_dma_copy), HMXBuilder, a mutator pass that rewrites `hmx_mma_placeholder` to HexKL intrinsics, and hooks to apply lowering during optimization.
Architecture & Target Utilities `tilelang/carver/arch/hexagon.py`, `tilelang/carver/arch/__init__.py`, `tilelang/utils/target.py`	Introduces `HexagonArch`, memory scopes, `HMXTileShape`/shapes, `get_hexagon_arch`, registers `hexagon` target, and adds `check_hexagon_availability()` and `is_hexagon_target()`.
JIT / Libgen / Adapter Integration `tilelang/jit/adapter/*`, `tilelang/jit/adapter/wrapper.py`, `tilelang/jit/kernel.py`, `tilelang/jit/adapter/utils.py`	Adds Hexagon-aware guards to skip host library loading/compilation for cross-compilation (IR-only), exposes `arch` on `TLWrapper`, relaxes device-module assertions for Hexagon, and improves kernel declaration matching for LLVM-style output.
Kernel Cache `tilelang/cache/kernel_cache.py`	Handles IR-only adapters by excluding the binary `.so` from required staging files and skipping binary cache writes when no libpath is present.
Tests & Diagnostics `testing/python/hexagon/diagnose_hmx.py`, `testing/python/hexagon/test_hmx_mma.py`	Adds `has_hexagon_codegen()` and `build_hmx_matmul()` helpers, diagnostic IR/lowering tests, compilation/lowering checks, and a host-execution guard test.

Sequence Diagram

sequenceDiagram
  participant User
  participant Compiler
  participant Lowering
  participant LowerHexagonIntrinsics
  participant JIT
  participant FFI
  participant HMX

  User->>Compiler: request compile for Hexagon target
  Compiler->>Compiler: detect is_hexagon_target()
  Compiler->>Lowering: lower() (Hexagon branch)
  Lowering->>LowerHexagonIntrinsics: apply lowering pass
  LowerHexagonIntrinsics-->>Lowering: return IR with HexKL intrinsics
  Lowering-->>JIT: emit IR / produce IR-only artifact
  User->>FFI: call tilelang.hexagon.hmx_kernel_launch
  FFI->>FFI: extract kernel Function and args
  FFI->>HMX: enter HexagonHtp (power on HMX)
  HMX-->>FFI: execute HMX MMA and return result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

LeiWang1999

Poem

🐰 I hopped to Hexagon's bright glen,
HMX tiles and VTCM then,
Intrinsics lowered, kernels spun,
Cross-compile paths now neatly done,
A carrot-powered kernel run!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title '[feat] Add Hexagon HMX backend support' clearly and concisely summarizes the main change: introducing Hexagon HMX backend support to TileLang.
Linked Issues check	✅ Passed	The PR successfully implements Hexagon backend support with HMX integration as requested in `#1293`, including memory architecture, hardware lowering, runtime support, and JIT enhancements.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with Hexagon HMX backend support: CMake configuration, intrinsic lowering, runtime functions, architecture definitions, JIT integration, caching, and testing are all in-scope.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 14

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CMakeLists.txt`:
- Around line 157-186: The build currently defines TILELANG_HEXAGON_ENABLED
whenever USE_LLVM is truthy; update the CMake logic (around the llvm-config
discovery that sets LLVM_CONFIG_PATH/USE_LLVM and the later block that sets
TILELANG_HEXAGON_ENABLED) to actually verify the LLVM toolchain has Hexagon:
after finding llvm-config (LLVM_CONFIG_PATH) run llvm-config --targets-built (or
invoke it via execute_process) and check the output contains "Hexagon"; only
then set the TILELANG_HEXAGON_ENABLED definition and the "TileLang Build with
LLVM" status message. If llvm-config is not found or it does not report Hexagon,
change the earlier non-fatal warning into a fatal error (or at minimum do not
enable TILELANG_HEXAGON_ENABLED) so hexagon_runtime.cc and related transforms
are not advertised/compiled incorrectly; refer to the LLVM_CONFIG_PATH/USE_LLVM
and TILELANG_HEXAGON_ENABLED symbols to locate where to add the execute_process
and conditional enablement.

In `@src/runtime/hexagon_runtime.cc`:
- Around line 16-23: Ensure you validate args.size() before accessing args[0] or
constructing ffi::PackedArgs to avoid UB when called with zero arguments: in the
lambda that reads args and creates kernel and kernel_args, first check that
args.size() >= 1, and if not set an appropriate runtime error/return (e.g.,
populate rv with an error or throw a tvm/ffi runtime error) instead of
proceeding; specifically modify the block that uses args[0], ffi::Function
kernel, and ffi::PackedArgs kernel_args(args.data() + 1, args.size() - 1) to
perform the size check and early error return.

In `@src/transform/lower_hexagon_intrinsics.cc`:
- Around line 28-35: The rewrite for the "hmx_mma_placeholder" intrinsic
currently indexes call->args[3] without checking arity; update the branch that
handles func_name->value == "hmx_mma_placeholder" in lower_hexagon_intrinsics.cc
to first validate call->args.size() >= 4 and only perform the Array<PrimExpr>
construction and return Evaluate(...) when that check passes; if the check
fails, preserve and return the original Call node (or emit a targeted
diagnostic/log) instead of indexing out of bounds to avoid hard runtime/compile
errors.

In `@testing/python/hexagon/diagnose_hmx.py`:
- Around line 139-141: The failure message in
testing/python/hexagon/diagnose_hmx.py incorrectly references the pass name
LowerHMXIntrinsics; change the string literal that currently says "Check
LowerHMXIntrinsics implementation." to reference the correct backend pass name
"LowerHexagonIntrinsics" so triage points to the right implementation (update
the message around the HMX placeholder/no intrinsic error text in
diagnose_hmx.py).
- Around line 9-15: The try/except in diagnose_hmx.py currently swallows all
exceptions when probing LLVM/Hexagon target (the block that calls
tvm.runtime.enabled("llvm") and tvm.target.Target(...)); replace the broad
except Exception with a narrow catch for the expected TVM probe error (e.g.,
tvm.error.TVMError or the TVM-specific probe exception available in your TVM
version, falling back to RuntimeError only if necessary) and re-raise any other
exceptions so real failures are not silenced; apply the same change to the other
probe sites referenced (the blocks around lines with the gated tests at the
other two probe locations) and ensure error handling logs or returns False only
for the known probe failure type while allowing unexpected exceptions to
propagate.

In `@tilelang/cache/kernel_cache.py`:
- Around line 457-460: The save path allows IR-only entries by checking
_is_ir_only (via kernel.adapter.libpath is None) and omitting kernel_lib_path
from missing_files, but load-time required files (method _get_required_files)
still unconditionally demands kernel_lib_path; update _get_required_files to
detect the same IR-only condition (check kernel.adapter.libpath or reuse the
_is_ir_only logic) and exclude self.kernel_lib_path from the required list when
IR-only so save and load semantics match; ensure the change references
_is_ir_only, kernel.adapter.libpath, _get_missing_complete_cache_files,
_get_required_files, and kernel_lib_path so both save and load use the same
criterion.

In `@tilelang/carver/arch/__init__.py`:
- Line 52: The symbol is_hexagon_target was added to __all__ but never defined
or imported in this module, so imports fail; fix by importing or defining
is_hexagon_target in this module and ensuring it is present in the module
globals before exporting (e.g., add a proper import statement that brings
is_hexagon_target into tilelang.carver.arch or define the function here), then
keep it listed in __all__ so from tilelang.carver.arch import is_hexagon_target
and star-imports succeed.

In `@tilelang/intrinsics/hexagon/__init__.py`:
- Around line 94-101: The export list __all__ references
register_hexagon_memory_info which is not defined or imported, causing
import-time failures; fix by either importing the symbol into this module (e.g.,
add an import that provides register_hexagon_memory_info) or remove it from the
__all__ list so only existing names (hmx, HMXBuilder, mma, mma_fp16,
vtcm_dma_copy) are exported; update the module’s top-level imports or the
__all__ array accordingly and ensure any chosen import refers to the correct
source that defines register_hexagon_memory_info.
- Around line 53-67: The vtcm_dma_copy function currently constructs a
tir.Evaluate node but never emits it into the active TIR builder; change it to
call T.evaluate(...) (like the mma() helper does) so the extern DMA call is
actually inserted into the TIR; ensure tilelang.language is imported as T and
replace the tir.Evaluate(...) invocation inside vtcm_dma_copy with
T.evaluate(...) while keeping the tir.call_extern arguments (hexagon_dma_copy,
src.access_ptr("r"), dst.access_ptr("w")) unchanged.

In `@tilelang/jit/adapter/cython/adapter.py`:
- Around line 134-142: The Hexagon guard (is_hexagon_target(self.target)) runs
after the code that compiles/loads the generated library; move this check to run
before any compile/load steps so cross-compilation never attempts to load
host-incompatible artifacts—i.e., in the method containing the compile/load
logic, call is_hexagon_target(self.target) at the very start, return early and
set self._compiled_func = None (leaving kernel.kernel_source etc. intact) so the
JIT object can be inspected without performing host library loading.

In `@tilelang/jit/adapter/utils.py`:
- Around line 88-105: match_declare_kernel_cpu currently ignores the requested
symbol and always returns the first int32_t/define; detect when the caller
passed a function name (create_call_func passes "function_name(" into the
annotation param), extract the function name (e.g., annotation.split("(")[0] if
"(" present), then build the C and LLVM search patterns to match that exact
function (C: use a regex like r"\b<int_return>?\s+{re.escape(func_name)}\b" or
simply search for the function name with a preceding return/type token, LLVM:
r"define\s+.*@{re.escape(func_name)}\b"), use re.search on each line and return
source.index(match.group(0)) for the found match; if annotation is not a
function name keep the existing behavior but make patterns use word boundaries
so you don't pick substrings. Ensure you update match_declare_kernel_cpu to
reference the extracted func_name and to return the correct match start.

In `@tilelang/jit/adapter/wrapper.py`:
- Around line 988-992: The Hexagon target is being routed to TLCPUSourceWrapper
which expects C prototypes, but Hexagon emits LLVM IR; update the target routing
so Hexagon is parsed as IR instead of C: change the branch that checks
is_hexagon_target(self.target) (in the wrapper selection near
TLCPUSourceWrapper) to route Hexagon to an IR-aware wrapper (e.g., a new or
existing TLCPUIRWrapper) or enhance TLCPUSourceWrapper to detect LLVM IR and
parse declarations accordingly; also update create_call_func() so when handling
IR it extracts the function signature by splitting/locating the opening brace
'{' (or using an IR-specific parser) rather than split(";")[0], ensuring
argument extraction reads parameters not local IR instructions.

In `@tilelang/jit/kernel.py`:
- Around line 206-216: The current call path may attempt to call
self.torch_function when it is None; update the kernel invocation in the method
containing self.torch_function to raise a clear RuntimeError for uninitialized
kernels on all targets: keep the existing Hexagon-specific RuntimeError (using
is_hexagon_target(self.target)) and then add a generic RuntimeError if
self.torch_function is still None that explains the kernel is uninitialized and
cannot be executed on the host (mentioning to use the Hexagon SDK
Simulator/HexagonLauncher only in the Hexagon message). This ensures callers
receive a stable, descriptive error instead of a 'NoneType' object is not
callable' when invoking self.torch_function(*args, **kwds).

In `@tilelang/utils/target.py`:
- Around line 189-191: The normalization currently overwrites any Hexagon target
string (variable return_var) with a fixed "llvm -mtriple=hexagon
-mcpu=hexagonv73"; instead, update the logic in tilelang.utils.target (the block
handling Hexagon shorthands) to parse the existing return_var string and only
inject missing flags: if "-mtriple" is absent append "-mtriple=hexagon", if
"-mcpu" is absent append a default "-mcpu=hexagonv73", and preserve any existing
"-mcpu", "-mattr", or other flags already present in return_var; ensure you
handle cases where return_var is exactly "hexagon" (replace with "llvm" plus the
necessary flags) versus when it already starts with "llvm" (modify by appending
missing flags) so no explicit user flags are overwritten.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 05bd7958-0e49-4313-b7e6-8667f85194cb

📥 Commits

Reviewing files that changed from the base of the PR and between 48998d2 and fd34124.

📒 Files selected for processing (18)

CMakeLists.txt
src/runtime/hexagon_runtime.cc
src/transform/lower_hexagon_intrinsics.cc
testing/python/hexagon/diagnose_hmx.py
testing/python/hexagon/test_hmx_mma.py
tilelang/cache/kernel_cache.py
tilelang/carver/arch/__init__.py
tilelang/carver/arch/hexagon.py
tilelang/engine/lower.py
tilelang/engine/phase.py
tilelang/intrinsics/hexagon/__init__.py
tilelang/jit/adapter/base.py
tilelang/jit/adapter/cython/adapter.py
tilelang/jit/adapter/libgen.py
tilelang/jit/adapter/utils.py
tilelang/jit/adapter/wrapper.py
tilelang/jit/kernel.py
tilelang/utils/target.py

coderabbitai · 2026-05-06T10:42:17Z

+          if (func_name->value == "hmx_mma_placeholder") {
+            Array<PrimExpr> new_args;
+            new_args.push_back(StringImm("HexKL_mma_i8acc32"));
+            new_args.push_back(
+                call->args[3]); // C_acc (accumulator — first arg to HexKL)
+            new_args.push_back(call->args[1]); // A_vtcm
+            new_args.push_back(call->args[2]); // B_vtcm
+            return Evaluate(


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the placeholder arity before indexing call->args[3].

This pass is globally callable on arbitrary TIR, so a malformed call_extern("hmx_mma_placeholder", ...) with fewer than four arguments will trip TVM's bounds checks and fail the compile with a hard error. Validate the expected operand count before rewriting, then either keep the node unchanged or emit a targeted diagnostic.

Proposed fix

// Lower HMX MMA placeholder if (func_name->value == "hmx_mma_placeholder") { + ICHECK_EQ(call->args.size(), 4) + << "hmx_mma_placeholder expects exactly 3 operands"; Array<PrimExpr> new_args; new_args.push_back(StringImm("HexKL_mma_i8acc32")); new_args.push_back( call->args[3]); // C_acc (accumulator — first arg to HexKL)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (func_name->value == "hmx_mma_placeholder") {

Array<PrimExpr> new_args;

new_args.push_back(StringImm("HexKL_mma_i8acc32"));

new_args.push_back(

call->args[3]); // C_acc (accumulator — first arg to HexKL)

new_args.push_back(call->args[1]); // A_vtcm

new_args.push_back(call->args[2]); // B_vtcm

return Evaluate(

if (func_name->value == "hmx_mma_placeholder") {

ICHECK_EQ(call->args.size(), 4)

<< "hmx_mma_placeholder expects exactly 3 operands";

Array<PrimExpr> new_args;

new_args.push_back(StringImm("HexKL_mma_i8acc32"));

new_args.push_back(

call->args[3]); // C_acc (accumulator — first arg to HexKL)

new_args.push_back(call->args[1]); // A_vtcm

new_args.push_back(call->args[2]); // B_vtcm

return Evaluate(

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/transform/lower_hexagon_intrinsics.cc` around lines 28 - 35, The rewrite for the "hmx_mma_placeholder" intrinsic currently indexes call->args[3] without checking arity; update the branch that handles func_name->value == "hmx_mma_placeholder" in lower_hexagon_intrinsics.cc to first validate call->args.size() >= 4 and only perform the Array<PrimExpr> construction and return Evaluate(...) when that check passes; if the check fails, preserve and return the original Call node (or emit a targeted diagnostic/log) instead of indexing out of bounds to avoid hard runtime/compile errors.

coderabbitai · 2026-05-06T10:42:17Z

+    try:
+        if not tvm.runtime.enabled("llvm"):
+            return False
+        tvm.target.Target("llvm -mtriple=hexagon -mcpu=hexagonv73")
+        return True
+    except Exception:
+        return False


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cat -n testing/python/hexagon/diagnose_hmx.py | head -120

Repository: tile-ai/tilelang

Length of output: 5134

🏁 Script executed:

cat -n testing/python/hexagon/diagnose_hmx.py | sed -n '110,150p'

Repository: tile-ai/tilelang

Length of output: 1775

🏁 Script executed:

# Search for LowerHMXIntrinsics and LowerHexagonIntrinsics in the codebase rg "LowerHMXIntrinsics|LowerHexagonIntrinsics" -i

Repository: tile-ai/tilelang

Length of output: 793

🏁 Script executed:

# Check the lower.py file to see what exceptions might be expected cat -n tilelang/engine/lower.py | head -60

Repository: tile-ai/tilelang

Length of output: 2467

🏁 Script executed:

# Check what exceptions TVM Target raises by looking for its usage and error handling rg "tvm.target.Target" -A 3 -B 1 | head -80

Repository: tile-ai/tilelang

Length of output: 5048

🏁 Script executed:

# Look for TVM exception types and how Target errors are typically caught rg "except.*Error|except.*Exception" testing/python/hexagon/diagnose_hmx.py -B 3

Repository: tile-ai/tilelang

Length of output: 358

🏁 Script executed:

# Check if there are TVM-specific exception types in the codebase rg "from tvm|import.*tvm" testing/python/hexagon/diagnose_hmx.py | head -10

Repository: tile-ai/tilelang

Length of output: 92

🏁 Script executed:

# Check TVM exception handling patterns in the codebase rg "tvm.error|TVMError|from tvm.*import|except.*as e" -A 2 | head -100

Repository: tile-ai/tilelang

Length of output: 5401

🏁 Script executed:

# Check the diagnostic test more carefully - what's the context around line 141? cat -n testing/python/hexagon/diagnose_hmx.py | sed -n '135,145p'

Repository: tile-ai/tilelang

Length of output: 561

🏁 Script executed:

# Verify the actual pass name references rg "LowerHMXIntrinsics|LowerHexagonIntrinsics" -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 2746

🏁 Script executed:

# Check if there are similar skip-gate functions elsewhere in the codebase with better exception handling rg "pytest.mark.skipif" -B 2 -A 2 | head -60

Repository: tile-ai/tilelang

Length of output: 4792

Narrow the skip-gate exception handling to prevent silent skips.

Line 14 catches every exception and returns False, which can hide real regressions and silently skip both gated tests (Lines 64 and 110). Catch only expected probe failures and re-raise unexpected exceptions.

Suggested fix

def has_hexagon_codegen(): try: if not tvm.runtime.enabled("llvm"): return False tvm.target.Target("llvm -mtriple=hexagon -mcpu=hexagonv73") return True - except Exception: - return False + except Exception as err: + msg = str(err).lower() + # Expected probe failures: missing Hexagon/LLVM target support. + if "hexagon" in msg or "llvm" in msg or "target" in msg: + return False + raise

Please verify and, if available in your TVM version, prefer a concrete TVM exception type (e.g., TVM-specific error class) over message matching.

Also applies to: 64-65, 110-111

🧰 Tools

🪛 Ruff (0.15.12)

[warning] 14-14: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@testing/python/hexagon/diagnose_hmx.py` around lines 9 - 15, The try/except in diagnose_hmx.py currently swallows all exceptions when probing LLVM/Hexagon target (the block that calls tvm.runtime.enabled("llvm") and tvm.target.Target(...)); replace the broad except Exception with a narrow catch for the expected TVM probe error (e.g., tvm.error.TVMError or the TVM-specific probe exception available in your TVM version, falling back to RuntimeError only if necessary) and re-raise any other exceptions so real failures are not silenced; apply the same change to the other probe sites referenced (the blocks around lines with the gated tests at the other two probe locations) and ensure error handling logs or returns False only for the known probe failure type while allowing unexpected exceptions to propagate.

coderabbitai · 2026-05-06T10:42:18Z

+        from tilelang.utils.target import is_hexagon_target
+
+        if is_hexagon_target(self.target):
+            # For Hexagon, we are cross-compiling.
+            # We cannot load symbols or execute this on the host machine.
+            # Returning early allows the JIT object to exist so we can
+            # inspect kernel.kernel_source.
+            self._compiled_func = None
+            return


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return before compiling/loading Hexagon artifacts.

By the time this guard runs, Lines 130-132 have already compiled and loaded the generated library. That's the host-incompatible step this branch is supposed to skip for cross-compiled Hexagon kernels.

Suggested fix

self.wrapper.assign_host_module(host_mod) self.wrapper.assign_device_module(device_mod) self.host_kernel_source = self.wrapper.wrap(self.get_kernel_source(kernel_only=True)) + from tilelang.utils.target import is_hexagon_target + + if is_hexagon_target(self.target): + # For Hexagon, we are cross-compiling. + # We cannot load symbols or execute this on the host machine. + # Returning early allows the JIT object to exist so we can + # inspect kernel.kernel_source. + self._compiled_func = None + return + self.lib_generator.update_lib_code(self.host_kernel_source) self.lib_generator.compile_lib() self.lib = self.lib_generator.load_lib() - - from tilelang.utils.target import is_hexagon_target - - if is_hexagon_target(self.target): - # For Hexagon, we are cross-compiling. - # We cannot load symbols or execute this on the host machine. - # Returning early allows the JIT object to exist so we can - # inspect kernel.kernel_source. - self._compiled_func = None - return

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/jit/adapter/cython/adapter.py` around lines 134 - 142, The Hexagon guard (is_hexagon_target(self.target)) runs after the code that compiles/loads the generated library; move this check to run before any compile/load steps so cross-compilation never attempts to load host-incompatible artifacts—i.e., in the method containing the compile/load logic, call is_hexagon_target(self.target) at the very start, return early and set self._compiled_func = None (leaving kernel.kernel_source etc. intact) so the JIT object can be inspected without performing host library loading.

coderabbitai · 2026-05-06T10:42:18Z

 def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:
-    pattern = r"int32_t\s+\w+"
+    # C-style signature
+    pattern_c = r"int32_t\s+\w+"
+    # LLVM-style signature
+    pattern_llvm = r"define\s+.*@(?!llvm\.)\w+"
+
    for line in source.split("\n"):
-        if annotation in line:
-            matched = re.findall(pattern, line)
-            if len(matched) >= 1:
-                return source.index(matched[0] + "(")
+        # C pattern
+        matched = re.findall(pattern_c, line)
+        if matched:
+            return source.index(matched[0])
+        # LLVM pattern
+        matched = re.findall(pattern_llvm, line)
+        if matched:
+            # Match the start of the 'define'
+            return source.index(matched[0])
+
    raise ValueError("No global kernel found in the source code")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Match the requested symbol instead of the first CPU declaration.

annotation is ignored here now, so every lookup returns the first int32_t/define in the file. TLCPUSourceWrapper.create_call_func() passes function_name + "(" into this helper, so multi-function CPU/LLVM sources will bind later wrappers to the wrong declaration.

Suggested fix

def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int: - # C-style signature - pattern_c = r"int32_t\s+\w+" - # LLVM-style signature - pattern_llvm = r"define\s+.*@(?!llvm\.)\w+" + func_name = annotation[:-1] if annotation.endswith("(") else None + if func_name and func_name != "int32_t": + pattern_c = rf"\bint32_t\s+{re.escape(func_name)}\b" + pattern_llvm = rf"\bdefine\b[^\n@]*@{re.escape(func_name)}\b" + else: + pattern_c = r"\bint32_t\s+\w+\b" + pattern_llvm = r"\bdefine\b[^\n@]*@(?!llvm\.)\w+\b" for line in source.split("\n"): # C pattern matched = re.findall(pattern_c, line)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:

pattern = r"int32_t\s+\w+"

# C-style signature

pattern_c = r"int32_t\s+\w+"

# LLVM-style signature

pattern_llvm = r"define\s+.*@(?!llvm\.)\w+"

for line in source.split("\n"):

if annotation in line:

matched = re.findall(pattern, line)

if len(matched) >= 1:

return source.index(matched[0] + "(")

# C pattern

matched = re.findall(pattern_c, line)

if matched:

return source.index(matched[0])

# LLVM pattern

matched = re.findall(pattern_llvm, line)

if matched:

# Match the start of the 'define'

return source.index(matched[0])

raise ValueError("No global kernel found in the source code")

def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:

func_name = annotation[:-1] if annotation.endswith("(") else None

if func_name and func_name != "int32_t":

pattern_c = rf"\bint32_t\s+{re.escape(func_name)}\b"

pattern_llvm = rf"\bdefine\b[^\n@]*@{re.escape(func_name)}\b"

else:

pattern_c = r"\bint32_t\s+\w+\b"

pattern_llvm = r"\bdefine\b[^\n@]*@(?!llvm\.)\w+\b"

for line in source.split("\n"):

# C pattern

matched = re.findall(pattern_c, line)

if matched:

return source.index(matched[0])

# LLVM pattern

matched = re.findall(pattern_llvm, line)

if matched:

# Match the start of the 'define'

return source.index(matched[0])

raise ValueError("No global kernel found in the source code")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/jit/adapter/utils.py` around lines 88 - 105, match_declare_kernel_cpu currently ignores the requested symbol and always returns the first int32_t/define; detect when the caller passed a function name (create_call_func passes "function_name(" into the annotation param), extract the function name (e.g., annotation.split("(")[0] if "(" present), then build the C and LLVM search patterns to match that exact function (C: use a regex like r"\b<int_return>?\s+{re.escape(func_name)}\b" or simply search for the function name with a preceding return/type token, LLVM: r"define\s+.*@{re.escape(func_name)}\b"), use re.search on each line and return source.index(match.group(0)) for the found match; if annotation is not a function name keep the existing behavior but make patterns use word boundaries so you don't pick substrings. Ensure you update match_declare_kernel_cpu to reference the extracted func_name and to return the correct match start.

coderabbitai · 2026-05-06T10:42:18Z

+        elif is_cpu_target(self.target) or is_hexagon_target(self.target):
+            wrapper_class = TLCPUSourceWrapper
        else:
-            raise ValueError(f"Unsupported platform: {self.arch.platform}")
+            arch_name = self.arch.platform if self.arch else "unknown"
+            raise ValueError(f"Unsupported platform: {arch_name}")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the wrapper.py file and the specific lines mentioned find . -type f -name "wrapper.py" | grep -E "jit/adapter"

Repository: tile-ai/tilelang

Length of output: 176

🏁 Script executed:

# Get the file size and content around lines 988-992 wc -l tilelang/jit/adapter/wrapper.py sed -n '980,1000p' tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 1006

🏁 Script executed:

# Now let's find the create_call_func method in TLCPUSourceWrapper grep -n "create_call_func" tilelang/jit/adapter/wrapper.py | head -20

Repository: tile-ai/tilelang

Length of output: 200

🏁 Script executed:

# Search for TLCPUSourceWrapper class definition and create_call_func implementation grep -n "class TLCPUSourceWrapper" tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 90

🏁 Script executed:

# Search for match_declare_kernel_cpu in utils.py grep -n "match_declare_kernel_cpu" tilelang/jit/adapter/utils.py

Repository: tile-ai/tilelang

Length of output: 142

🏁 Script executed:

# Read the create_call_func method implementation sed -n '790,850p' tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 3011

🏁 Script executed:

# Read the match_declare_kernel_cpu function sed -n '88,120p' tilelang/jit/adapter/utils.py

Repository: tile-ai/tilelang

Length of output: 982

🏁 Script executed:

# Check what type of source code is passed to TLCPUSourceWrapper # Look at the context where wrapper_class is instantiated sed -n '975,1010p' tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 1442

🏁 Script executed:

# Search for Hexagon-related code generation and what source format it produces rg "hexagon" -i tilelang/jit/adapter/ -A 3 -B 3

Repository: tile-ai/tilelang

Length of output: 5418

🏁 Script executed:

# Look for where the source code is generated and what format it takes for Hexagon rg "is_hexagon_target" -i tilelang/jit/ -A 5 -B 5

Repository: tile-ai/tilelang

Length of output: 6891

🏁 Script executed:

# Check if there's any LLVM IR being passed for Hexagon grep -r "llvm" tilelang/jit/adapter/ -i

Repository: tile-ai/tilelang

Length of output: 327

🏁 Script executed:

# Let's trace back - look at what generates c_source that's passed to wrap() grep -n "def wrap" tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 181

🏁 Script executed:

# See where wrap() is called and what source is passed rg "\.wrap\(" tilelang/jit/ -B 5

Repository: tile-ai/tilelang

Length of output: 1749

🏁 Script executed:

# Check what get_kernel_source returns and what format Hexagon uses grep -n "get_kernel_source" tilelang/jit/adapter/ -r -A 3

Repository: tile-ai/tilelang

Length of output: 2051

🏁 Script executed:

# Look for Hexagon-specific code generation find tilelang -path "*hexagon*" -type f | head -20

Repository: tile-ai/tilelang

Length of output: 132

🏁 Script executed:

# Check what kind of source TVM Hexagon backend produces grep -n "llvm" tilelang/jit/adapter/wrapper.py -i

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Look at what happens in the Cython adapter for get_kernel_source grep -n "def get_kernel_source" tilelang/jit/adapter/ -r -A 15

Repository: tile-ai/tilelang

Length of output: 5063

🏁 Script executed:

# Check what code format TVM's Hexagon target produces grep -n "hexagon" tilelang/jit/adapter/tvm_ffi.py -i -B 3 -A 3

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Look at the cython adapter to understand the flow for Hexagon sed -n '100,135p' tilelang/jit/adapter/cython/adapter.py

Repository: tile-ai/tilelang

Length of output: 1623

🏁 Script executed:

# Check TVM target creation for Hexagon grep -r "Target\|target" tilelang/jit/adapter/cython/adapter.py | grep -i hexagon

Repository: tile-ai/tilelang

Length of output: 240

🏁 Script executed:

# Look for what TVM codegen produces - check if there's any indication of LLVM vs C grep -n "device_kernel_source" tilelang/jit/adapter/cython/adapter.py | head -10

Repository: tile-ai/tilelang

Length of output: 656

🏁 Script executed:

# Let's check if Hexagon can produce LLVM IR by examining TVM integration grep -n "ir_module\|IRModule" tilelang/jit/adapter/cython/adapter.py -B 2 -A 2 | head -40

Repository: tile-ai/tilelang

Length of output: 1633

🏁 Script executed:

# Check what format TVM produces for different targets # Look for codegen or backend specifications find tilelang -name "*.py" | xargs grep -l "hexagon" | head -5

Repository: tile-ai/tilelang

Length of output: 219

🏁 Script executed:

# Check the hexagon arch file to understand what source it processes sed -n '1,50p' tilelang/carver/arch/hexagon.py

Repository: tile-ai/tilelang

Length of output: 2022

🏁 Script executed:

# Look for test cases or examples that use Hexagon to see what source format is expected find . -path "*/test*" -name "*.py" | xargs grep -l "hexagon" 2>/dev/null | head -3

Repository: tile-ai/tilelang

Length of output: 142

🏁 Script executed:

# Check what code is generated by looking at a simple example or test # Look for inspect_source usage with Hexagon grep -r "inspect_source\|get_device_source" tilelang/jit/ -B 3 -A 3 | head -40

Repository: tile-ai/tilelang

Length of output: 2068

🏁 Script executed:

# The key insight: Let's check if there's any LLVM IR handling elsewhere in CPU wrapper grep -n "LLVM\|llvm\|define" tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 91

🏁 Script executed:

# Check test cases to see what format Hexagon source is expected to be sed -n '1,100p' ./testing/python/hexagon/test_hmx_mma.py

Repository: tile-ai/tilelang

Length of output: 3388

🏁 Script executed:

# Check what TVM produces for Hexagon - look at lower.py grep -n "hexagon\|llvm" tilelang/engine/lower.py -i -B 2 -A 2 | head -50

Repository: tile-ai/tilelang

Length of output: 1992

🏁 Script executed:

# The critical test: what does inspect_source() return for a Hexagon target? # Let's check if there's documentation or examples grep -r "inspect_source" tilelang/ --include="*.py" -B 1 -A 1 | grep -i "hexagon\|llvm" -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Let's check what TVM's Hexagon target backend is set to grep -n "hexagon\|llvm" tilelang/utils/target.py -i -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 2237

🏁 Script executed:

# Confirm that split(";") will fail on LLVM IR python3 << 'EOF' # Example LLVM function llvm_func = '''define i32 `@kernel`(i32* %arg0, i32 %arg1) { %1 = load i32, i32* %arg0 %2 = add i32 %1, %arg1 ret i32 %2 }''' # What happens with split(";")[0] declaration = llvm_func.split(";")[0] print("LLVM function:") print(llvm_func) print("\nResult of split(';')[0]:") print(declaration) print("\nThis includes the entire function body, not just the declaration!") # Example C function c_func = '''int32_t kernel(int32_t* arg0, int32_t arg1); { // body }''' declaration_c = c_func.split(";")[0] print("\n\nC function:") print(c_func) print("\nResult of split(';')[0]:") print(declaration_c) print("\nThis correctly stops at the declaration!") EOF

Repository: tile-ai/tilelang

Length of output: 577

TLCPUSourceWrapper parses C prototypes, not LLVM IR.

This branch routes Hexagon (which generates LLVM IR, not C code) through TLCPUSourceWrapper. However, create_call_func() still extracts declarations with split(";")[0], which fails for LLVM IR—LLVM uses braces, not semicolons. This causes the parser to include the entire function body in the "declaration", and argument extraction will then match locals instead of parameters, producing malformed function calls. Update the CPU wrapper to parse LLVM IR before routing Hexagon here.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/jit/adapter/wrapper.py` around lines 988 - 992, The Hexagon target is being routed to TLCPUSourceWrapper which expects C prototypes, but Hexagon emits LLVM IR; update the target routing so Hexagon is parsed as IR instead of C: change the branch that checks is_hexagon_target(self.target) (in the wrapper selection near TLCPUSourceWrapper) to route Hexagon to an IR-aware wrapper (e.g., a new or existing TLCPUIRWrapper) or enhance TLCPUSourceWrapper to detect LLVM IR and parse declarations accordingly; also update create_call_func() so when handling IR it extracts the function signature by splitting/locating the opening brace '{' (or using an IR-specific parser) rather than split(";")[0], ensuring argument extraction reads parameters not local IR instructions.

coderabbitai · 2026-05-06T10:42:18Z

+        if self.torch_function is None:
+            from tilelang.utils.target import is_hexagon_target
+
+            if is_hexagon_target(self.target):
+                raise RuntimeError(
+                    "Hexagon kernels cannot be executed directly on the host machine. "
+                    "To run this kernel, please use the Hexagon SDK Simulator or "
+                    "the HexagonLauncher on a supported Qualcomm device."
+                )
+
        return self.torch_function(*args, **kwds)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Raise for all uninitialized kernels, not just Hexagon.

On non-Hexagon targets this still falls through to self.torch_function(*args, **kwds), which becomes a raw 'NoneType' object is not callable'. Add the generic RuntimeError after the Hexagon-specific branch so callers always get a stable error here.

Proposed fix

if self.torch_function is None: from tilelang.utils.target import is_hexagon_target if is_hexagon_target(self.target): raise RuntimeError( "Hexagon kernels cannot be executed directly on the host machine. " "To run this kernel, please use the Hexagon SDK Simulator or " "the HexagonLauncher on a supported Qualcomm device." ) + raise RuntimeError("The compiled function is not initialized.") return self.torch_function(*args, **kwds)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if self.torch_function is None:

from tilelang.utils.target import is_hexagon_target

if is_hexagon_target(self.target):

raise RuntimeError(

"Hexagon kernels cannot be executed directly on the host machine. "

"To run this kernel, please use the Hexagon SDK Simulator or "

"the HexagonLauncher on a supported Qualcomm device."

)

return self.torch_function(*args, **kwds)

if self.torch_function is None:

from tilelang.utils.target import is_hexagon_target

if is_hexagon_target(self.target):

raise RuntimeError(

"Hexagon kernels cannot be executed directly on the host machine. "

"To run this kernel, please use the Hexagon SDK Simulator or "

"the HexagonLauncher on a supported Qualcomm device."

)

raise RuntimeError("The compiled function is not initialized.")

return self.torch_function(*args, **kwds)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/jit/kernel.py` around lines 206 - 216, The current call path may attempt to call self.torch_function when it is None; update the kernel invocation in the method containing self.torch_function to raise a clear RuntimeError for uninitialized kernels on all targets: keep the existing Hexagon-specific RuntimeError (using is_hexagon_target(self.target)) and then add a generic RuntimeError if self.torch_function is still None that explains the kernel is uninitialized and cannot be executed on the host (mentioning to use the Hexagon SDK Simulator/HexagonLauncher only in the Hexagon message). This ensures callers receive a stable, descriptive error instead of a 'NoneType' object is not callable' when invoking self.torch_function(*args, **kwds).

coderabbitai · 2026-05-06T10:42:18Z

+    # Handle Backend-Specific Normalization (Shorthands)
+    if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:
+        return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve explicit -mcpu and other flags when normalizing Hexagon strings.

This currently rewrites any string containing "hexagon" to a fixed llvm -mtriple=hexagon -mcpu=hexagonv73. Inputs like "llvm -mcpu=hexagonv75" lose the requested CPU/features and silently downgrade codegen.

Suggested fix

# Handle Backend-Specific Normalization (Shorthands) if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var: - return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73" + normalized = return_var.strip() + if normalized == "hexagon": + return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73" + elif normalized.startswith("llvm"): + return_var = f"{normalized} -mtriple=hexagon" + else: + return_var = f"llvm -mtriple=hexagon {normalized}"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Handle Backend-Specific Normalization (Shorthands)

if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:

return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"

# Handle Backend-Specific Normalization (Shorthands)

if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:

normalized = return_var.strip()

if normalized == "hexagon":

return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"

elif normalized.startswith("llvm"):

return_var = f"{normalized} -mtriple=hexagon"

else:

return_var = f"llvm -mtriple=hexagon {normalized}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/utils/target.py` around lines 189 - 191, The normalization currently overwrites any Hexagon target string (variable return_var) with a fixed "llvm -mtriple=hexagon -mcpu=hexagonv73"; instead, update the logic in tilelang.utils.target (the block handling Hexagon shorthands) to parse the existing return_var string and only inject missing flags: if "-mtriple" is absent append "-mtriple=hexagon", if "-mcpu" is absent append a default "-mcpu=hexagonv73", and preserve any existing "-mcpu", "-mattr", or other flags already present in return_var; ensure you handle cases where return_var is exactly "hexagon" (replace with "llvm" plus the necessary flags) versus when it already starts with "llvm" (modify by appending missing flags) so no explicit user flags are overwritten.

LeiWang1999 · 2026-05-07T04:34:48Z

@Calaweh Thanks for your contributions! We're currently decoupling different backends, and after those works are wrapped up, I will take a look for this pr.

…lowering pipeline

…_LLVM restore, tirx migration, target dict form)

coderabbitai

🧹 Nitpick comments (1)

CMakeLists.txt (1)

339-347: 💤 Low value

Minor: USE_LLVM string comparison is case-sensitive.

The USE_LLVM STREQUAL "ON" check at line 339 only matches the uppercase literal. If a user passes -DUSE_LLVM=on or -DUSE_LLVM=On, the block is skipped and llvm-config is never located, leaving USE_LLVM as the raw string "on" which TVM may mishandle.

Additionally, the PR description states "Requires LLVM 17 or 18", but the generic llvm-config fallback could find LLVM 16 or 19 without any version validation.

🔧 Optional fix for case-insensitive matching and version check

-if(USE_LLVM STREQUAL "ON")
+string(TOUPPER "${USE_LLVM}" _USE_LLVM_UPPER)
+if(_USE_LLVM_UPPER STREQUAL "ON")
   find_program(LLVM_CONFIG_PATH NAMES llvm-config-18 llvm-config-17 llvm-config)
   if(LLVM_CONFIG_PATH)
+    execute_process(
+      COMMAND "${LLVM_CONFIG_PATH}" --version
+      OUTPUT_VARIABLE _LLVM_VERSION
+      OUTPUT_STRIP_TRAILING_WHITESPACE
+    )
+    if(NOT _LLVM_VERSION MATCHES "^(17|18)\\.")
+      message(WARNING "Found LLVM ${_LLVM_VERSION}; Hexagon support requires LLVM 17 or 18.")
+    endif()
     set(USE_LLVM ${LLVM_CONFIG_PATH} CACHE STRING "Path to llvm-config" FORCE)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CMakeLists.txt` around lines 339 - 347, The current check if(USE_LLVM
STREQUAL "ON") is case-sensitive and can miss values like "on"/"On"; change the
condition to accept any common truthy forms (e.g., check if(USE_LLVM) or
explicitly compare against "ON"/"On"/"on") so the block always runs when the
user enables LLVM, then after finding LLVM_CONFIG_PATH run llvm-config --version
(via execute_process) and parse the major version number returned by llvm-config
to ensure it is 17 or 18 (if not, message(FATAL_ERROR) with a clear message);
keep the existing behavior of setting USE_LLVM to the found llvm-config path
(set(USE_LLVM ${LLVM_CONFIG_PATH} CACHE STRING ... FORCE)) when validation
passes.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@CMakeLists.txt`:
- Around line 339-347: The current check if(USE_LLVM STREQUAL "ON") is
case-sensitive and can miss values like "on"/"On"; change the condition to
accept any common truthy forms (e.g., check if(USE_LLVM) or explicitly compare
against "ON"/"On"/"on") so the block always runs when the user enables LLVM,
then after finding LLVM_CONFIG_PATH run llvm-config --version (via
execute_process) and parse the major version number returned by llvm-config to
ensure it is 17 or 18 (if not, message(FATAL_ERROR) with a clear message); keep
the existing behavior of setting USE_LLVM to the found llvm-config path
(set(USE_LLVM ${LLVM_CONFIG_PATH} CACHE STRING ... FORCE)) when validation
passes.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 82a9b39e-db6f-4bef-a5de-c27986690384

📥 Commits

Reviewing files that changed from the base of the PR and between 1a9aae6 and 3ada183.

📒 Files selected for processing (3)

CMakeLists.txt
src/runtime/hexagon_runtime.cc
src/transform/lower_hexagon_intrinsics.cc

💤 Files with no reviewable changes (2)

src/transform/lower_hexagon_intrinsics.cc
src/runtime/hexagon_runtime.cc

q55180514 · 2026-06-15T08:35:56Z

@Calaweh hello,Does your PR support the v68 architecture?

Calaweh · 2026-06-15T10:06:38Z

@q55180514 Yes, v68 is supported. I verified this by compiling the HMX matmul kernel with {"kind": "llvm", "mtriple": "hexagon", "mcpu": "hexagonv68"} and inspecting the generated LLVM IR:

target triple = "hexagon" is present
hexagonv68 appears in the IR datalayout (not downgraded to v73)
HexKL_mma_i8acc32 intrinsic is correctly emitted
hmx_mma_placeholder is fully lowered — no placeholder remains

The LowerHexagonIntrinsics pass is architecture-agnostic at the IR level; it rewrites the placeholder to HexKL_mma_i8acc32 regardless of the specific Hexagon version. LLVM then handles the v68-specific instruction selection. Physical device or simulator validation on v68 hardware would be needed to confirm runtime correctness beyond codegen.

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

LeiWang1999 self-requested a review May 18, 2026 06:20

Calaweh force-pushed the feat/hexagon-hmx branch from 93df00f to 1a9aae6 Compare May 20, 2026 13:53

Calaweh added 3 commits May 21, 2026 09:40

feat: add initial Hexagon architecture support and HMX MMA intrinsic …

889ab51

…lowering pipeline

fix: resolve pre-commit failures in diagnose_hmx.py

ee5563f

[fix] resolve Hexagon/HMX pipeline issues (tvm_access_ptr unwrap, USE…

3ada183

…_LLVM restore, tirx migration, target dict form)

Calaweh force-pushed the feat/hexagon-hmx branch from 1a9aae6 to 3ada183 Compare May 21, 2026 05:43

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Uh oh!

Conversation

Calaweh commented May 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Title: Add Hexagon Backend with HMX Support for Matrix Multiplication

Summary

Why

Key Changes

Hardware Lowering

Memory Architecture

Runtime Support

JIT Infrastructure

Validation

Additional Checks

Compatibility

Related Issues

Summary by CodeRabbit

Uh oh!

github-actions Bot commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

LeiWang1999 commented May 7, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

q55180514 commented Jun 15, 2026

Uh oh!

Calaweh commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Calaweh commented May 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 6, 2026 •

edited

Loading