[feat] Add Hexagon HMX backend support#2155
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughThis PR adds Hexagon HMX support: build/runtime wiring, intrinsics and lowering pass, architecture and target detection, JIT/libgen cross-compile guards, kernel cache adjustments, and tests/diagnostics for Hexagon codegen and lowering. ChangesHexagon Backend with HMX Support
Sequence DiagramsequenceDiagram
participant User
participant Compiler
participant Lowering
participant LowerHexagonIntrinsics
participant JIT
participant FFI
participant HMX
User->>Compiler: request compile for Hexagon target
Compiler->>Compiler: detect is_hexagon_target()
Compiler->>Lowering: lower() (Hexagon branch)
Lowering->>LowerHexagonIntrinsics: apply lowering pass
LowerHexagonIntrinsics-->>Lowering: return IR with HexKL intrinsics
Lowering-->>JIT: emit IR / produce IR-only artifact
User->>FFI: call tilelang.hexagon.hmx_kernel_launch
FFI->>FFI: extract kernel Function and args
FFI->>HMX: enter HexagonHtp (power on HMX)
HMX-->>FFI: execute HMX MMA and return result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 14
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CMakeLists.txt`:
- Around line 157-186: The build currently defines TILELANG_HEXAGON_ENABLED
whenever USE_LLVM is truthy; update the CMake logic (around the llvm-config
discovery that sets LLVM_CONFIG_PATH/USE_LLVM and the later block that sets
TILELANG_HEXAGON_ENABLED) to actually verify the LLVM toolchain has Hexagon:
after finding llvm-config (LLVM_CONFIG_PATH) run llvm-config --targets-built (or
invoke it via execute_process) and check the output contains "Hexagon"; only
then set the TILELANG_HEXAGON_ENABLED definition and the "TileLang Build with
LLVM" status message. If llvm-config is not found or it does not report Hexagon,
change the earlier non-fatal warning into a fatal error (or at minimum do not
enable TILELANG_HEXAGON_ENABLED) so hexagon_runtime.cc and related transforms
are not advertised/compiled incorrectly; refer to the LLVM_CONFIG_PATH/USE_LLVM
and TILELANG_HEXAGON_ENABLED symbols to locate where to add the execute_process
and conditional enablement.
In `@src/runtime/hexagon_runtime.cc`:
- Around line 16-23: Ensure you validate args.size() before accessing args[0] or
constructing ffi::PackedArgs to avoid UB when called with zero arguments: in the
lambda that reads args and creates kernel and kernel_args, first check that
args.size() >= 1, and if not set an appropriate runtime error/return (e.g.,
populate rv with an error or throw a tvm/ffi runtime error) instead of
proceeding; specifically modify the block that uses args[0], ffi::Function
kernel, and ffi::PackedArgs kernel_args(args.data() + 1, args.size() - 1) to
perform the size check and early error return.
In `@src/transform/lower_hexagon_intrinsics.cc`:
- Around line 28-35: The rewrite for the "hmx_mma_placeholder" intrinsic
currently indexes call->args[3] without checking arity; update the branch that
handles func_name->value == "hmx_mma_placeholder" in lower_hexagon_intrinsics.cc
to first validate call->args.size() >= 4 and only perform the Array<PrimExpr>
construction and return Evaluate(...) when that check passes; if the check
fails, preserve and return the original Call node (or emit a targeted
diagnostic/log) instead of indexing out of bounds to avoid hard runtime/compile
errors.
In `@testing/python/hexagon/diagnose_hmx.py`:
- Around line 139-141: The failure message in
testing/python/hexagon/diagnose_hmx.py incorrectly references the pass name
LowerHMXIntrinsics; change the string literal that currently says "Check
LowerHMXIntrinsics implementation." to reference the correct backend pass name
"LowerHexagonIntrinsics" so triage points to the right implementation (update
the message around the HMX placeholder/no intrinsic error text in
diagnose_hmx.py).
- Around line 9-15: The try/except in diagnose_hmx.py currently swallows all
exceptions when probing LLVM/Hexagon target (the block that calls
tvm.runtime.enabled("llvm") and tvm.target.Target(...)); replace the broad
except Exception with a narrow catch for the expected TVM probe error (e.g.,
tvm.error.TVMError or the TVM-specific probe exception available in your TVM
version, falling back to RuntimeError only if necessary) and re-raise any other
exceptions so real failures are not silenced; apply the same change to the other
probe sites referenced (the blocks around lines with the gated tests at the
other two probe locations) and ensure error handling logs or returns False only
for the known probe failure type while allowing unexpected exceptions to
propagate.
In `@tilelang/cache/kernel_cache.py`:
- Around line 457-460: The save path allows IR-only entries by checking
_is_ir_only (via kernel.adapter.libpath is None) and omitting kernel_lib_path
from missing_files, but load-time required files (method _get_required_files)
still unconditionally demands kernel_lib_path; update _get_required_files to
detect the same IR-only condition (check kernel.adapter.libpath or reuse the
_is_ir_only logic) and exclude self.kernel_lib_path from the required list when
IR-only so save and load semantics match; ensure the change references
_is_ir_only, kernel.adapter.libpath, _get_missing_complete_cache_files,
_get_required_files, and kernel_lib_path so both save and load use the same
criterion.
In `@tilelang/carver/arch/__init__.py`:
- Line 52: The symbol is_hexagon_target was added to __all__ but never defined
or imported in this module, so imports fail; fix by importing or defining
is_hexagon_target in this module and ensuring it is present in the module
globals before exporting (e.g., add a proper import statement that brings
is_hexagon_target into tilelang.carver.arch or define the function here), then
keep it listed in __all__ so from tilelang.carver.arch import is_hexagon_target
and star-imports succeed.
In `@tilelang/intrinsics/hexagon/__init__.py`:
- Around line 94-101: The export list __all__ references
register_hexagon_memory_info which is not defined or imported, causing
import-time failures; fix by either importing the symbol into this module (e.g.,
add an import that provides register_hexagon_memory_info) or remove it from the
__all__ list so only existing names (hmx, HMXBuilder, mma, mma_fp16,
vtcm_dma_copy) are exported; update the module’s top-level imports or the
__all__ array accordingly and ensure any chosen import refers to the correct
source that defines register_hexagon_memory_info.
- Around line 53-67: The vtcm_dma_copy function currently constructs a
tir.Evaluate node but never emits it into the active TIR builder; change it to
call T.evaluate(...) (like the mma() helper does) so the extern DMA call is
actually inserted into the TIR; ensure tilelang.language is imported as T and
replace the tir.Evaluate(...) invocation inside vtcm_dma_copy with
T.evaluate(...) while keeping the tir.call_extern arguments (hexagon_dma_copy,
src.access_ptr("r"), dst.access_ptr("w")) unchanged.
In `@tilelang/jit/adapter/cython/adapter.py`:
- Around line 134-142: The Hexagon guard (is_hexagon_target(self.target)) runs
after the code that compiles/loads the generated library; move this check to run
before any compile/load steps so cross-compilation never attempts to load
host-incompatible artifacts—i.e., in the method containing the compile/load
logic, call is_hexagon_target(self.target) at the very start, return early and
set self._compiled_func = None (leaving kernel.kernel_source etc. intact) so the
JIT object can be inspected without performing host library loading.
In `@tilelang/jit/adapter/utils.py`:
- Around line 88-105: match_declare_kernel_cpu currently ignores the requested
symbol and always returns the first int32_t/define; detect when the caller
passed a function name (create_call_func passes "function_name(" into the
annotation param), extract the function name (e.g., annotation.split("(")[0] if
"(" present), then build the C and LLVM search patterns to match that exact
function (C: use a regex like r"\b<int_return>?\s+{re.escape(func_name)}\b" or
simply search for the function name with a preceding return/type token, LLVM:
r"define\s+.*@{re.escape(func_name)}\b"), use re.search on each line and return
source.index(match.group(0)) for the found match; if annotation is not a
function name keep the existing behavior but make patterns use word boundaries
so you don't pick substrings. Ensure you update match_declare_kernel_cpu to
reference the extracted func_name and to return the correct match start.
In `@tilelang/jit/adapter/wrapper.py`:
- Around line 988-992: The Hexagon target is being routed to TLCPUSourceWrapper
which expects C prototypes, but Hexagon emits LLVM IR; update the target routing
so Hexagon is parsed as IR instead of C: change the branch that checks
is_hexagon_target(self.target) (in the wrapper selection near
TLCPUSourceWrapper) to route Hexagon to an IR-aware wrapper (e.g., a new or
existing TLCPUIRWrapper) or enhance TLCPUSourceWrapper to detect LLVM IR and
parse declarations accordingly; also update create_call_func() so when handling
IR it extracts the function signature by splitting/locating the opening brace
'{' (or using an IR-specific parser) rather than split(";")[0], ensuring
argument extraction reads parameters not local IR instructions.
In `@tilelang/jit/kernel.py`:
- Around line 206-216: The current call path may attempt to call
self.torch_function when it is None; update the kernel invocation in the method
containing self.torch_function to raise a clear RuntimeError for uninitialized
kernels on all targets: keep the existing Hexagon-specific RuntimeError (using
is_hexagon_target(self.target)) and then add a generic RuntimeError if
self.torch_function is still None that explains the kernel is uninitialized and
cannot be executed on the host (mentioning to use the Hexagon SDK
Simulator/HexagonLauncher only in the Hexagon message). This ensures callers
receive a stable, descriptive error instead of a 'NoneType' object is not
callable' when invoking self.torch_function(*args, **kwds).
In `@tilelang/utils/target.py`:
- Around line 189-191: The normalization currently overwrites any Hexagon target
string (variable return_var) with a fixed "llvm -mtriple=hexagon
-mcpu=hexagonv73"; instead, update the logic in tilelang.utils.target (the block
handling Hexagon shorthands) to parse the existing return_var string and only
inject missing flags: if "-mtriple" is absent append "-mtriple=hexagon", if
"-mcpu" is absent append a default "-mcpu=hexagonv73", and preserve any existing
"-mcpu", "-mattr", or other flags already present in return_var; ensure you
handle cases where return_var is exactly "hexagon" (replace with "llvm" plus the
necessary flags) versus when it already starts with "llvm" (modify by appending
missing flags) so no explicit user flags are overwritten.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 05bd7958-0e49-4313-b7e6-8667f85194cb
📒 Files selected for processing (18)
CMakeLists.txtsrc/runtime/hexagon_runtime.ccsrc/transform/lower_hexagon_intrinsics.cctesting/python/hexagon/diagnose_hmx.pytesting/python/hexagon/test_hmx_mma.pytilelang/cache/kernel_cache.pytilelang/carver/arch/__init__.pytilelang/carver/arch/hexagon.pytilelang/engine/lower.pytilelang/engine/phase.pytilelang/intrinsics/hexagon/__init__.pytilelang/jit/adapter/base.pytilelang/jit/adapter/cython/adapter.pytilelang/jit/adapter/libgen.pytilelang/jit/adapter/utils.pytilelang/jit/adapter/wrapper.pytilelang/jit/kernel.pytilelang/utils/target.py
| if (func_name->value == "hmx_mma_placeholder") { | ||
| Array<PrimExpr> new_args; | ||
| new_args.push_back(StringImm("HexKL_mma_i8acc32")); | ||
| new_args.push_back( | ||
| call->args[3]); // C_acc (accumulator — first arg to HexKL) | ||
| new_args.push_back(call->args[1]); // A_vtcm | ||
| new_args.push_back(call->args[2]); // B_vtcm | ||
| return Evaluate( |
There was a problem hiding this comment.
Guard the placeholder arity before indexing call->args[3].
This pass is globally callable on arbitrary TIR, so a malformed call_extern("hmx_mma_placeholder", ...) with fewer than four arguments will trip TVM's bounds checks and fail the compile with a hard error. Validate the expected operand count before rewriting, then either keep the node unchanged or emit a targeted diagnostic.
Proposed fix
// Lower HMX MMA placeholder
if (func_name->value == "hmx_mma_placeholder") {
+ ICHECK_EQ(call->args.size(), 4)
+ << "hmx_mma_placeholder expects exactly 3 operands";
Array<PrimExpr> new_args;
new_args.push_back(StringImm("HexKL_mma_i8acc32"));
new_args.push_back(
call->args[3]); // C_acc (accumulator — first arg to HexKL)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (func_name->value == "hmx_mma_placeholder") { | |
| Array<PrimExpr> new_args; | |
| new_args.push_back(StringImm("HexKL_mma_i8acc32")); | |
| new_args.push_back( | |
| call->args[3]); // C_acc (accumulator — first arg to HexKL) | |
| new_args.push_back(call->args[1]); // A_vtcm | |
| new_args.push_back(call->args[2]); // B_vtcm | |
| return Evaluate( | |
| if (func_name->value == "hmx_mma_placeholder") { | |
| ICHECK_EQ(call->args.size(), 4) | |
| << "hmx_mma_placeholder expects exactly 3 operands"; | |
| Array<PrimExpr> new_args; | |
| new_args.push_back(StringImm("HexKL_mma_i8acc32")); | |
| new_args.push_back( | |
| call->args[3]); // C_acc (accumulator — first arg to HexKL) | |
| new_args.push_back(call->args[1]); // A_vtcm | |
| new_args.push_back(call->args[2]); // B_vtcm | |
| return Evaluate( |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/transform/lower_hexagon_intrinsics.cc` around lines 28 - 35, The rewrite
for the "hmx_mma_placeholder" intrinsic currently indexes call->args[3] without
checking arity; update the branch that handles func_name->value ==
"hmx_mma_placeholder" in lower_hexagon_intrinsics.cc to first validate
call->args.size() >= 4 and only perform the Array<PrimExpr> construction and
return Evaluate(...) when that check passes; if the check fails, preserve and
return the original Call node (or emit a targeted diagnostic/log) instead of
indexing out of bounds to avoid hard runtime/compile errors.
| try: | ||
| if not tvm.runtime.enabled("llvm"): | ||
| return False | ||
| tvm.target.Target("llvm -mtriple=hexagon -mcpu=hexagonv73") | ||
| return True | ||
| except Exception: | ||
| return False |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n testing/python/hexagon/diagnose_hmx.py | head -120Repository: tile-ai/tilelang
Length of output: 5134
🏁 Script executed:
cat -n testing/python/hexagon/diagnose_hmx.py | sed -n '110,150p'Repository: tile-ai/tilelang
Length of output: 1775
🏁 Script executed:
# Search for LowerHMXIntrinsics and LowerHexagonIntrinsics in the codebase
rg "LowerHMXIntrinsics|LowerHexagonIntrinsics" -iRepository: tile-ai/tilelang
Length of output: 793
🏁 Script executed:
# Check the lower.py file to see what exceptions might be expected
cat -n tilelang/engine/lower.py | head -60Repository: tile-ai/tilelang
Length of output: 2467
🏁 Script executed:
# Check what exceptions TVM Target raises by looking for its usage and error handling
rg "tvm.target.Target" -A 3 -B 1 | head -80Repository: tile-ai/tilelang
Length of output: 5048
🏁 Script executed:
# Look for TVM exception types and how Target errors are typically caught
rg "except.*Error|except.*Exception" testing/python/hexagon/diagnose_hmx.py -B 3Repository: tile-ai/tilelang
Length of output: 358
🏁 Script executed:
# Check if there are TVM-specific exception types in the codebase
rg "from tvm|import.*tvm" testing/python/hexagon/diagnose_hmx.py | head -10Repository: tile-ai/tilelang
Length of output: 92
🏁 Script executed:
# Check TVM exception handling patterns in the codebase
rg "tvm.error|TVMError|from tvm.*import|except.*as e" -A 2 | head -100Repository: tile-ai/tilelang
Length of output: 5401
🏁 Script executed:
# Check the diagnostic test more carefully - what's the context around line 141?
cat -n testing/python/hexagon/diagnose_hmx.py | sed -n '135,145p'Repository: tile-ai/tilelang
Length of output: 561
🏁 Script executed:
# Verify the actual pass name references
rg "LowerHMXIntrinsics|LowerHexagonIntrinsics" -B 2 -A 2Repository: tile-ai/tilelang
Length of output: 2746
🏁 Script executed:
# Check if there are similar skip-gate functions elsewhere in the codebase with better exception handling
rg "pytest.mark.skipif" -B 2 -A 2 | head -60Repository: tile-ai/tilelang
Length of output: 4792
Narrow the skip-gate exception handling to prevent silent skips.
Line 14 catches every exception and returns False, which can hide real regressions and silently skip both gated tests (Lines 64 and 110). Catch only expected probe failures and re-raise unexpected exceptions.
Suggested fix
def has_hexagon_codegen():
try:
if not tvm.runtime.enabled("llvm"):
return False
tvm.target.Target("llvm -mtriple=hexagon -mcpu=hexagonv73")
return True
- except Exception:
- return False
+ except Exception as err:
+ msg = str(err).lower()
+ # Expected probe failures: missing Hexagon/LLVM target support.
+ if "hexagon" in msg or "llvm" in msg or "target" in msg:
+ return False
+ raisePlease verify and, if available in your TVM version, prefer a concrete TVM exception type (e.g., TVM-specific error class) over message matching.
Also applies to: 64-65, 110-111
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 14-14: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@testing/python/hexagon/diagnose_hmx.py` around lines 9 - 15, The try/except
in diagnose_hmx.py currently swallows all exceptions when probing LLVM/Hexagon
target (the block that calls tvm.runtime.enabled("llvm") and
tvm.target.Target(...)); replace the broad except Exception with a narrow catch
for the expected TVM probe error (e.g., tvm.error.TVMError or the TVM-specific
probe exception available in your TVM version, falling back to RuntimeError only
if necessary) and re-raise any other exceptions so real failures are not
silenced; apply the same change to the other probe sites referenced (the blocks
around lines with the gated tests at the other two probe locations) and ensure
error handling logs or returns False only for the known probe failure type while
allowing unexpected exceptions to propagate.
| from tilelang.utils.target import is_hexagon_target | ||
|
|
||
| if is_hexagon_target(self.target): | ||
| # For Hexagon, we are cross-compiling. | ||
| # We cannot load symbols or execute this on the host machine. | ||
| # Returning early allows the JIT object to exist so we can | ||
| # inspect kernel.kernel_source. | ||
| self._compiled_func = None | ||
| return |
There was a problem hiding this comment.
Return before compiling/loading Hexagon artifacts.
By the time this guard runs, Lines 130-132 have already compiled and loaded the generated library. That's the host-incompatible step this branch is supposed to skip for cross-compiled Hexagon kernels.
Suggested fix
self.wrapper.assign_host_module(host_mod)
self.wrapper.assign_device_module(device_mod)
self.host_kernel_source = self.wrapper.wrap(self.get_kernel_source(kernel_only=True))
+ from tilelang.utils.target import is_hexagon_target
+
+ if is_hexagon_target(self.target):
+ # For Hexagon, we are cross-compiling.
+ # We cannot load symbols or execute this on the host machine.
+ # Returning early allows the JIT object to exist so we can
+ # inspect kernel.kernel_source.
+ self._compiled_func = None
+ return
+
self.lib_generator.update_lib_code(self.host_kernel_source)
self.lib_generator.compile_lib()
self.lib = self.lib_generator.load_lib()
-
- from tilelang.utils.target import is_hexagon_target
-
- if is_hexagon_target(self.target):
- # For Hexagon, we are cross-compiling.
- # We cannot load symbols or execute this on the host machine.
- # Returning early allows the JIT object to exist so we can
- # inspect kernel.kernel_source.
- self._compiled_func = None
- return🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tilelang/jit/adapter/cython/adapter.py` around lines 134 - 142, The Hexagon
guard (is_hexagon_target(self.target)) runs after the code that compiles/loads
the generated library; move this check to run before any compile/load steps so
cross-compilation never attempts to load host-incompatible artifacts—i.e., in
the method containing the compile/load logic, call
is_hexagon_target(self.target) at the very start, return early and set
self._compiled_func = None (leaving kernel.kernel_source etc. intact) so the JIT
object can be inspected without performing host library loading.
| def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int: | ||
| pattern = r"int32_t\s+\w+" | ||
| # C-style signature | ||
| pattern_c = r"int32_t\s+\w+" | ||
| # LLVM-style signature | ||
| pattern_llvm = r"define\s+.*@(?!llvm\.)\w+" | ||
|
|
||
| for line in source.split("\n"): | ||
| if annotation in line: | ||
| matched = re.findall(pattern, line) | ||
| if len(matched) >= 1: | ||
| return source.index(matched[0] + "(") | ||
| # C pattern | ||
| matched = re.findall(pattern_c, line) | ||
| if matched: | ||
| return source.index(matched[0]) | ||
| # LLVM pattern | ||
| matched = re.findall(pattern_llvm, line) | ||
| if matched: | ||
| # Match the start of the 'define' | ||
| return source.index(matched[0]) | ||
|
|
||
| raise ValueError("No global kernel found in the source code") |
There was a problem hiding this comment.
Match the requested symbol instead of the first CPU declaration.
annotation is ignored here now, so every lookup returns the first int32_t/define in the file. TLCPUSourceWrapper.create_call_func() passes function_name + "(" into this helper, so multi-function CPU/LLVM sources will bind later wrappers to the wrong declaration.
Suggested fix
def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:
- # C-style signature
- pattern_c = r"int32_t\s+\w+"
- # LLVM-style signature
- pattern_llvm = r"define\s+.*@(?!llvm\.)\w+"
+ func_name = annotation[:-1] if annotation.endswith("(") else None
+ if func_name and func_name != "int32_t":
+ pattern_c = rf"\bint32_t\s+{re.escape(func_name)}\b"
+ pattern_llvm = rf"\bdefine\b[^\n@]*@{re.escape(func_name)}\b"
+ else:
+ pattern_c = r"\bint32_t\s+\w+\b"
+ pattern_llvm = r"\bdefine\b[^\n@]*@(?!llvm\.)\w+\b"
for line in source.split("\n"):
# C pattern
matched = re.findall(pattern_c, line)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int: | |
| pattern = r"int32_t\s+\w+" | |
| # C-style signature | |
| pattern_c = r"int32_t\s+\w+" | |
| # LLVM-style signature | |
| pattern_llvm = r"define\s+.*@(?!llvm\.)\w+" | |
| for line in source.split("\n"): | |
| if annotation in line: | |
| matched = re.findall(pattern, line) | |
| if len(matched) >= 1: | |
| return source.index(matched[0] + "(") | |
| # C pattern | |
| matched = re.findall(pattern_c, line) | |
| if matched: | |
| return source.index(matched[0]) | |
| # LLVM pattern | |
| matched = re.findall(pattern_llvm, line) | |
| if matched: | |
| # Match the start of the 'define' | |
| return source.index(matched[0]) | |
| raise ValueError("No global kernel found in the source code") | |
| def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int: | |
| func_name = annotation[:-1] if annotation.endswith("(") else None | |
| if func_name and func_name != "int32_t": | |
| pattern_c = rf"\bint32_t\s+{re.escape(func_name)}\b" | |
| pattern_llvm = rf"\bdefine\b[^\n@]*@{re.escape(func_name)}\b" | |
| else: | |
| pattern_c = r"\bint32_t\s+\w+\b" | |
| pattern_llvm = r"\bdefine\b[^\n@]*@(?!llvm\.)\w+\b" | |
| for line in source.split("\n"): | |
| # C pattern | |
| matched = re.findall(pattern_c, line) | |
| if matched: | |
| return source.index(matched[0]) | |
| # LLVM pattern | |
| matched = re.findall(pattern_llvm, line) | |
| if matched: | |
| # Match the start of the 'define' | |
| return source.index(matched[0]) | |
| raise ValueError("No global kernel found in the source code") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tilelang/jit/adapter/utils.py` around lines 88 - 105,
match_declare_kernel_cpu currently ignores the requested symbol and always
returns the first int32_t/define; detect when the caller passed a function name
(create_call_func passes "function_name(" into the annotation param), extract
the function name (e.g., annotation.split("(")[0] if "(" present), then build
the C and LLVM search patterns to match that exact function (C: use a regex like
r"\b<int_return>?\s+{re.escape(func_name)}\b" or simply search for the function
name with a preceding return/type token, LLVM:
r"define\s+.*@{re.escape(func_name)}\b"), use re.search on each line and return
source.index(match.group(0)) for the found match; if annotation is not a
function name keep the existing behavior but make patterns use word boundaries
so you don't pick substrings. Ensure you update match_declare_kernel_cpu to
reference the extracted func_name and to return the correct match start.
| elif is_cpu_target(self.target) or is_hexagon_target(self.target): | ||
| wrapper_class = TLCPUSourceWrapper | ||
| else: | ||
| raise ValueError(f"Unsupported platform: {self.arch.platform}") | ||
| arch_name = self.arch.platform if self.arch else "unknown" | ||
| raise ValueError(f"Unsupported platform: {arch_name}") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the wrapper.py file and the specific lines mentioned
find . -type f -name "wrapper.py" | grep -E "jit/adapter"Repository: tile-ai/tilelang
Length of output: 176
🏁 Script executed:
# Get the file size and content around lines 988-992
wc -l tilelang/jit/adapter/wrapper.py
sed -n '980,1000p' tilelang/jit/adapter/wrapper.pyRepository: tile-ai/tilelang
Length of output: 1006
🏁 Script executed:
# Now let's find the create_call_func method in TLCPUSourceWrapper
grep -n "create_call_func" tilelang/jit/adapter/wrapper.py | head -20Repository: tile-ai/tilelang
Length of output: 200
🏁 Script executed:
# Search for TLCPUSourceWrapper class definition and create_call_func implementation
grep -n "class TLCPUSourceWrapper" tilelang/jit/adapter/wrapper.pyRepository: tile-ai/tilelang
Length of output: 90
🏁 Script executed:
# Search for match_declare_kernel_cpu in utils.py
grep -n "match_declare_kernel_cpu" tilelang/jit/adapter/utils.pyRepository: tile-ai/tilelang
Length of output: 142
🏁 Script executed:
# Read the create_call_func method implementation
sed -n '790,850p' tilelang/jit/adapter/wrapper.pyRepository: tile-ai/tilelang
Length of output: 3011
🏁 Script executed:
# Read the match_declare_kernel_cpu function
sed -n '88,120p' tilelang/jit/adapter/utils.pyRepository: tile-ai/tilelang
Length of output: 982
🏁 Script executed:
# Check what type of source code is passed to TLCPUSourceWrapper
# Look at the context where wrapper_class is instantiated
sed -n '975,1010p' tilelang/jit/adapter/wrapper.pyRepository: tile-ai/tilelang
Length of output: 1442
🏁 Script executed:
# Search for Hexagon-related code generation and what source format it produces
rg "hexagon" -i tilelang/jit/adapter/ -A 3 -B 3Repository: tile-ai/tilelang
Length of output: 5418
🏁 Script executed:
# Look for where the source code is generated and what format it takes for Hexagon
rg "is_hexagon_target" -i tilelang/jit/ -A 5 -B 5Repository: tile-ai/tilelang
Length of output: 6891
🏁 Script executed:
# Check if there's any LLVM IR being passed for Hexagon
grep -r "llvm" tilelang/jit/adapter/ -iRepository: tile-ai/tilelang
Length of output: 327
🏁 Script executed:
# Let's trace back - look at what generates c_source that's passed to wrap()
grep -n "def wrap" tilelang/jit/adapter/wrapper.pyRepository: tile-ai/tilelang
Length of output: 181
🏁 Script executed:
# See where wrap() is called and what source is passed
rg "\.wrap\(" tilelang/jit/ -B 5Repository: tile-ai/tilelang
Length of output: 1749
🏁 Script executed:
# Check what get_kernel_source returns and what format Hexagon uses
grep -n "get_kernel_source" tilelang/jit/adapter/ -r -A 3Repository: tile-ai/tilelang
Length of output: 2051
🏁 Script executed:
# Look for Hexagon-specific code generation
find tilelang -path "*hexagon*" -type f | head -20Repository: tile-ai/tilelang
Length of output: 132
🏁 Script executed:
# Check what kind of source TVM Hexagon backend produces
grep -n "llvm" tilelang/jit/adapter/wrapper.py -iRepository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Look at what happens in the Cython adapter for get_kernel_source
grep -n "def get_kernel_source" tilelang/jit/adapter/ -r -A 15Repository: tile-ai/tilelang
Length of output: 5063
🏁 Script executed:
# Check what code format TVM's Hexagon target produces
grep -n "hexagon" tilelang/jit/adapter/tvm_ffi.py -i -B 3 -A 3Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Look at the cython adapter to understand the flow for Hexagon
sed -n '100,135p' tilelang/jit/adapter/cython/adapter.pyRepository: tile-ai/tilelang
Length of output: 1623
🏁 Script executed:
# Check TVM target creation for Hexagon
grep -r "Target\|target" tilelang/jit/adapter/cython/adapter.py | grep -i hexagonRepository: tile-ai/tilelang
Length of output: 240
🏁 Script executed:
# Look for what TVM codegen produces - check if there's any indication of LLVM vs C
grep -n "device_kernel_source" tilelang/jit/adapter/cython/adapter.py | head -10Repository: tile-ai/tilelang
Length of output: 656
🏁 Script executed:
# Let's check if Hexagon can produce LLVM IR by examining TVM integration
grep -n "ir_module\|IRModule" tilelang/jit/adapter/cython/adapter.py -B 2 -A 2 | head -40Repository: tile-ai/tilelang
Length of output: 1633
🏁 Script executed:
# Check what format TVM produces for different targets
# Look for codegen or backend specifications
find tilelang -name "*.py" | xargs grep -l "hexagon" | head -5Repository: tile-ai/tilelang
Length of output: 219
🏁 Script executed:
# Check the hexagon arch file to understand what source it processes
sed -n '1,50p' tilelang/carver/arch/hexagon.pyRepository: tile-ai/tilelang
Length of output: 2022
🏁 Script executed:
# Look for test cases or examples that use Hexagon to see what source format is expected
find . -path "*/test*" -name "*.py" | xargs grep -l "hexagon" 2>/dev/null | head -3Repository: tile-ai/tilelang
Length of output: 142
🏁 Script executed:
# Check what code is generated by looking at a simple example or test
# Look for inspect_source usage with Hexagon
grep -r "inspect_source\|get_device_source" tilelang/jit/ -B 3 -A 3 | head -40Repository: tile-ai/tilelang
Length of output: 2068
🏁 Script executed:
# The key insight: Let's check if there's any LLVM IR handling elsewhere in CPU wrapper
grep -n "LLVM\|llvm\|define" tilelang/jit/adapter/wrapper.pyRepository: tile-ai/tilelang
Length of output: 91
🏁 Script executed:
# Check test cases to see what format Hexagon source is expected to be
sed -n '1,100p' ./testing/python/hexagon/test_hmx_mma.pyRepository: tile-ai/tilelang
Length of output: 3388
🏁 Script executed:
# Check what TVM produces for Hexagon - look at lower.py
grep -n "hexagon\|llvm" tilelang/engine/lower.py -i -B 2 -A 2 | head -50Repository: tile-ai/tilelang
Length of output: 1992
🏁 Script executed:
# The critical test: what does inspect_source() return for a Hexagon target?
# Let's check if there's documentation or examples
grep -r "inspect_source" tilelang/ --include="*.py" -B 1 -A 1 | grep -i "hexagon\|llvm" -B 2 -A 2Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Let's check what TVM's Hexagon target backend is set to
grep -n "hexagon\|llvm" tilelang/utils/target.py -i -B 2 -A 2Repository: tile-ai/tilelang
Length of output: 2237
🏁 Script executed:
# Confirm that split(";") will fail on LLVM IR
python3 << 'EOF'
# Example LLVM function
llvm_func = '''define i32 `@kernel`(i32* %arg0, i32 %arg1) {
%1 = load i32, i32* %arg0
%2 = add i32 %1, %arg1
ret i32 %2
}'''
# What happens with split(";")[0]
declaration = llvm_func.split(";")[0]
print("LLVM function:")
print(llvm_func)
print("\nResult of split(';')[0]:")
print(declaration)
print("\nThis includes the entire function body, not just the declaration!")
# Example C function
c_func = '''int32_t kernel(int32_t* arg0, int32_t arg1);
{
// body
}'''
declaration_c = c_func.split(";")[0]
print("\n\nC function:")
print(c_func)
print("\nResult of split(';')[0]:")
print(declaration_c)
print("\nThis correctly stops at the declaration!")
EOFRepository: tile-ai/tilelang
Length of output: 577
TLCPUSourceWrapper parses C prototypes, not LLVM IR.
This branch routes Hexagon (which generates LLVM IR, not C code) through TLCPUSourceWrapper. However, create_call_func() still extracts declarations with split(";")[0], which fails for LLVM IR—LLVM uses braces, not semicolons. This causes the parser to include the entire function body in the "declaration", and argument extraction will then match locals instead of parameters, producing malformed function calls. Update the CPU wrapper to parse LLVM IR before routing Hexagon here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tilelang/jit/adapter/wrapper.py` around lines 988 - 992, The Hexagon target
is being routed to TLCPUSourceWrapper which expects C prototypes, but Hexagon
emits LLVM IR; update the target routing so Hexagon is parsed as IR instead of
C: change the branch that checks is_hexagon_target(self.target) (in the wrapper
selection near TLCPUSourceWrapper) to route Hexagon to an IR-aware wrapper
(e.g., a new or existing TLCPUIRWrapper) or enhance TLCPUSourceWrapper to detect
LLVM IR and parse declarations accordingly; also update create_call_func() so
when handling IR it extracts the function signature by splitting/locating the
opening brace '{' (or using an IR-specific parser) rather than split(";")[0],
ensuring argument extraction reads parameters not local IR instructions.
| if self.torch_function is None: | ||
| from tilelang.utils.target import is_hexagon_target | ||
|
|
||
| if is_hexagon_target(self.target): | ||
| raise RuntimeError( | ||
| "Hexagon kernels cannot be executed directly on the host machine. " | ||
| "To run this kernel, please use the Hexagon SDK Simulator or " | ||
| "the HexagonLauncher on a supported Qualcomm device." | ||
| ) | ||
|
|
||
| return self.torch_function(*args, **kwds) |
There was a problem hiding this comment.
Raise for all uninitialized kernels, not just Hexagon.
On non-Hexagon targets this still falls through to self.torch_function(*args, **kwds), which becomes a raw 'NoneType' object is not callable'. Add the generic RuntimeError after the Hexagon-specific branch so callers always get a stable error here.
Proposed fix
if self.torch_function is None:
from tilelang.utils.target import is_hexagon_target
if is_hexagon_target(self.target):
raise RuntimeError(
"Hexagon kernels cannot be executed directly on the host machine. "
"To run this kernel, please use the Hexagon SDK Simulator or "
"the HexagonLauncher on a supported Qualcomm device."
)
+ raise RuntimeError("The compiled function is not initialized.")
return self.torch_function(*args, **kwds)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if self.torch_function is None: | |
| from tilelang.utils.target import is_hexagon_target | |
| if is_hexagon_target(self.target): | |
| raise RuntimeError( | |
| "Hexagon kernels cannot be executed directly on the host machine. " | |
| "To run this kernel, please use the Hexagon SDK Simulator or " | |
| "the HexagonLauncher on a supported Qualcomm device." | |
| ) | |
| return self.torch_function(*args, **kwds) | |
| if self.torch_function is None: | |
| from tilelang.utils.target import is_hexagon_target | |
| if is_hexagon_target(self.target): | |
| raise RuntimeError( | |
| "Hexagon kernels cannot be executed directly on the host machine. " | |
| "To run this kernel, please use the Hexagon SDK Simulator or " | |
| "the HexagonLauncher on a supported Qualcomm device." | |
| ) | |
| raise RuntimeError("The compiled function is not initialized.") | |
| return self.torch_function(*args, **kwds) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tilelang/jit/kernel.py` around lines 206 - 216, The current call path may
attempt to call self.torch_function when it is None; update the kernel
invocation in the method containing self.torch_function to raise a clear
RuntimeError for uninitialized kernels on all targets: keep the existing
Hexagon-specific RuntimeError (using is_hexagon_target(self.target)) and then
add a generic RuntimeError if self.torch_function is still None that explains
the kernel is uninitialized and cannot be executed on the host (mentioning to
use the Hexagon SDK Simulator/HexagonLauncher only in the Hexagon message). This
ensures callers receive a stable, descriptive error instead of a 'NoneType'
object is not callable' when invoking self.torch_function(*args, **kwds).
| # Handle Backend-Specific Normalization (Shorthands) | ||
| if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var: | ||
| return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73" |
There was a problem hiding this comment.
Preserve explicit -mcpu and other flags when normalizing Hexagon strings.
This currently rewrites any string containing "hexagon" to a fixed llvm -mtriple=hexagon -mcpu=hexagonv73. Inputs like "llvm -mcpu=hexagonv75" lose the requested CPU/features and silently downgrade codegen.
Suggested fix
# Handle Backend-Specific Normalization (Shorthands)
if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:
- return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"
+ normalized = return_var.strip()
+ if normalized == "hexagon":
+ return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"
+ elif normalized.startswith("llvm"):
+ return_var = f"{normalized} -mtriple=hexagon"
+ else:
+ return_var = f"llvm -mtriple=hexagon {normalized}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Handle Backend-Specific Normalization (Shorthands) | |
| if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var: | |
| return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73" | |
| # Handle Backend-Specific Normalization (Shorthands) | |
| if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var: | |
| normalized = return_var.strip() | |
| if normalized == "hexagon": | |
| return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73" | |
| elif normalized.startswith("llvm"): | |
| return_var = f"{normalized} -mtriple=hexagon" | |
| else: | |
| return_var = f"llvm -mtriple=hexagon {normalized}" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tilelang/utils/target.py` around lines 189 - 191, The normalization currently
overwrites any Hexagon target string (variable return_var) with a fixed "llvm
-mtriple=hexagon -mcpu=hexagonv73"; instead, update the logic in
tilelang.utils.target (the block handling Hexagon shorthands) to parse the
existing return_var string and only inject missing flags: if "-mtriple" is
absent append "-mtriple=hexagon", if "-mcpu" is absent append a default
"-mcpu=hexagonv73", and preserve any existing "-mcpu", "-mattr", or other flags
already present in return_var; ensure you handle cases where return_var is
exactly "hexagon" (replace with "llvm" plus the necessary flags) versus when it
already starts with "llvm" (modify by appending missing flags) so no explicit
user flags are overwritten.
|
@Calaweh Thanks for your contributions! We're currently decoupling different backends, and after those works are wrapped up, I will take a look for this pr. |
…lowering pipeline
…_LLVM restore, tirx migration, target dict form)
There was a problem hiding this comment.
🧹 Nitpick comments (1)
CMakeLists.txt (1)
339-347: 💤 Low valueMinor:
USE_LLVMstring comparison is case-sensitive.The
USE_LLVM STREQUAL "ON"check at line 339 only matches the uppercase literal. If a user passes-DUSE_LLVM=onor-DUSE_LLVM=On, the block is skipped andllvm-configis never located, leavingUSE_LLVMas the raw string"on"which TVM may mishandle.Additionally, the PR description states "Requires LLVM 17 or 18", but the generic
llvm-configfallback could find LLVM 16 or 19 without any version validation.🔧 Optional fix for case-insensitive matching and version check
-if(USE_LLVM STREQUAL "ON") +string(TOUPPER "${USE_LLVM}" _USE_LLVM_UPPER) +if(_USE_LLVM_UPPER STREQUAL "ON") find_program(LLVM_CONFIG_PATH NAMES llvm-config-18 llvm-config-17 llvm-config) if(LLVM_CONFIG_PATH) + execute_process( + COMMAND "${LLVM_CONFIG_PATH}" --version + OUTPUT_VARIABLE _LLVM_VERSION + OUTPUT_STRIP_TRAILING_WHITESPACE + ) + if(NOT _LLVM_VERSION MATCHES "^(17|18)\\.") + message(WARNING "Found LLVM ${_LLVM_VERSION}; Hexagon support requires LLVM 17 or 18.") + endif() set(USE_LLVM ${LLVM_CONFIG_PATH} CACHE STRING "Path to llvm-config" FORCE)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CMakeLists.txt` around lines 339 - 347, The current check if(USE_LLVM STREQUAL "ON") is case-sensitive and can miss values like "on"/"On"; change the condition to accept any common truthy forms (e.g., check if(USE_LLVM) or explicitly compare against "ON"/"On"/"on") so the block always runs when the user enables LLVM, then after finding LLVM_CONFIG_PATH run llvm-config --version (via execute_process) and parse the major version number returned by llvm-config to ensure it is 17 or 18 (if not, message(FATAL_ERROR) with a clear message); keep the existing behavior of setting USE_LLVM to the found llvm-config path (set(USE_LLVM ${LLVM_CONFIG_PATH} CACHE STRING ... FORCE)) when validation passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@CMakeLists.txt`:
- Around line 339-347: The current check if(USE_LLVM STREQUAL "ON") is
case-sensitive and can miss values like "on"/"On"; change the condition to
accept any common truthy forms (e.g., check if(USE_LLVM) or explicitly compare
against "ON"/"On"/"on") so the block always runs when the user enables LLVM,
then after finding LLVM_CONFIG_PATH run llvm-config --version (via
execute_process) and parse the major version number returned by llvm-config to
ensure it is 17 or 18 (if not, message(FATAL_ERROR) with a clear message); keep
the existing behavior of setting USE_LLVM to the found llvm-config path
(set(USE_LLVM ${LLVM_CONFIG_PATH} CACHE STRING ... FORCE)) when validation
passes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 82a9b39e-db6f-4bef-a5de-c27986690384
📒 Files selected for processing (3)
CMakeLists.txtsrc/runtime/hexagon_runtime.ccsrc/transform/lower_hexagon_intrinsics.cc
💤 Files with no reviewable changes (2)
- src/transform/lower_hexagon_intrinsics.cc
- src/runtime/hexagon_runtime.cc
|
@Calaweh hello,Does your PR support the v68 architecture? |
|
@q55180514 Yes, v68 is supported. I verified this by compiling the HMX matmul kernel with
The |
Title: Add Hexagon Backend with HMX Support for Matrix Multiplication
Summary
Introduce a dedicated Hexagon backend for TileLang with support for Qualcomm HMX (Hexagon Matrix eXtensions). This integration enables the generation of high-performance LLVM IR specifically targeting Hexagon DSP hardware instructions for matrix multiplication.
Why
Qualcomm's Hexagon DSPs power a vast number of edge devices. By leveraging the Hexagon Kernel Library (HexKL) interfaces through TileLang, we can achieve efficient CodeGen for specialized operators like sparse attention on mobile hardware. This keeps TileLang at the forefront of heterogeneous hardware support and enables localized, backend-specific optimizations for Qualcomm chips.
Key Changes
Hardware Lowering
LowerHexagonIntrinsicsthat translates TileLang MMA placeholders into the specialized@HexKL_mma_i8acc32hardware instructions.Memory Architecture
global.vtcmandglobal.hmx.accmemory scopes.Runtime Support
hmx_kernel_launchutilizing theHexagonHtpRAII guard to enable the HMX hardware block during execution.JIT Infrastructure
Validation
Additional Checks
hexagon), memory allocations (A_vtcm,C_acc), and hardware intrinsic calls (HexKL_mma_i8acc32).RuntimeError.Compatibility
Related Issues
Closes: #1293
Summary by CodeRabbit
New Features
Tests
Chores