(venv) user@user:~/workspace/TensorRT-Edge-LLM$ python - <<'PY'
from experimental.server import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-0.6B")
outputs = llm.generate(
["What is the capital of the United States?"],
SamplingParams(max_tokens=128),
)
print(outputs[0].text)
PY
Downloading (incomplete total...): 0.00B [00:00, ?B/s] Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Fetching 10 files: 100%|███████████████████████████████████| 10/10 [01:17<00:00, 7.80s/it]
Download complete: 100%|██████████████████████████████| 1.52G/1.52G [01:18<00:00, 19.4MB/s]
[torch.onnx] Obtain model graph for `_Wrapper([...]` with `torch.export.export(..., strict=False)`...
/usr/lib/python3.12/contextlib.py:144: UserWarning: The tensor attribute self._model.model.last_pre_norm_hidden_states was assigned during export. Such attributes must be registered as buffers using the `register_buffer` API (https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_buffer).
next(self.gen)
[torch.onnx] Obtain model graph for `_Wrapper([...]` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decompositions...
/usr/lib/python3.12/copyreg.py:99: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
return cls.__new__(cls, *args)
[torch.onnx] Run decompositions... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅
[torch.onnx] Optimize the ONNX graph...
[torch.onnx] Optimize the ONNX graph... ✅
/home/adas/workspace/TensorRT-Edge-LLM/venv/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_onnx_program.py:486: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
rename_mapping = _dynamic_shapes.create_rename_mapping(
/home/adas/workspace/TensorRT-Edge-LLM/venv/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_onnx_program.py:486: UserWarning: # The axis name: past_len will not be used, since it shares the same shape constraints with another axis: past_len.
rename_mapping = _dynamic_shapes.create_rename_mapping(
[13:47:37.747] [INFO] [llmBuilder.cpp:98:build] Using __LUNOWUD=-peep:match_dual_gemm=off
[13:47:37.747] [INFO] [trtUtils.h:67:loadEdgellmPluginLib] EDGELLM_PLUGIN_PATH variable is not set. Default to build/libNvInfer_edgellm_plugin.so
[13:47:37.865] [INFO] [TensorRT] [MemUsageChange] Init CUDA: CPU -17, GPU +0, now: CPU 1291, GPU 59946 (MiB)
[13:47:39.094] [INFO] [TensorRT] [MemUsageChange] Init builder kernel library: CPU +1227, GPU +1224, now: CPU 2640, GPU 61328 (MiB)
[13:47:39.094] [INFO] [llmBuilder.cpp:128:build] Parsing ONNX model: /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/model.onnx
[13:47:39.100] [INFO] [TensorRT] ----------------------------------------------------------------
[13:47:39.100] [INFO] [TensorRT] Input filename: /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/model.onnx
[13:47:39.101] [INFO] [TensorRT] ONNX IR version: 0.0.10
[13:47:39.101] [INFO] [TensorRT] Opset version: 24
[13:47:39.101] [INFO] [TensorRT] Producer name: pytorch
[13:47:39.101] [INFO] [TensorRT] Producer version: 2.12.0+cu130
[13:47:39.101] [INFO] [TensorRT] Domain:
[13:47:39.101] [INFO] [TensorRT] Model version: 0
[13:47:39.101] [INFO] [TensorRT] Doc string:
[13:47:39.101] [INFO] [TensorRT] ----------------------------------------------------------------
[13:47:39.101] [WARNING] [TensorRT] ModelImporter.cpp:653: Make sure input last_token_ids has Int64 binding.
[13:47:39.104] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.120] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.121] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.121] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.122] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.122] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.124] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.124] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.125] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.125] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.126] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.126] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.127] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.127] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.128] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.128] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.129] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.129] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.130] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.130] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.131] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.131] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.133] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.133] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.134] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.134] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.135] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.135] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.136] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.136] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.137] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.137] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.138] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.138] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.140] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.140] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.141] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.141] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.142] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.142] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.143] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.143] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.144] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.144] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.145] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.145] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.146] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.146] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.147] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.148] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.149] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.149] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.150] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.150] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.152] [INFO] [TensorRT] Searching for plugin: AttentionPlugin, plugin_version: 1, plugin_namespace:
[13:47:39.152] [INFO] [TensorRT] Successfully created plugin: AttentionPlugin
[13:47:39.268] [INFO] [TensorRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[13:47:39.357] [INFO] [TensorRT] Compiler backend is used during engine build.
[13:48:10.366] [INFO] [TensorRT] Detected 33 inputs and 29 output network tensors.
[13:48:10.689] [INFO] [TensorRT] Total Host Persistent Memory: 80 bytes
[13:48:10.689] [INFO] [TensorRT] Total Device Persistent Memory: 0 bytes
[13:48:10.689] [INFO] [TensorRT] Max Scratch Memory: 117441024 bytes
[13:48:10.689] [INFO] [TensorRT] [BlockAssignment] Started assigning block shifts. This will take 1 steps to complete.
[13:48:10.689] [INFO] [TensorRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.01038ms to assign 1 blocks to 1 nodes requiring 117441024 bytes.
[13:48:10.689] [INFO] [TensorRT] Total Activation Memory: 117441024 bytes
[13:48:26.713] [INFO] [TensorRT] Detected 33 inputs and 29 output network tensors.
[13:48:26.969] [INFO] [TensorRT] Total Host Persistent Memory: 80 bytes
[13:48:26.969] [INFO] [TensorRT] Total Device Persistent Memory: 0 bytes
[13:48:26.969] [INFO] [TensorRT] Max Scratch Memory: 33575424 bytes
[13:48:26.969] [INFO] [TensorRT] [BlockAssignment] Started assigning block shifts. This will take 1 steps to complete.
[13:48:26.969] [INFO] [TensorRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.00837ms to assign 1 blocks to 1 nodes requiring 33575424 bytes.
[13:48:26.969] [INFO] [TensorRT] Total Activation Memory: 33575424 bytes
[13:48:27.043] [INFO] [TensorRT] Total Weights Memory: 1192100096 bytes
[13:48:27.049] [INFO] [TensorRT] Compiler backend is used during engine execution.
[13:48:27.049] [INFO] [TensorRT] Engine generation completed in 47.7824 seconds.
[13:48:27.050] [INFO] [TensorRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1136 MiB
[13:48:27.210] [INFO] [TensorRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 13828 MiB
[13:48:27.826] [INFO] [builderUtils.cpp:328:buildAndSerializeEngine] Engine saved to /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/llm.engine
[13:48:27.829] [INFO] [llmBuilder.cpp:944:copyConfig] Copied config.json with builder config to /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/config.json
[13:48:27.829] [INFO] [fileUtils.cpp:48:copyFile] Successfully copied /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/tokenizer_config.json to /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/tokenizer_config.json
[13:48:27.829] [INFO] [llmBuilder.cpp:975:copyTokenizerFiles] Copied tokenizer file: tokenizer_config.json
[13:48:27.834] [INFO] [fileUtils.cpp:48:copyFile] Successfully copied /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/tokenizer.json to /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/tokenizer.json
[13:48:27.834] [INFO] [llmBuilder.cpp:975:copyTokenizerFiles] Copied tokenizer file: tokenizer.json
[13:48:27.834] [INFO] [fileUtils.cpp:48:copyFile] Successfully copied /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/processed_chat_template.json to /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/processed_chat_template.json
[13:48:27.834] [INFO] [llmBuilder.cpp:975:copyTokenizerFiles] Copied tokenizer file: processed_chat_template.json
[13:48:27.973] [INFO] [fileUtils.cpp:48:copyFile] Successfully copied /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/embedding.safetensors to /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/embedding.safetensors
[13:48:27.973] [INFO] [llmBuilder.cpp:1129:copyEmbeddingFile] Copied embedding.safetensors to /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/embedding.safetensors
[13:48:29.281] [INFO] [trtUtils.h:67:loadEdgellmPluginLib] EDGELLM_PLUGIN_PATH variable is not set. Default to build/libNvInfer_edgellm_plugin.so
[13:48:29.402] [INFO] [llmRuntimeUtils.cpp:444:loadEmbeddingTable] Loaded FP16 embedding: [151936, 1024]
[13:48:29.402] [INFO] [llmEngineConfig.cpp:229:parseEngineConfig] reading /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/config.json
[13:48:29.404] [INFO] [llmRuntimeUtils.cpp:167:collectRopeConfig] Collected rope config: RopeConfig: type: Default rotaryScale: 1 rotaryTheta: 1e+06 maxPositionEmbeddings: 40960
[13:48:29.404] [INFO] [llmEngineConfig.cpp:331:parseEngineConfig] LLMEngineConfig{ hiddenSize=1024 vocabSize=151936 outputVocabSize=151936 numDecoderLayers=28 numAttentionLayers=28 numKVHeads=8 headDim=128 rotaryDim=128 maxBatch=1 maxInputLen=4096 maxKVCapacity=8192 useTrtNativeOps=false isSpecDecodeBase=false specDecodeType=0 loraRank=0 }
[13:48:29.407] [INFO] [engineExecutor.cpp:36:EngineExecutor] loading engine file: /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/llm.engine
[13:48:29.407] [INFO] [TensorRT] Loaded engine size: 1140 MiB
[13:48:29.538] [INFO] [TensorRT] [MS] Running engine with multi stream info
[13:48:29.538] [INFO] [TensorRT] [MS] Number of aux streams is 1
[13:48:29.538] [INFO] [TensorRT] [MS] Number of total worker streams is 2
[13:48:29.538] [INFO] [TensorRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[13:48:29.626] [INFO] [TensorRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 1136 (MiB)
[13:48:29.626] [INFO] [engineExecutor.cpp:53:EngineExecutor] engine loaded successfully (62 I/O tensors)
[13:48:29.731] [INFO] [llmInferenceRuntime.cpp:118:initializeCommon] Base EngineExecutor successfully loaded from /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/llm.engine.
[13:48:29.731] [INFO] [llmInferenceRuntime.cpp:129:initializeCommon] Runtime batch size set to: 1 (from engine bundle)
[13:48:29.742] [INFO] [ropeCache.cpp:103:getOrCreate] RopeCache: creating new entry (rotaryDim=128, maxSeqLen=8192)
[13:48:29.753] [INFO] [llmInferenceRuntime.cpp:244:initializeCommon] Runtime tensors successfully allocated.
[13:48:29.753] [INFO] [llmInferenceRuntime.cpp:272:initializeCommon] Start loading tokenizer from model directory: /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm
[13:48:31.837] [INFO] [tokenizer.cpp:385:loadVocabulary] Loaded 151643 vocabulary tokens
[13:48:32.044] [INFO] [tokenizer.cpp:96:loadFromHF] Loaded 26 special tokens
[13:48:32.163] [INFO] [tokenizer.cpp:782:loadChatTemplate] Successfully loaded chat template from /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm/processed_chat_template.json (for model: /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca)
[13:48:32.163] [INFO] [tokenizer.cpp:123:loadFromHF] Successfully loaded tokenizer from /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm (vocab_size=151669)
[13:48:32.191] [INFO] [llmInferenceRuntime.cpp:274:initializeCommon] Tokenizer successfully loaded from model directory: /home/adas/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca/.edgellm/onnx/llm/.edgellm/engine/i4096_b1_kv8192/llm
[13:48:32.193] [INFO] [llmInferenceRuntime.cpp:372:initializeCommon] Setup shared execution context memory: 117441024 bytes (base requires: 117441024, strategy requires: 0, vision requires: 0, audio requires: 0, action requires: 0)
[13:48:32.194] [INFO] [TensorRT] Switching optimization profile from: 0 to 1. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[13:48:32.294] [INFO] [engineExecutor.cpp:219:captureGraph] captured graph (hash=0xd5d500c02b15a11f)
[13:48:32.294] [INFO] [decoderRegistry.cpp:84:captureCudaGraphs] Successfully captured decoding CUDA graphs for active decoding strategies.
[13:48:32.295] [INFO] [TensorRT] Switching optimization profile from: 1 to 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[13:48:32.331] [INFO] [TensorRT] Switching optimization profile from: 0 to 1. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Output of print(outputs[0].text):
lec
1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1
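To rule out a mismatch between the built engine and the checkpoint, the LLMEngineConfig line in the log can be compared against the snapshot's config.json. Below is a minimal sketch. The regex assumes the exact log format shown above, and the mapping from engine field names to Hugging Face keys (ENGINE_TO_HF) is my own assumption, not something documented by TensorRT-Edge-LLM.

```python
import json
import re
import sys
from pathlib import Path

# LLMEngineConfig line copied from the log above (runtime side).
LOGGED_LINE = (
    "LLMEngineConfig{ hiddenSize=1024 vocabSize=151936 outputVocabSize=151936 "
    "numDecoderLayers=28 numAttentionLayers=28 numKVHeads=8 headDim=128 "
    "rotaryDim=128 maxBatch=1 maxInputLen=4096 maxKVCapacity=8192 "
    "useTrtNativeOps=false isSpecDecodeBase=false specDecodeType=0 loraRank=0 }"
)

# Assumed correspondence between engine fields and Hugging Face config.json keys.
ENGINE_TO_HF = {
    "hiddenSize": "hidden_size",
    "vocabSize": "vocab_size",
    "numDecoderLayers": "num_hidden_layers",
    "numKVHeads": "num_key_value_heads",
    "headDim": "head_dim",
}


def parse_engine_config(log_line: str) -> dict:
    """Extract key=value pairs from an 'LLMEngineConfig{ ... }' log line."""
    match = re.search(r"LLMEngineConfig\{(.*)\}", log_line)
    if match is None:
        raise ValueError("no LLMEngineConfig{...} block in line")
    fields = {}
    for key, value in re.findall(r"(\w+)=(\S+)", match.group(1)):
        # Integers become int; booleans such as 'false' stay as strings.
        fields[key] = int(value) if value.isdigit() else value
    return fields


def compare_with_hf(engine: dict, hf_config: dict) -> dict:
    """Return {engine_key: (engine_value, hf_value)} for each mismatch."""
    mismatches = {}
    for engine_key, hf_key in ENGINE_TO_HF.items():
        if engine_key in engine and hf_key in hf_config:
            if engine[engine_key] != hf_config[hf_key]:
                mismatches[engine_key] = (engine[engine_key], hf_config[hf_key])
    return mismatches


if __name__ == "__main__":
    engine = parse_engine_config(LOGGED_LINE)
    if len(sys.argv) > 1:
        # Usage: python check_config.py /path/to/snapshot/config.json
        hf_config = json.loads(Path(sys.argv[1]).read_text())
        print(compare_with_hf(engine, hf_config) or "no mismatches")
    else:
        print(engine)
```

For reference, the log is internally consistent on vocabulary: 151643 regular tokens plus 26 special tokens gives the tokenizer's vocab_size=151669, which fits within the engine's vocabSize=151936.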
mkdir -p build
cd build
cmake .. \
-DTRT_PACKAGE_DIR=/usr \
-DCUDA_CTK_VERSION=13.0 \
-DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake \
-DEMBEDDED_TARGET=jetson-thor \
-DENABLE_CUTE_DSL=ALL \
-DBUILD_PYTHON_BINDINGS=ON
make -j$(nproc)
cd ..
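The build above places the plugin library under build/, and the log shows the runtime loading it through the relative default build/libNvInfer_edgellm_plugin.so because EDGELLM_PLUGIN_PATH is not set. That default only resolves when Python is started from the repository root, as it was here. To rule out a stale or mismatched plugin being picked up, here is a minimal sketch that exports an absolute path before the LLM(...) call. It assumes the variable is read when the plugin library is loaded, which the log message suggests but I have not verified in the source; export_plugin_path is a hypothetical helper.

```python
import os
from pathlib import Path

# Plugin file name taken from the log message above.
PLUGIN_NAME = "libNvInfer_edgellm_plugin.so"


def export_plugin_path(repo_root: str = ".") -> str:
    """Set EDGELLM_PLUGIN_PATH to an absolute plugin path and return it.

    Raises FileNotFoundError if the plugin has not been built under
    <repo_root>/build/.
    """
    plugin = Path(repo_root).resolve() / "build" / PLUGIN_NAME
    if not plugin.is_file():
        raise FileNotFoundError(f"plugin not found: {plugin}")
    # Assigning to os.environ also calls putenv(), so native code in this
    # process that reads the variable later via getenv() should see it.
    os.environ["EDGELLM_PLUGIN_PATH"] = str(plugin)
    return str(plugin)


if __name__ == "__main__":
    try:
        print("EDGELLM_PLUGIN_PATH =", export_plugin_path())
    except FileNotFoundError as err:
        print(err)
```

Calling export_plugin_path() at the top of the heredoc, before LLM(model=...), should make the "EDGELLM_PLUGIN_PATH variable is not set" message disappear from the log.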
Describe the bug
Hello, I am trying to run an LLM on the Jetson Thor.
I was able to run inference on Qwen3 0.6B without errors using the high-level API, as described in the quick start guide: https://nvidia.github.io/TensorRT-Edge-LLM/latest/user_guide/getting_started/quick-start-guide.html#quick-start-guide.
However, the output for the prompt "What is the capital of the United States?" is garbage.
Do you have any idea what might be happening?
The runtime command and the related terminal output are shown above.
Steps/Code to reproduce bug
Build configuration: the mkdir/cmake/make commands shown above.
Runtime command used: the python - <<'PY' command shown above.
Expected behavior
I would expect a correct text answer (Washington, D.C.), but the actual output is the garbage shown above, "lec" followed by "1." repeated over and over.
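To check rebuilds or configuration changes quickly, here is a small hypothetical helper that flags output dominated by one short repeating chunk, like the "1.1.1..." text above. The 0.5 threshold and the maximum chunk length of 4 characters are arbitrary choices for this sketch, not values from TensorRT-Edge-LLM.

```python
def repeated_chunk_fraction(text: str, max_chunk: int = 4) -> float:
    """Fraction of `text` covered by the longest run of one chunk repeated
    back to back at least twice (e.g. "1." in "1.1.1.")."""
    text = text.strip()
    if not text:
        return 0.0
    best = 0
    for size in range(1, max_chunk + 1):
        for start in range(len(text) - size + 1):
            chunk = text[start:start + size]
            end = start + size
            # Extend the run while the same chunk keeps repeating.
            while text.startswith(chunk, end):
                end += size
            if end - start >= 2 * size:
                best = max(best, end - start)
    return best / len(text)


def looks_degenerate(text: str, threshold: float = 0.5) -> bool:
    """True if most of the output is a single repeating chunk."""
    return repeated_chunk_fraction(text) >= threshold


if __name__ == "__main__":
    garbage = "lec\n" + "1." * 60 + "1"
    print(looks_degenerate(garbage))
    print(looks_degenerate("The capital of the United States is Washington, D.C."))
```

Running it on print(outputs[0].text) after each rebuild gives a quick pass/fail signal instead of eyeballing the output.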
System information
Edge Device: Jetson Thor
KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
INSTALL_TYPE=
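The KEY=VALUE lines above appear to come from /etc/nv_tegra_release (an assumption on my part, based on their format). A minimal sketch to gather that file together with the Python version and CPU architecture, so the System information section can be regenerated in one step; the file path and the parsing rules are assumptions, and lines that do not look like KEY=VALUE or KEY: VALUE are kept verbatim under "unparsed".

```python
import platform
import re
from pathlib import Path


def parse_release_lines(text: str) -> dict:
    """Parse 'KEY=VALUE' and 'KEY: VALUE' lines; a leading '#' is ignored."""
    info = {}
    for raw in text.splitlines():
        line = raw.lstrip("#").strip()
        if not line:
            continue
        match = re.match(r"([A-Z_]+)\s*[=:]\s*(.*)$", line)
        if match:
            info[match.group(1)] = match.group(2).strip()
        else:
            # Keep anything else (e.g. a release header) as-is.
            info.setdefault("unparsed", []).append(line)
    return info


def collect_system_info(release_file: str = "/etc/nv_tegra_release") -> dict:
    """Gather a few fields useful for the 'System information' section."""
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),
    }
    path = Path(release_file)
    if path.is_file():
        info.update(parse_release_lines(path.read_text()))
    return info


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```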
======================================================================