I tried running inference using InspireMusic-Base and everything works, but when I switch to the InspireMusic-1.5B model it fails with this error:

```
Exception in thread Thread-8 (llm_job):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/content/InspireMusic/inspiremusic/cli/model.py", line 148, in llm_job
    for i in self.llm.inference(**inference_kwargs):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 57, in generator_context
    response = gen.send(request)
               ^^^^^^^^^^^^^^^^^
  File "/content/InspireMusic/inspiremusic/llm/llm.py", line 374, in inference
    top_ids = self.sampling_ids(logp, out_tokens, ignore_eos=i < min_len).item()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

The list of tensors is empty for UUID: 3e0648bc-3981-11f0-b707-0242ac1c000c
```

Do you have any idea why this could be happening? Thanks for your work on the repo!

P.S. I don't think it's related, but I updated qwen_encoder.py to use attn_implementation="eager" instead of attn_implementation="flash_attention_2" (I just switched the default to eager).
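
Since CUDA errors are reported asynchronously, the frame shown in the traceback (`sampling_ids(...).item()`) may only be where the error surfaced, not the kernel that actually failed. A minimal sketch for getting a more precise traceback, assuming the variable is set before the first CUDA call (safest is before `import torch`); the helper name `enable_sync_cuda_errors` is hypothetical and not part of InspireMusic:

```python
import os


def enable_sync_cuda_errors(env=os.environ):
    """Request synchronous CUDA kernel launches.

    With CUDA_LAUNCH_BLOCKING=1, each kernel launch blocks until it
    finishes, so a device-side assert is raised at the Python call
    that triggered it instead of at a later synchronization point.
    """
    # Assumption: this runs before any CUDA work has started,
    # ideally before torch is imported at all.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


enable_sync_cuda_errors()
# Import torch and run the InspireMusic-1.5B inference only after this point.
```

In a notebook this needs to go in the first cell, followed by a runtime restart if torch was already imported. Inference will be slower with this setting, so it is only meant for reproducing the failure once.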