[DEV] feat: Enable GPU Marlin repack in InfiniLM AWQ/GPTQ weight processing #458

## Background

InfiniLM currently needs Marlin-compatible weight layouts for AWQ/GPTQ inference. The CPU-side repack works, but it increases model loading cost and does not use the new InfiniCore GPU repack operators.

## Scope

- Use InfiniCore AWQ/GPTQ Marlin repack operators when available.
- Keep fallback behavior for environments without Marlin support.
- Integrate the repacked weights into the existing quantized linear process-weight flow.
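
The selection and fallback behavior described above could be sketched roughly as follows. This is a minimal illustration, not the InfiniLM implementation: `cpu_repack`, `select_repack`, and `process_weights` are hypothetical names, the weights are plain Python lists standing in for tensors, and treating `NotImplementedError` as the "operator unsupported" signal is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# All names here are hypothetical; they are not InfiniCore or InfiniLM APIs.
RepackFn = Callable[[Sequence[int]], List[int]]


def cpu_repack(qweight: Sequence[int]) -> List[int]:
    """Stand-in for the existing CPU-side repack (placeholder: copies the input)."""
    return list(qweight)


def select_repack(gpu_repack: Optional[RepackFn]) -> Tuple[str, RepackFn]:
    """Prefer the GPU repack operator when one is available, else use the CPU path."""
    if gpu_repack is not None:
        return "gpu", gpu_repack
    return "cpu", cpu_repack


def process_weights(
    qweight: Sequence[int], gpu_repack: Optional[RepackFn] = None
) -> Tuple[str, List[int]]:
    """Repack weights, falling back to the CPU path if the GPU operator is unsupported."""
    backend, repack = select_repack(gpu_repack)
    try:
        return backend, repack(qweight)
    except NotImplementedError:
        # Assumption: an unsupported operator signals itself with NotImplementedError.
        return "cpu", cpu_repack(qweight)
```

Keeping the choice in one function means the quantized linear process-weight flow only ever sees repacked weights, regardless of which backend produced them.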

## Follow-up Work

- Improve workspace/cache reuse.
- Optimize server integration.
- Reduce communication overhead for tensor parallel inference.
- Continue profiling decode performance against vLLM.
