trtllm-moe-develop
---
name: trtllm-moe-develop
description: >-
  Review, design, and refactor TensorRT-LLM PyTorch MoE code for architecture
  fit, clean code, maintainability, and testability. Always use for any
  modification, review, refactor, or design planning that touches MoE modules,
  including tensorrt_llm/_torch/modules/fused_moe, ConfigurableMoE, MoE
  backends, MoEScheduler/moe_scheduler.py, forward execution/chunking,
  communication strategies, EPLB, quantization/weight handling, routing,
  factories, MoE docs, or MoE tests. Also use when the user asks whether a MoE
  design follows the current architecture or whether a MoE refactor is
  reasonable.
license: Apache-2.0
metadata:
  author: NVIDIA Corporation
---

# TensorRT-LLM MoE Code Quality

Use this skill to keep MoE changes aligned with the current TensorRT-LLM MoE architecture. Favor module roles, API boundaries, and testability over local style cleanup.

## Required Context

Before proposing or editing MoE code, read:

1. `CODING_GUIDELINES.md`
2. `tensorrt_llm/_torch/modules/fused_moe/MOE_DEVELOPER_GUIDE.md`
3. The target files being changed
4. The relevant tests under `tests/unittest/_torch/modules/moe/`

Also inspect these files when the area is relevant:

- Forward execution/chunking: inspect `moe_scheduler.py`, `configurable_moe.py`, `interface.py`, backend `run_moe`/`quantize_input` paths, and communication code.
- MegaMoE/fused communication: inspect `moe_scheduler.py`, `mega_moe/`, `configurable_moe.py`, `quantization.py`, and communication code.
- Communication: `tensorrt_llm/_torch/modules/fused_moe/communication/base.py` and `communication_factory.py`.
- Quantization and weights: `tensorrt_llm/_torch/modules/fused_moe/quantization.py`.
- EPLB/load balancing: `interface.py`, `moe_load_balancer.py`, `quantization.py`, `moe_scheduler.py`, current forward-execution/chunking code, and `test_moe_module.py`.
- Test matrix/helpers: `tests/unittest/_torch/modules/moe/moe_test_utils.py` and `quantize_utils.py` when adding backend, quantization, skip, or parameter coverage.

For module-specific work, read `references/moe-canonical-code-examples.md` after the guide and load only the relevant section. Each design gate or review should cite at least one concrete code example with file:line evidence.

## Working With MOE_DEVELOPER_GUIDE.md

Treat `MOE_DEVELOPER_GUIDE.md` as the in-repo source of truth for MoE architecture. Treat this skill as the agent workflow layer that tells Codex how to apply that source of truth while designing, editing, or reviewing code.

Use the guide this way:

- Start from the guide sections that match the requested change: Architecture, File Map, Backend Capability Matrix, execution-flow/EPLB constraints, Canonical Examples, and Anti-Patterns.
- Use guide content to fill the design gate: owner boundary, main API, reference pattern, and test plan.
- Do not duplicate fast-changing matrices or backend support tables in this skill; prefer the guide as the current reference.
- If a code change adds a backend, quantization method, communication strategy, fused-communication behavior, EPLB behavior, or test convention, check whether the guide also needs an update.
- If guide and code disagree, inspect code and tests, mention the mismatch, and either update the guide as part of the change or report it as follow-up.

Guide-update checklist:

- File map changed: update `File Map`.
- Backend or quant support changed: update `Backend Capability Matrix`.
- New backend/communication/forward-execution pattern: update `Canonical Examples`.
- New forbidden pattern or ownership rule: update `Anti-Patterns`.
- Test convention changed: update `Tests`.

## Core Principle

Preserve these owner boundaries:

- `ConfigurableMoE` is the assembler/orchestrator. It wires backend, communication, EPLB, weight lifecycle delegation, and shared wrapper bookkeeping.
- Backends declare capabilities, run MoE computation, and own the MoE module's weigh …