adding-model-support


---
name: adding-model-support
description: Guide for adding support for new LLM or VLM models in Megatron-Bridge. Covers bridge, provider, recipe, tests, docs, and examples.
when_to_use: User asks to add, onboard, or integrate a new model family; 'add Qwen4 support', 'onboard Llama 5', 'create a bridge for X', 'write a recipe for Y'.
---

# Adding New Model Support in Megatron-Bridge

## Phase 1: Discovery

### Step 1 — Get the HF model link

Ask the user for the HuggingFace model link (e.g. `https://huggingface.co/Qwen/Qwen3.5-VL-27B`).

If the model is **not public**, ask the user to provide the `config.json` file directly.

### Step 2 — Fetch and analyze config.json

Read the model's `config.json` from HuggingFace (or from the user-provided file). Key fields to extract:

- `model_type` — used for `@register_bridge(model_type=...)`
- `architectures` — the HF model class name (used for `source=...` in registration)
- `tie_word_embeddings` — critical for weight tying
- Architecture fields: `num_hidden_layers`, `hidden_size`, `intermediate_size`, `num_attention_heads`, `num_key_value_heads`, `vocab_size`, `max_position_embeddings`, `rope_theta`, etc.
- MoE fields (if present): `num_local_experts`, `num_experts_per_tok`, `moe_intermediate_size`
- MLA fields (if present): `q_lora_rank`, `kv_lora_rank`, `qk_nope_head_dim`, `qk_rope_head_dim`
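
The extraction above can be sketched as a small helper. This is a minimal illustration, not part of Megatron-Bridge: `summarize_hf_config` is a hypothetical name, and it assumes `config.json` has already been parsed into a plain dict (for example with `json.load`).

```python
# Field groups copied from the list above.
ARCH_FIELDS = [
    "num_hidden_layers", "hidden_size", "intermediate_size", "num_attention_heads",
    "num_key_value_heads", "vocab_size", "max_position_embeddings", "rope_theta",
]
MOE_FIELDS = ["num_local_experts", "num_experts_per_tok", "moe_intermediate_size"]
MLA_FIELDS = ["q_lora_rank", "kv_lora_rank", "qk_nope_head_dim", "qk_rope_head_dim"]


def summarize_hf_config(config):
    """Group the fields that matter for bridge registration (hypothetical helper)."""

    def pick(names):
        # Keep only the fields that are actually present in this config.
        return {name: config[name] for name in names if name in config}

    return {
        "model_type": config.get("model_type"),
        "architectures": config.get("architectures", []),
        "tie_word_embeddings": config.get("tie_word_embeddings"),
        "architecture": pick(ARCH_FIELDS),
        "moe": pick(MOE_FIELDS),  # empty when none of the listed MoE fields exist
        "mla": pick(MLA_FIELDS),  # empty when none of the listed MLA fields exist
    }
```

An empty `moe` or `mla` group only means that none of the listed field names were found; some model families use different names for the same concepts, so confirm against the model's own config.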

If there are config fields you don't recognize from previously supported models (check `CONFIG_MAPPING` in `model_bridge.py` and existing bridges), this likely indicates a **new architectural block** (e.g., a novel attention variant, custom normalization, or a new layer type). Ask the user to provide the HuggingFace `modeling_*.py` implementation of that block so you can understand the computation and create the correct Megatron-side mapping or custom module.
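
One way to surface unfamiliar fields is a plain set difference. The sketch below is an assumption-laden illustration: `find_unrecognized_fields` is a hypothetical helper, and `KNOWN_FIELDS` is a placeholder that you would build from `CONFIG_MAPPING` and the fields existing bridges already read, not the real contents of either.

```python
# Placeholder set: populate it from CONFIG_MAPPING and existing bridges in practice.
KNOWN_FIELDS = {
    "model_type", "architectures", "tie_word_embeddings", "num_hidden_layers",
    "hidden_size", "intermediate_size", "num_attention_heads", "num_key_value_heads",
    "vocab_size", "max_position_embeddings", "rope_theta",
}

# Bookkeeping keys that carry no architectural meaning.
IGNORED_FIELDS = {"transformers_version", "_name_or_path"}


def find_unrecognized_fields(config, known_fields=KNOWN_FIELDS, ignored=IGNORED_FIELDS):
    """Return config keys that no known mapping covers, sorted for stable output."""
    return sorted(k for k in config if k not in known_fields and k not in ignored)
```

Any name this returns is a candidate for a new architectural block and is worth checking against the `modeling_*.py` source before writing the mapping.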

### Step 3 — Determine VLM vs LLM

**VLM** (Vision-Language Model) if `config.json` contains:
- Both `text_config` and `vision_config` sub-configs

Note: the model name is not a reliable signal; VLMs may or may not have "VL" in it.

**LLM** (Text-only) if:
- No `text_config` / `vision_config`
- Single flat config for the language model

This distinction affects:
- Which files to create (VLMs need a `model.py` combining vision + language)
- Where to read config fields from (`text_config` for VLMs, top level for LLMs)
- Test patterns (VLMs need vision inputs in functional tests)
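
The check above fits in a few lines. This is a minimal sketch with hypothetical helper names (`is_vlm`, `language_config`), again assuming `config.json` is already loaded as a dict.

```python
def is_vlm(config):
    """A config counts as a VLM when it nests both text and vision sub-configs."""
    return "text_config" in config and "vision_config" in config


def language_config(config):
    """Return the dict that holds the language-model fields.

    For VLMs that is the nested text_config; for LLMs it is the flat top level.
    """
    return config["text_config"] if is_vlm(config) else config
```

Reading architecture fields through `language_config(config)` keeps the downstream logic identical for both model kinds.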

### Step 4 — Check for quantized weights (FP8 / FP4)

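Inspect the HF checkpoint's safetensors files for quantized weight dtypes such as
`float8_e4m3fn` (FP8) or `uint8`/`uint4`, accompanied by `*_scale_inv` or `*_scale` tensors.
For sharded checkpoints, `model.safetensors.index.json` lists tensor names only, so use it to
spot scale keys and read the dtypes from the shard headers. Common signs: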

- `config.json` mentions `quantization_config` or dtype fields like `"torch_dtype": "float8_e4m3fn"`
- Safetensors contain `weight_scale_inv` keys alongside the main weight keys
- The model card mentions FP8/FP4/INT4 weights
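
These signs can be checked without loading any tensor data, because a `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header that records each tensor's dtype. The sketch below uses only the standard library; the function names are hypothetical, and it covers only the FP8 dtype tags and the scale-key suffixes named above.

```python
import json
import struct

# safetensors dtype tags that indicate FP8 storage.
FP8_DTYPES = {"F8_E4M3", "F8_E5M2"}
SCALE_SUFFIXES = ("_scale_inv", "_scale")


def read_safetensors_header(path):
    """Return the per-tensor JSON header of a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def find_quantization_signs(config, header):
    """Collect quantization hints from config.json and one shard header."""
    return {
        "has_quantization_config": "quantization_config" in config,
        "fp8_tensors": sorted(
            name for name, info in header.items() if info["dtype"] in FP8_DTYPES
        ),
        "scale_tensors": sorted(
            name for name in header if name.endswith(SCALE_SUFFIXES)
        ),
    }
```

For a sharded checkpoint, run `read_safetensors_header` on each shard. A non-empty `fp8_tensors` or `scale_tensors` list means the weights need dequantizing before conversion.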

**Why this matters:** The bridge's `import_ckpt` path does **not** automatically dequantize — it
loads raw quantized values as-is. This produces a silently broken model (random-level loss, huge
grad norms) instead of raising an error.

**Fix:** Dequantize before conversion. Two approaches:

1. **Standalone script** (recommended for user-facing models) — Write a
   `dequant_fp8_for_bridge.py` in the model's examples folder.
   Reference: `examples/models/ministral/ministral3/dequant_fp8_for_bridge.py`.
   The pattern is: `w_bf16 = fp8_weight.to(bfloat16) * weight_scale_inv`.

2. **In-bridge hook** — Override `maybe_modify_loaded_hf_weight()` in the bridge class to
   dequantize on the fly during import:

   ```python
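   import torch

   # Method on the bridge class: dequantize FP8 weights on the fly during import.
   def maybe_modify_loaded_hf_weight(self, hf_param, hf_state_dict):
       weight = hf_state_dict[hf_param]
       scale_key = hf_param + "_scale_inv"  # "...weight" -> "...weight_scale_inv"
       if weight.dtype == torch.float8_e4m3fn and scale_key in hf_state_dict:
           # Assumes the scale broadcasts against the weight (e.g. a per-tensor scale).
           return weight.to(torch.bfloat16) * hf_state_dict[scale_key].to(torch.bfloat16)
       return weight  # non-FP8 weights pass through unchanged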
   ```

Always add a sanity check in the verification workflow

…