
parity-testing

General · 0 installs · Updated 19d ago
Verified · Curated · NVIDIA

Structured framework for verifying numerical parity of HF<->MCore weight conversions. References existing tools and the add-model-support skill.

SKILL.md preview

---
name: parity-testing
description: Structured framework for verifying numerical parity of HF<->MCore weight conversions. References existing tools and the add-model-support skill.
when_to_use: Debugging weight mismatches, verifying HF↔MCore checkpoint round-trips, choosing verification tools, or investigating a commit that changed weight conversion and caused parity failures; 'weights don't match', 'parity test', 'roundtrip check', 'logit equivalence'.
---

# Parity Testing for Megatron Bridge

This skill provides the decision framework for choosing the right
verification tool and interpreting results. For the full model onboarding
workflow (which includes parity testing as milestones 1 and 2), see the
`add-model-support` skill.

## Quick Decision: Which Tool to Run

| What you want to verify | Tool | GPU? | When to use |
|---|---|---|---|
| All weights round-trip exactly (single GPU) | `hf_megatron_roundtrip.py` | No | First check after writing a bridge |
| Weights round-trip with TP/PP/EP | `hf_megatron_roundtrip_multi_gpu.py` | Yes | After single-GPU passes |
| Forward-pass logit equivalence | `compare_hf_and_megatron/compare.py` | Yes | After round-trip passes |
| Text generation sanity | `hf_to_megatron_generate_text.py` | Yes | Large models that OOM compare.py |
| Programmatic weight check | `weights_verification_table()` | Yes | Inside Python scripts |
| VLM generation sanity | `hf_to_megatron_generate_vlm.py` | Yes | Vision-language models (VLMs) |

All tools live under `examples/conversion/`.

## 3-Level Test Strategy

### Level 1: State Dict Round-Trip (exact match)

The fastest and most fundamental check. If mappings can't perfectly
round-trip weights, nothing else will work.

```bash
# Single-GPU round-trip
uv run python examples/conversion/hf_megatron_roundtrip.py \
    --hf-model-id <org>/<model>

# Multi-GPU with TP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
    --hf-model-id <org>/<model> --tp 2

# Multi-GPU with PP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
    --hf-model-id <org>/<model> --pp 2
```

**Expected:** Every weight row shows a checkmark under "Matches Original".
Any "X" means the param mapping for that weight has an error.

**Tolerance:** Exact match (`max_diff == 0.0`). Round-trip conversions are
pure tensor reshaping with no floating-point arithmetic involved, so any
nonzero difference points to a mapping bug rather than rounding noise.
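
To illustrate why an exact match is the right bar, here is a minimal sketch in plain Python. It is not Megatron Bridge code: the helpers `fuse_qkv`, `split_qkv`, and `max_abs_diff` are hypothetical names, and lists of floats stand in for weight tensors. Fusing and splitting only move values around, so the round-trip difference should be exactly zero.

```python
# Illustrative sketch only: plain lists stand in for weight tensors, and
# fuse_qkv / split_qkv / max_abs_diff are hypothetical helpers, not bridge APIs.

def fuse_qkv(q, k, v):
    """Concatenate three lists of rows into one fused weight (pure data movement)."""
    return q + k + v


def split_qkv(fused, q_rows, k_rows):
    """Slice the fused weight back into its q, k, and v parts."""
    return (
        fused[:q_rows],
        fused[q_rows:q_rows + k_rows],
        fused[q_rows + k_rows:],
    )


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists of rows."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


# Made-up values, chosen to include numbers that are not exactly
# representable in binary floating point.
q = [[0.1, -0.2], [0.3, 0.4]]
k = [[1e-8, 2.5]]
v = [[-7.0, 0.333]]

fused = fuse_qkv(q, k, v)
q2, k2, v2 = split_qkv(fused, q_rows=len(q), k_rows=len(k))

diffs = [max_abs_diff(q, q2), max_abs_diff(k, k2), max_abs_diff(v, v2)]
roundtrip_exact = all(d == 0.0 for d in diffs)
print("round-trip exact:", roundtrip_exact)
```

The same reasoning applies to real tensors: if a conversion only concatenates, slices, or permutes, a tolerance such as `1e-6` would hide genuine mapping errors instead of absorbing rounding.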

For programmatic verification inside scripts, use the built-in verifier:

```python
from megatron.bridge.models.conversion.utils import weights_verification_table
weights_verification_table(bridge, hf_pretrained, megatron_model)
```

### Level 2: Forward-Pass Parity (GPU / bfloat16)

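After the round-trip passes, verify that the converted weights produce
numerically equivalent forward-pass output. In bfloat16 the logits will not
be bit-identical, so compare them against a tolerance.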

```bash
# Compare logits (loads both HF and Megatron models)
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/compare_hf_and_megatron/compare.py \
    --hf_model_path <org>/<model> --tp 2 \
    --prompt "The capital of France is"
```

**Expected:** Cosine similarity above 99.99% (0.9999) between the HF and
Megatron logits, and matching next-token predictions.
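
The pass criterion above can be sketched in plain Python. This is an illustration of the check, not a description of what `compare.py` does internally: `cosine_similarity` and `argmax` are hypothetical helpers, and the logit values are made up.

```python
import math

# Illustrative sketch only: cosine_similarity and argmax are hypothetical
# helpers, and the logit values below are made up for demonstration.


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length logit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def argmax(values):
    """Index of the largest value, i.e. the predicted next-token id."""
    return max(range(len(values)), key=values.__getitem__)


# Last-position logits over a toy 4-token vocabulary.
hf_logits = [2.0, -1.0, 0.5, 7.25]
mcore_logits = [2.01, -0.99, 0.5, 7.24]

similarity = cosine_similarity(hf_logits, mcore_logits)
same_next_token = argmax(hf_logits) == argmax(mcore_logits)

# Both conditions from the expected result must hold.
passes = similarity > 0.9999 and same_next_token
print("forward-pass parity:", passes)
```

Checking the predicted token alongside cosine similarity matters because two logit vectors can be nearly parallel while still disagreeing on the top token when the leading candidates are close.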

For large models that OOM `compare.py` (which loads both models), use text
generation instead:

```bash
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_to_megatron_generate_text.py \
    --hf_model_path <org>/<model> --tp 2 \
    --prompt "The capital of France is" --max_new_tokens 50
```

### Level 3: Training Parity (optional)

Verify that a few training steps produce decreasing loss. This catches
gradient computation issues that forward-pass tests miss. Use a toy model
with 2 layers and small dimensions. See the functional test pattern in the
`add-model-support` skill (Milestone 3, Phase 6).
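
One way to turn "decreasing loss" into an assertion is sketched below. This is an assumption about how such a check could be written, not the functional test from the `add-model-support` skill: `loss_is_decreasing` is a hypothetical helper and the loss values are made up.

```python
# Illustrative sketch only: loss_is_decreasing is a hypothetical helper, and
# the loss values are made up; real values would come from a training loop.


def loss_is_decreasing(losses, window=2):
    """Return True if the mean of the last `window` losses is below the
    mean of the first `window` losses.

    Comparing windowed means tolerates step-to-step noise, which a strict
    monotonic check would reject.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    head = sum(losses[:window]) / window
    tail = sum(losses[-window:]) / window
    return tail < head


healthy = [10.8, 10.5, 10.6, 9.9, 9.4, 9.1]   # noisy but trending down
stuck = [10.8, 10.8, 10.9, 10.8, 10.9, 10.8]  # flat: gradients may be broken

print("healthy run:", loss_is_decreasing(healthy))
print("stuck run:", loss_is_decreasing(stuck))
```

A flat or rising loss on a toy model is a hint to inspect gradient flow through the converted layers before scaling up.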

## Tolerance Table

| Test Level | Dtype | Device | Max Diff | Cosine Sim |
|---|---|---|---|---|
| Round-trip | float32 | CPU | 0.0 (exact) | 1.0 (exact) |
| Forward pass | bfloat16