split-pr
General↓ 0 installsUpdated 19d ago
Split a PR into multiple PRs to reduce the number of required CODEOWNERS reviewer groups.
SKILL.md preview
--- name: split-pr description: Split a PR into multiple PRs to reduce the number of required CODEOWNERS reviewer groups. when_to_use: User asks to split a PR, reduce reviewer groups, or break up a large PR; 'too many CODEOWNERS', 'split this PR', 'break up PR', 'reduce reviewers needed'. user_invocable: true argument: "<pr-url-or-number>" --- # Split PR by CODEOWNERS Groups Split a large pull request into multiple smaller PRs, where each PR touches the fewest possible CODEOWNERS reviewer groups. The goal is to reduce review burden: a PR that only touches `megatron/core/` needs only the core reviewers, while a PR that also touches `examples/`, `tools/`, and `megatron/training/` pulls in many additional groups. ## Workflow ### 1. Analyze the PR 1. Fetch the PR details: `gh pr view <number> --repo NVIDIA/Megatron-LM --json title,body,headRefName,author` and `gh pr diff <number> --repo NVIDIA/Megatron-LM --stat`. Also determine the current GitHub user with `gh api user --jq .login`. 2. Parse `.github/CODEOWNERS` to build a mapping from file path patterns to owner groups. 3. For each changed file in the PR, determine which CODEOWNERS groups would be required to review it. 4. Build a summary table grouped by CODEOWNERS group, showing which files pull in which groups. 5. Count the total number of distinct reviewer groups the PR currently requires. ### 2. Propose a split that minimizes reviewer groups per PR The primary optimization goal: **minimize the number of CODEOWNERS reviewer groups required for each resulting PR**. Strategy: 1. Cluster files by their CODEOWNERS groups. Files owned by the same set of groups naturally belong together. 2. Identify the largest cluster — this becomes the first (and usually largest) PR. 3. Remaining files form one or more additional PRs, each ideally requiring only one or two reviewer groups. 4. If a split creates a dependency (e.g., PR B uses symbols renamed in PR A), the dependent PR must be merged after the first. Note this explicitly. 5. Each PR must be independently mergeable to main — no broken imports, no missing symbols. Backward-compatible aliases and re-export stubs in the first PR can make this possible. Present the proposed split as a table: - PR name/description - Files included - CODEOWNERS groups required - Dependencies on other PRs (if any) Wait for user approval before proceeding. ### 3. Execute the split (after user approval) For each new PR: 1. Create a new branch from the appropriate base (`main`, or a dependency PR's branch). 2. Extract the relevant changes: `git diff upstream/main..<source-branch> -- <file paths> | git apply`. 3. Stage, commit with a clear message, and push to the user's fork. 4. Create the PR as a **draft** (per repo contributing guidelines). 5. If the original PR needs to be narrowed in scope, confirm with the user before force-pushing. 6. Report all PR URLs when done. ## Important guidelines - Always create PRs as **drafts** and push to the user's fork, never directly to upstream. - Backward-compatible changes (aliases, re-exports, deprecation shims) should go in the first PR so subsequent PRs can depend on them. - Test files should go with the production code they test, not in a separate PR. - Prefer a single clean commit per split PR over replaying the original commit history. - If a file is hard to categorize (e.g., it touches two groups), ask the user which PR it should go in. - If the current GitHub user is not the author of the original PR, each new PR's description must explicitly credit the original author (e.g., "Original changes by @<author> in #<number>").