[UTC] Add utility script to deal with conflicting RUN lines (#182546)
Conversation
✅ With the latest revision this PR passed the Python code formatter.
I think there are two different features here. One is the ability to generate UTC checks that are not "repeat the whole function for each differing prefix" and instead have one check block where only the differing parts use different prefixes. I think this is a really great feature, and something I've wanted a few times in the past. It is not uncommon for the difference between two prefixes to be very small, and this gets lost when all check lines are repeated per prefix. However, I think that this functionality is really something that should be part of UTC itself. We should have a flag that switches between "check blocks per prefix" and "one check block for all prefixes". The other feature here seems to be automatic picking of check prefixes? These are generally picked manually to be meaningful, and I can't say I've encountered the need for automation in this space yet. Looking at your sample, it looks like the generated prefixes end up being pretty meaningless. (It looks like existing prefixes on OpenMP tests are already often meaningless, but this seems to be a problem specific to those tests, and not really something we should perpetuate.)
Yep, that was my thinking here too. The additional insight for this proposed PR is that a unified diff is pretty good at interpreting what "small difference" means to a computer.
I was going to suggest that also, as a followup. This is currently written as an external tool, because that makes demonstrating and testing it separately a lot easier. But once merged, I would suggest that all the UTC tools be updated to have a `--merge` option.
I don't think that is a separate feature (I agree it is hardly even a feature). It is largely forced upon the tool, because it has to give some label to the parts of the check block that differ. If the user has initially given each run line a separately descriptive check prefix, then the tool will generate unified names. And if the check lines are already split in the same way as the tool merges them, the tool will always reuse the names already given (preserving them when doing the split, and re-applying them when doing the merge). One improvement I was thinking might help here is to emit a list of the machine-generated prefixes at the top of the file. That way the user can go in and do a quick find-and-replace to give them a better semantic name. Theoretically we could try to have an AI suggest a better name, but that seems like a separate feature that we don't really need.
This script is intended to help improve the quality (specifically, the
reviewability) of auto-generated test lines after any of the
update_any_test_checks scripts.
Context: I was working on updating OpenMP tests to remove a large number
of all-zero GEPs, but the tests currently rely heavily on matching those
all-zero GEPs to verify the output is correct. Simply rerunning the
tests through update_cc_test_checks.py makes them substantially larger,
and also drops a substantial fraction of key lines due to conflicting
prefixes. Hence the motivation for this script, which helps pre- and
post-process the files for UTC. For the files I looked at, savings of
over 80% in lines changed were not uncommon (these files have over 100
RUN lines and had been hand-edited to optimize the prefix lists and the
set of lines being checked).
The strategy: given a test file with N RUN lines, this script finds
common subsequences between those check blocks and merges them into a
single block with fewer, semantically-shared prefixes, often
dramatically reducing file size. This is intended to enable auto-fixing
tests where update_any_test_checks.py currently complains of conflicting
RUN lines (since it only looks at the last --check-prefix given), by
recomputing a preferred new set.
With the --split argument, this instead undoes the transform, expanding
the file checks into a form that update_any_test_checks.py can handle,
while leaving behind some metadata so that re-doing the transform
generally assigns stable names. The order of output lines is based on
the order of RUN lines, so the transform is typically a fixed point,
although its output may differ from the original input.
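The merge strategy described above can be sketched with Python's difflib (a simplified illustration, not the in-tree implementation; the real script additionally tracks, for every line, the set of RUN lines that contributed it, so a prefix can be assigned per unique set):

```python
import difflib

def zipper_merge(a, b):
    """Merge two check-line sequences: runs of equal lines are emitted
    once; where the sequences differ, A's variant precedes B's."""
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    merged = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            merged.extend(a[i1:i2])  # shared lines, kept once
        else:
            merged.extend(a[i1:i2])  # A-only lines
            merged.extend(b[j1:j2])  # B-only lines
    return merged

a = [": define void @foo()", "-NEXT: ret void"]
b = [": define void @foo()", "-NEXT: call void @bar()", "-NEXT: ret void"]
print(zipper_merge(a, b))
```

Here the two blocks fold into three lines instead of five, which is where the bulk of the file-size reduction comes from.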
Example intended workflow:
# list of files for which there are conflicting prefixes to fix
$ FILES=""
# annotate each RUN line with a single separate check
$ llvm/utils/merge_check_prefixes.py --split $FILES
# run a tool to update each check
$ build/bin/llvm-lit --update-tests -sv $FILES
$ llvm/utils/update_any_test_checks.py $FILES
# re-compute compact diff
$ llvm/utils/merge_check_prefixes.py $FILES
Usage:
merge_check_prefixes.py [--base-prefix BASE] [--dry-run] [--split] file [file ...]
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Some open questions: currently this recomputes the merge score using
len(unified_diff) and repeats until all sequences are merged. This often
results in a binary-tree merge, and can be a bit slow initially. We
could perhaps instead define that the pair-wise cost of the merged
sequence equals the smaller of the costs of the original elements of the
combined pair. We could also set a threshold below which we are willing
to repeat identical lines (to avoid making as many prefix check sets).
But currently neither seemed necessary in my experiments.
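The len(unified_diff) score mentioned above is just the number of lines `diff -u` would print; a minimal sketch (hypothetical helper name, not the script's API):

```python
import difflib

def diff_score(a, b):
    """Python analogue of `diff -u a b | wc -l`: the number of lines in
    the unified diff, file and hunk headers included."""
    return sum(1 for _ in difflib.unified_diff(a, b))

print(diff_score(["ret void"], ["ret void"]))   # identical inputs: no diff output
print(diff_score(["ret void"], ["ret i32 0"]))  # two file headers, one hunk header, "-"/"+" lines
```

Identical sequences score 0, so the cheapest merges (near-duplicate check blocks) are always taken first.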
This script was produced by Claude with the following prompt, plus a lot
of post-processing:
Make a script to post-process the CHECK lines in this file looking for
common sequences: clang/test/OpenMP/target_map_codegen_21.cpp. Write it
in python and put it in llvm/utils near the update_test_check.py files.
Have the script create a NxM matrix of the score of common lines between
each CHECK line group. Read in each file reusing logic from
update_test_checks.py to find the current FileCheck markers being used.
Then filter the file for the set of checks corresponding to each RUN
FileCheck line, stripping the CHECK prefix, keeping only the suffixes
like -DAG and -NEXT. Then compute a difference score based on the line
count: `diff -u a b | wc -l`. Take the lowest scoring pair, and zipper
merge them together like so: `: common; -NEXT: in1; -NEXT: in2; -NEXT:
common`. Remove those two pairs from the score matrix, then replace it
with a new entry that takes the minimum of each old entry. Repeat the
process until only one entry remains. Then iterate over the final result
and compute the set of each FileCheck RUN line that uses that line, and
assign a check prefix name to each unique group of FileCheck line uses.
Prepend that computed CHECK prefix name to each line. Update each
FileCheck RUN line prefix list with the set of applicable prefixes.
Filter out all old CHECK lines in the file, and write the new CHECK
lines to the file.
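The matrix update in the prompt (merge the lowest-scoring pair, then replace its two rows with the element-wise minimum) is single-linkage agglomerative clustering. A minimal sketch of that loop (simplified: the in-tree version also rescores the merged text against each survivor and takes the minimum of the three values):

```python
import difflib

def score(a, b):
    # Number of lines `diff -u a b` would print.
    return sum(1 for _ in difflib.unified_diff(a, b))

def single_linkage_merge(items, merge):
    """Repeatedly merge the lowest-scoring pair of sequences; the merged
    item inherits the minimum of its parents' scores to each survivor."""
    items = list(items)
    sc = {(i, j): score(items[i], items[j])
          for i in range(len(items)) for j in range(i + 1, len(items))}
    active = list(range(len(items)))
    while len(active) > 1:
        i, j = min(sc, key=lambda p: (sc[p], p))  # lowest score, stable tie-break
        items.append(merge(items[i], items[j]))
        new = len(items) - 1
        active.remove(i)
        active.remove(j)
        for k in active:
            si = sc.pop((min(k, i), max(k, i)))
            sj = sc.pop((min(k, j), max(k, j)))
            sc[k, new] = min(si, sj)  # single-linkage: inherit the cheaper score
        del sc[i, j]
        active.append(new)
    return items[active[0]]

# Toy merge for demonstration: append B's lines that A doesn't already contain.
merge = lambda a, b: a + [l for l in b if l not in a]
print(single_linkage_merge([["x", "y"], ["x", "z"], ["q"]], merge))
```

In the real script the merge step is the zipper merge and the sequences are full check blocks, but the clustering skeleton is the same.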
@llvm/pr-subscribers-testing-tools

Author: Jameson Nash (vtjnash)

For a substantial example, I made a branch with a sample change that needed this script (#181845) and then committed the results of running the script on a sample of the tests that needed updates in various different ways: main...vtjnash:llvm-project:jn/merge_check_prefixes-example. The last time this test code needed to be updated, 50k check lines changed into 1000k (https://reviews.llvm.org/D101849), but here there is about a 4x reduction in that check line count.

Patch is 33.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/182546.diff

3 Files Affected:
diff --git a/llvm/utils/merge_check_prefixes.py b/llvm/utils/merge_check_prefixes.py
new file mode 100755
index 0000000000000..664e857cfb2ab
--- /dev/null
+++ b/llvm/utils/merge_check_prefixes.py
@@ -0,0 +1,780 @@
+#!/usr/bin/env python3
+"""merge_check_prefixes.py – Merge redundant FileCheck prefixes.
+
+Given a test file with N RUN lines, this script finds common subsequences between
+those check blocks and merges them into a single block with fewer, semantically-
+shared prefixes, often dramatically reducing file size. This is intended to
+enable auto-fixing code where update_any_test_checks.py complains of conflicting
+RUN lines, by recomputing a preferred new set.
+
+With --split, this undoes the transform, expanding the file into a form that
+update_any_test_checks.py can handle, while leaving behind some metadata so that
+re-doing the transform generally assigns stable names.
+
+Example intended workflow:
+
+ # list of files for which there are conflicting prefixes to fix
+ $ FILES=""
+
+ # annotate each RUN line with a single separate check
+ $ llvm/utils/merge_check_prefixes.py --split $FILES
+
+ # run a tool to update each check
+ $ build/bin/llvm-lit --update-tests -sv $FILES
+ $ llvm/utils/update_any_test_checks.py $FILES
+
+ # re-compute compact diff
+ llvm/utils/merge_check_prefixes.py $FILES
+
+Usage:
+ merge_check_prefixes.py [--base-prefix BASE] [--dry-run] [--split] file [file ...]
+"""
+
+import argparse
+import difflib
+import os
+import re
+import sys
+
+# ---------------------------------------------------------------------------
+# Reuse helpers from UpdateTestChecks/common.py
+# ---------------------------------------------------------------------------
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "UpdateTestChecks"))
+import common # noqa: E402 (after sys.path manipulation)
+
+# ---------------------------------------------------------------------------
+# Regex helpers
+# ---------------------------------------------------------------------------
+
+# Match check-prefix / check-prefixes flags in RUN lines.
+_CHECK_PREFIX_FLAG_RE = re.compile(r"\s+(--?check-prefix(?:es)?[= ])(\S+)")
+
+# Matches plain comment lines that act as separators (e.g. "//" or "//.").
+# suffix=None is used as a sentinel for these in check-line tuples.
+_PLAIN_COMMENT_RE = re.compile(r"^\s*(?://|;|#)\.?\s*$")
+
+# Matches coverage comment lines written by --split mode, e.g.:
+# ; --check-prefix=CHECK-A=1, 2, 5
+_COVERAGE_COMMENT_RE = re.compile(
+ r"^\s*(?://|[;#])\s*--check-prefix=(\S+)=\s*([\d,\s]+)$"
+)
+
+# ---------------------------------------------------------------------------
+# Parsing
+# ---------------------------------------------------------------------------
+
+
+def parse_coverage_comments(lines):
+ """Re-compute prefix_name_map from coverage comment lines embedded in a file.
+
+ Scans *lines* for lines matching the pattern written by --split mode:
+ <comment> --check-prefix=NAME=idx, idx, ...
+ Returns a dict mapping frozenset(rl_indices) -> prefix_name, or None if no
+ such lines are found.
+ """
+ prefix_name_map = {}
+ for line in lines:
+ m = _COVERAGE_COMMENT_RE.match(line)
+ if m:
+ name = m.group(1)
+ indices = frozenset(int(s) for s in m.group(2).split(",") if s.strip())
+ prefix_name_map[indices] = name
+ return prefix_name_map or None
+
+
+def collect_check_lines_for_run(lines, prefixes):
+ """Return (check_lines, line_numbers) for all *prefixes* of one RUN line.
+
+ Iterates the file once in order. For each line, checks it against every
+ prefix regex; if it matches, records it as a check line. Plain comment
+ lines (e.g. "//" or "//.") that immediately follow a check line are also
+ included.
+
+ check_lines : list of (content, is_plain)
+ content is "suffix: body" for check lines, raw text for plain
+ is_plain is False for check lines, True for plain comments
+ line_numbers: set of 0-based indices that belong to this run's check block
+ """
+ prefix_set = frozenset(prefixes)
+
+ check_lines = []
+ line_numbers = set()
+ after_check = False # True while we may still consume trailing plain comments
+
+ for i, line in enumerate(lines):
+ m = common.CHECK_RE.match(line)
+ matched = m is not None and m.group(1) in prefix_set
+ if matched:
+ content = line[m.end(1) :]
+ check_lines.append((content, False, i))
+ line_numbers.add(i)
+ after_check = True
+ if not matched:
+ if after_check and _PLAIN_COMMENT_RE.match(line):
+ check_lines.append((line, True, i))
+ line_numbers.add(i)
+ else:
+ after_check = False
+
+ return check_lines, line_numbers
+
+
+def find_run_lines(test, lines):
+ """Local copy of common.find_run_lines that also returns per-RUN line ranges.
+
+ Returns (run_lines, run_line_ranges) where run_line_ranges is a list of
+ range objects giving the 0-based line indices spanned by each RUN statement
+ (accounting for backslash continuations).
+ """
+ common.debug("Scanning for RUN lines in test file:", test)
+ raw = [
+ (i, m.group(1))
+ for i, l in enumerate(lines)
+ for m in (common.RUN_LINE_RE.match(l),)
+ if m
+ ]
+ if not raw:
+ return [], []
+
+ run_lines = []
+ run_line_ranges = []
+ start_idx, content = raw[0]
+ end_idx = start_idx
+ for i, line_content in raw[1:]:
+ if content.endswith("\\"):
+ content = content.rstrip("\\") + " " + line_content
+ end_idx = i
+ else:
+ run_lines.append(content)
+ run_line_ranges.append(range(start_idx, end_idx + 1))
+ start_idx = end_idx = i
+ content = line_content
+ run_lines.append(content)
+ run_line_ranges.append(range(start_idx, end_idx + 1))
+
+ common.debug("Found {} RUN lines in {}:".format(len(run_lines), test))
+ for l in run_lines:
+ common.debug(" RUN: {}".format(l))
+ return run_lines, run_line_ranges
+
+
+def longest_common_prefix(strings):
+ if not strings:
+ return ""
+ s = strings[0]
+ for t in strings[1:]:
+ while not t.startswith(s):
+ s = s[:-1]
+ if not s:
+ return ""
+ return s
+
+
+def auto_base_prefix(prefixes):
+ """Detect a common leading string from all prefix names, e.g. 'CHECK'."""
+ base = longest_common_prefix(list(prefixes))
+ # Strip any trailing suffixes
+ base = base.rstrip("0123456789-")
+ return base or "CHECK"
+
+
+# ---------------------------------------------------------------------------
+# Score / distance
+# ---------------------------------------------------------------------------
+
+
+def _seq_to_strs(seq):
+ return [(";" if c else "*") + content for content, c, _ in seq]
+
+
+def score(seq_a, seq_b):
+ """Number of diff lines (unified diff hunk lines) between two sequences."""
+ a = _seq_to_strs(seq_a)
+ b = _seq_to_strs(seq_b)
+ return sum(1 for _ in difflib.unified_diff(a, b))
+
+
+# ---------------------------------------------------------------------------
+# Zipper merge
+# ---------------------------------------------------------------------------
+
+
+def zipper_merge(seq_a, seq_b):
+ """Merge two check-line sequences using SequenceMatcher.
+
+ Returns a new sequence of (suffix, content, present_in) triples where
+ *present_in* is the union of the contributing sequences' present_in sets.
+ Equal blocks are emitted once with a unioned present_in; A-only lines
+ keep A's present_in; B-only lines keep B's present_in. Within each
+ "gap" between equal blocks, A-only lines come before B-only lines so
+ that the logical order is: common, then A variant, then B variant.
+ """
+ a_strs = _seq_to_strs(seq_a)
+ b_strs = _seq_to_strs(seq_b)
+ sm = difflib.SequenceMatcher(None, a_strs, b_strs, autojunk=False)
+ result = []
+ for tag, i1, i2, j1, j2 in sm.get_opcodes():
+ if tag == "equal":
+ for ai, bi in zip(range(i1, i2), range(j1, j2)):
+ ca, ia, pa = seq_a[ai]
+ cb, ib, pb = seq_b[bi]
+ result.append((ca, ia, pa | pb))
+ assert ca == cb
+ assert ia == ib
+ elif tag == "replace":
+ for k in range(i1, i2):
+ result.append(seq_a[k])
+ for k in range(j1, j2):
+ result.append(seq_b[k])
+ elif tag == "delete":
+ for k in range(i1, i2):
+ result.append(seq_a[k])
+ elif tag == "insert":
+ for k in range(j1, j2):
+ result.append(seq_b[k])
+ return result
+
+
+# ---------------------------------------------------------------------------
+# Merge strategies
+# ---------------------------------------------------------------------------
+
+
+def concat(items):
+ """Concatenate all sequences in order without merging."""
+ result = []
+ for seq in items:
+ result.extend(seq)
+ return result
+
+
+# ---------------------------------------------------------------------------
+# Hierarchical (single-linkage) merge
+# ---------------------------------------------------------------------------
+
+
+def hierarchical_merge(items, label):
+ """Single-linkage agglomerative merge of all sequences.
+
+    *items* is a list of sequences; each line already carries its own pset.
+ Returns a single merged sequence.
+ """
+ total = len(items) - 1
+ step = 0
+ prefix = f"{label} " if label else ""
+
+ # Pre-compute all pairwise scores before any merging.
+ # sc[(i, j)] with i < j holds the score between items[i] and items[j].
+ # Indices are stable: merged results are appended to items with fresh indices.
+ sc = {}
+ n = len(items)
+ active = list(range(n))
+ for i in range(n):
+ for j in range(i + 1, n):
+ sc[i, j] = score(items[i], items[j])
+
+ while len(active) > 1:
+ # Find the pair with the lowest score; break ties by (i, j) order.
+ best_score = None
+ best_i = best_j = -1
+ for (i, j), s in sc.items():
+ if (
+ best_score is None
+ or s < best_score
+ or (s == best_score and (i, j) < (best_i, best_j))
+ ):
+ best_score = s
+ best_i, best_j = i, j
+
+ merged_seq = zipper_merge(items[best_i], items[best_j])
+ new_idx = len(items)
+ items.append(merged_seq)
+
+ # Remove consumed indices so the loops below visit only survivors.
+ active.remove(best_i)
+ active.remove(best_j)
+
+ # Score the merged item against each remaining active item: take the min
+ # of the two inherited scores and the actual diff of the new merged text.
+ for k in active:
+ si = sc[min(k, best_i), max(k, best_i)]
+ sj = sc[min(k, best_j), max(k, best_j)]
+ sc[k, new_idx] = min(si, sj, score(merged_seq, items[k]))
+
+ # Drop all entries that reference the now-consumed indices.
+ for k in active:
+ sc.pop((min(k, best_i), max(k, best_i)))
+ sc.pop((min(k, best_j), max(k, best_j)))
+ sc.pop((best_i, best_j))
+
+ active.append(new_idx)
+
+ step += 1
+ common.debug(
+ f" {prefix}merge {step}/{total} (score {best_score})", end="\r", flush=True
+ )
+ common.debug()
+
+ return items[-1]
+
+
+# ---------------------------------------------------------------------------
+# Prefix name assignment
+# ---------------------------------------------------------------------------
+
+
+def human_sort_key(s):
+ """Sort key for alphanumeric strings, sorting digit runs by numeric value.
+
+ E.g. "foo10" sorts after "foo9" rather than before it.
+ """
+ return [(0, int(p)) if p.isdigit() else (1, p) for p in re.split(r"(\d+)", s) if p]
+
+
+def assign_prefix_names(
+ block_merges,
+ all_rls_frozenset,
+ base,
+ rl_coverage,
+ run_prefix_map,
+ all_prefixes,
+ split,
+):
+ """Map each unique present_in set found in *block_merges* to a new prefix name.
+
+ Each presence set is a frozenset of run-line strings. *rl_coverage* maps
+ frozenset(run_lines) → original_prefix_name for reuse when possible.
+ *run_prefix_map* supplies the original prefix names used by each run line,
+ for building generated names.
+ """
+ presence_sets = set()
+ for seq in block_merges:
+ for _content, _is_plain, pset in seq:
+ presence_sets.add(pset)
+
+ # Generate or reuse a new name from the distinguishing suffixes of the prefix
+ # names used by the run lines in this set.
+ def _pset_name(pset):
+ if pset == all_rls_frozenset:
+ return base
+ existing = rl_coverage.get(pset)
+ if existing is not None:
+ return existing
+ if split:
+ # Create the name by combining all prefixes
+ def _rl_suffixes_split(rl):
+ """Extract ordered union of all parts from all prefixes for this run line."""
+ result = {}
+ for p in run_prefix_map[rl]:
+ result.update(dict.fromkeys(p.split("-")))
+ result.pop("", None)
+ if not result or (len(result) == 1 and base in result):
+ # if the only key is CHECK, convert to CHECK-1
+ result = dict.fromkeys([base, str(rl + 1)])
+ return result
+
+ suffix_sets = [_rl_suffixes_split(rl) for rl in sorted(pset)]
+ # collect all unique suffixes in an ordered set
+ all_unique = {}
+ for s in suffix_sets:
+ all_unique.update(s)
+ keys = []
+ # move base to the head, if present
+ if base in all_unique:
+ all_unique.pop(base)
+ keys.append(base)
+ keys.extend(all_unique.keys())
+ candidate = "-".join(keys)
+ else:
+ # Create the name by finding any common part in each run line
+ # and appending them together with other lines
+ def _rl_suffixes_merge(rl):
+ """Compute intersection of all parts across all prefixes for this run line."""
+ suffix_sets = []
+ for p in run_prefix_map[rl]:
+ suffix_sets.append(dict.fromkeys(p.split("-")))
+ result = {}
+ if suffix_sets:
+ result = suffix_sets[0]
+ for s in suffix_sets[1:]:
+ result = dict.fromkeys(k for k in result if k in s)
+ result.pop("", None)
+ if not result or (len(result) == 1 and base in result):
+ # if the only key is CHECK, convert to CHECK-1
+ result = dict.fromkeys([base, str(rl + 1)])
+ return result
+
+ keys = {}
+ for s in (_rl_suffixes_merge(rl) for rl in sorted(pset)):
+ keys.update(s)
+ candidate = []
+ # move base to the head, if present
+ if base in keys:
+ keys.pop(base)
+ candidate.append(base)
+ candidate.extend(keys.keys())
+ candidate = "-".join(candidate)
+ name, n = candidate, 2
+ while name in all_prefixes:
+ name = candidate + "-_" + str(n)
+ n += 1
+ all_prefixes.add(name)
+ return name
+
+ return {pset: _pset_name(pset) for pset in presence_sets}
+
+
+# ---------------------------------------------------------------------------
+# RUN line rewriting
+# ---------------------------------------------------------------------------
+
+
+def new_prefixes_for_run(rl, prefix_name_map):
+ """Return the sorted list of new prefix names that run line *rl* needs.
+
+ A run line needs every new prefix whose presence set contains *rl*.
+ """
+ return sorted(name for pset, name in prefix_name_map.items() if rl in pset)
+
+
+def rewrite_run_line(run_line, new_names, any_replaced=False):
+ """Replace all --check-prefix(es) flags in *run_line* with the new list."""
+ if len(new_names) == 1:
+ replacement_flag = " --check-prefix=" + new_names[0]
+ else:
+ replacement_flag = " --check-prefixes=" + ",".join(new_names)
+
+ # Replace the first occurrence; drop any subsequent check-prefix flags
+ replaced = any_replaced
+
+ is_default = replacement_flag == " --check-prefix=CHECK"
+
+ def _replace(m):
+ nonlocal replaced
+ if not replaced:
+ replaced = True
+ return "" if is_default else replacement_flag
+ return ""
+
+ new_line = _CHECK_PREFIX_FLAG_RE.sub(_replace, run_line)
+ return new_line, replaced
+
+
+# ---------------------------------------------------------------------------
+# File rewrite
+# ---------------------------------------------------------------------------
+
+
+def comment_char(lines):
+ """Detect the comment character used in check lines."""
+ for line in lines:
+ stripped = line.lstrip()
+ if stripped.startswith("//"):
+ return "//"
+ if stripped.startswith(";"):
+ return ";"
+ if stripped.startswith("#"):
+ return "#"
+ return "//"
+
+
+def emit_check_block(merged_seq, prefix_name_map, comment):
+ """Render the merged check block as a list of text lines (with newlines)."""
+ out = []
+ for content, is_plain, pset in merged_seq:
+ if is_plain:
+ out.append(content)
+ else:
+ name = prefix_name_map[pset]
+ out.append(
+ "{comment} {name}{content}".format(
+ comment=comment, name=name, content=content
+ )
+ )
+ return out
+
+
+# ---------------------------------------------------------------------------
+# Main processing of a single file
+# ---------------------------------------------------------------------------
+
+
+def process_lines(lines, path, base_prefix_override=None, split=False):
+ """Process *lines* (as returned by readlines()) for *path* and return the
+ rewritten line list, or None if the file should be skipped."""
+
+ # 1. Find RUN lines and extract FileCheck prefixes
+ run_lines_raw, run_line_ranges = find_run_lines(path, lines)
+ _filtered = []
+ _filtered_ranges = []
+ for rl, rl_range in zip(run_lines_raw, run_line_ranges):
+ if "|" not in rl:
+ common.warn("Skipping unparsable RUN line: " + rl)
+ else:
+ _filtered.append(rl)
+ _filtered_ranges.append(rl_range)
+ run_lines_raw = _filtered
+ run_line_ranges = _filtered_ranges
+ if not run_lines_raw:
+ common.warn(f"{path}: no RUN lines found, skipping.")
+ return None
+
+ # Map each run line to its FileCheck prefix(es)
+ run_prefix_list = [] # ordered unique run-line strings
+ _rl_to_idx = {} # run_line_str -> int index into run_prefix_list
+ run_prefix_map = [] # list of prefix lists, indexed by rl_idx
+ all_prefixes = set()
+ for rl in run_lines_raw:
+ _rl_to_idx[rl] = len(run_prefix_list)
+ run_prefix_list.append(rl)
+ prefixes = common.get_check_prefixes(rl)
+ run_prefix_map.append(prefixes)
+ all_prefixes.update(prefixes)
+
+ if not all_prefixes:
+ common.warn(f"{path}: no FileCheck prefixes found, skipping.")
+ return None
+
+ # 2. Collect check lines for each RUN line
+ _seq_lnums = [collect_check_lines_for_run(lines, pfxs) for pfxs in run_prefix_map]
+ run_seqs = [seq for seq, _ in _seq_lnums]
+ check_line_indices = set().union(*(lnums for _, lnums in _seq_lnums))
+ check_line_blocks = []
+ for idx in range(len(lines)):
+ if idx in check_line_indices:
+ if check_line_blocks and check_line_blocks[-1].stop == idx:
+ check_line_blocks[-1] = range(check_line_blocks[-1].start, idx + 1)
+ else:
+ check_line_blocks.append(range(idx, idx + 1))
+ erase_line_indices =...
[truncated]
A simpler feature we could do with is a warning when two RUN lines emit the same codegen for a test but don't share a common check prefix that would allow them to be merged.
That seems reasonable, but it is entirely different from the purpose of this PR, which is just to fix the things that cause warnings and incorrect UTC output. That said, it is intended to be easy to implement as an option in this PR: hierarchical_merge is set up to support a merge-cost threshold for choosing whether to precisely merge or simply concat, and setting it to zero would combine only exactly identical check sequences.
Since there aren't any test cases (should there be?), I'll ask to be sure: does this do structural diffing for captured variables, so that they can be merged even if the names differ? To make what I'm asking more concrete, here's an example: and the second RUN line emits these checks: The actual emitted IR from both RUN lines, even if slightly different, would still match either set of checks. Would this be able to merge them? Separately: should we update the auto-updater in lit so that it automatically invokes the merge script when updating UTC tests?
Happy to add them. The other scripts didn't have any, so I didn't add them initially.
It doesn't, since I hadn't noticed that case myself. The expectation is that UTC is supposed to handle that already. This presently relies on the UTC script making stable choices, to reduce (though not necessarily eliminate) the need for human cleanup afterwards. My initial reaction is that we could fix that example with help from UTC, in conjunction with this script, by having UTC look at the other nearby CHECK prefixes when picking names for CHECK_B. I believe it already looks for names to re-use, so that would be an expansion of that particular logic. Notably, this merge script already computes what "nearby" means (defining it as a unified diff), so UTC could possibly use this script iteratively: generate a tentative reference merge before each RUN line is processed, and use that as a source of names for the next RUN line. At the end, the names will have been picked based on the other nearby options, so the final merge should be more likely to fold entirely.
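To make the limitation being acknowledged here concrete: a purely textual diff sees renamed capture variables as different lines, so blocks that differ only in UTC's chosen names will not fold (made-up check lines for illustration):

```python
import difflib

# Structurally identical checks, but UTC picked different capture names
# for the two prefixes.
a = ["-LABEL: @f(", ": [[TMP0:%.*]] = load i32", "-NEXT: ret i32 [[TMP0]]"]
b = ["-LABEL: @f(", ": [[LD:%.*]] = load i32", "-NEXT: ret i32 [[LD]]"]

sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
equal = sum(i2 - i1 for tag, i1, i2, _, _ in sm.get_opcodes() if tag == "equal")
print(equal, "of", len(a), "lines fold")  # only the LABEL line is textually equal
```

Stable capture-name choices from UTC would make all three lines textually equal and let the merge fold them completely.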
That's already done here (the
I also want to cross-link that this PR has an RFC comment thread now too: https://discourse.llvm.org/t/rfc-use-a-unified-diff-strategy-to-solve-the-conflicting-filecheck-prefix-issue-in-update-test-checks/90434
This is primarily intended to be used via the new --merge option to the UTC scripts. Just two UTC scripts are updated currently, to demonstrate this; updating the others is expected to be similar and simple, but I didn't want those to get mixed up in this review. Internally, that option is essentially equivalent to separately calling each tool.
This separation can be useful when the user doesn't want to re-run the UTC script yet (e.g. because it changes a lot of other unrelated things, either due to past human meddling or tool updates) and wants to first do an NFC compact-merge step, then an NFC update step, and then presumably a functional change afterwards.
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>, because I haven't written Python in several years and forget the names of stdlib functions, so it helps me research those much more quickly.
A question (for later): we could set a threshold below which we are willing to repeat identical lines (to avoid creating as many prefix check sets). I had initially expected this to be useful, so I designed the merge algorithm to use a cost function with this ability in mind, but I didn't see much use for that thresholding in my initial testing. Because all the outputs are generated from the same input, there's almost always a meaningful fraction of lines that will be shared.
This script was initially produced by Claude with the prompt quoted above, and then a lot of human-assisted-AI post-processing.