[UTC] Add utility script to deal with conflicting RUN lines (#182546)
Conversation
✅ With the latest revision this PR passed the Python code formatter.
I think there are two different features here. One is the ability to generate UTC checks that are not "repeat the whole function for each differing prefix" and instead have one check block where only the differing parts use different prefixes. I think this is a really great feature, and something I've wanted a few times in the past. It is not uncommon for the difference between two prefixes to be very small, and this gets lost when all check lines are repeated per prefix. However, I think that this functionality is really something that should be part of UTC itself. We should have a flag that switches between "check blocks per prefix" and "one check block for all prefixes". The other feature here seems to be automatic picking of check prefixes? These are generally picked manually to be meaningful, and I can't say I've encountered the need for automation in this space yet. Looking at your sample, it looks like the generated prefixes end up being pretty meaningless. (It looks like existing prefixes on OpenMP tests are already often meaningless, but this seems to be a problem specific to those tests, and not really something we should perpetuate.)
Yep, that was my thinking here too. The additional insight for this proposed PR is that a unified diff is pretty good at interpreting what "small difference" means to a computer.
I was going to suggest that also, as a followup. This is currently written as an external tool, because that makes demonstrating and testing it separately a lot easier. But once merged, I would suggest that all the UTC tools be updated to have a `--merge` option.
I don't think that is a separate feature (I agree it is hardly even a feature). It is largely forced upon the tool, because it has to give some label to the parts of the check block that differ. If the user has initially given each run line a separately descriptive check prefix, then the tool will generate unified names. And if the check lines are already split in the same way as the tool merges them, the tool will always reuse the names already given (preserving them when doing the split, and re-applying them when doing the merge). One improvement I was thinking might help here is to emit a list of the machine-generated prefixes at the top of the file. That way the user can go in and do a quick find-and-replace to give them a better semantic name. Theoretically we could try to have an AI suggest a better name, but that seems like a separate feature that we don't really need.
This script is intended to help improve the quality (specifically, the
reviewability) of auto-generated test lines after any of the
update_any_test_checks scripts.
Context: I was working on updating OpenMP tests to remove a large number
of all-zero GEPs, but the tests currently rely heavily on matching those
all-zero GEPs to verify the output is correct. Simply rerunning the
tests through update_cc_test_checks.py makes them substantially larger,
and also drops a substantial fraction of key lines due to conflicting
prefixes. Hence the motivation for this script, which helps pre- and
post-process the files for UTC. For the files I looked at, savings of
over 80% in lines changed were not uncommon (these files have over 100
RUN lines and had been hand-edited to optimize the prefix lists and the
set of lines being checked).
The strategy: given a test file with N RUN lines, this script finds
common subsequences between those check blocks and merges them into a
single block with fewer, semantically-shared prefixes, often
dramatically reducing file size. This is intended to enable auto-fixing
tests where update_any_test_checks.py currently complains of conflicting
RUN lines (since it only looks at the last --check-prefix given), by
recomputing a preferred new set.
With the --split argument, this instead undoes the transform, expanding
the file checks into a form that update_any_test_checks.py can handle,
while leaving behind some metadata so that re-doing the transform
generally assigns stable names. The order of output lines is based on
the order of RUN lines, so the transform is typically a fixed point,
although its output may differ from the original input.
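The merge strategy described above can be sketched with Python's difflib (a simplified illustration, not the in-tree implementation; the real script additionally tracks, for every line, the set of RUN lines that contributed it, so a prefix can be assigned per unique set):

```python
import difflib

def zipper_merge(a, b):
    """Merge two check-line sequences: runs of equal lines are emitted
    once; where the sequences differ, A's variant precedes B's."""
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    merged = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            merged.extend(a[i1:i2])  # shared lines, kept once
        else:
            merged.extend(a[i1:i2])  # A-only lines
            merged.extend(b[j1:j2])  # B-only lines
    return merged

a = [": define void @foo()", "-NEXT: ret void"]
b = [": define void @foo()", "-NEXT: call void @bar()", "-NEXT: ret void"]
print(zipper_merge(a, b))
```

Here the two blocks fold into three lines instead of five, which is where the bulk of the file-size reduction comes from.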
Example intended workflow:
# list of files for which there are conflicting prefixes to fix
$ FILES=""
# annotate each RUN line with a single separate check
$ llvm/utils/merge_check_prefixes.py --split $FILES
# run a tool to update each check
$ build/bin/llvm-lit --update-tests -sv $FILES
$ llvm/utils/update_any_test_checks.py $FILES
# re-compute compact diff
$ llvm/utils/merge_check_prefixes.py $FILES
Usage:
merge_check_prefixes.py [--base-prefix BASE] [--dry-run] [--split] file [file ...]
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Some open questions: currently this recomputes the merge score using
len(unified_diff) and repeats until all sequences are merged. This often
results in a binary-tree merge, and can be a bit slow initially. We
could perhaps instead define that the pair-wise cost of the merged
sequence equals the smaller of the costs of the original elements of the
combined pair. We could also set a threshold below which we are willing
to repeat identical lines (to avoid making as many prefix check sets).
But currently neither seemed necessary in my experiments.
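The len(unified_diff) score mentioned above is just the number of lines `diff -u` would print; a minimal sketch (hypothetical helper name, not the script's API):

```python
import difflib

def diff_score(a, b):
    """Python analogue of `diff -u a b | wc -l`: the number of lines in
    the unified diff, file and hunk headers included."""
    return sum(1 for _ in difflib.unified_diff(a, b))

print(diff_score(["ret void"], ["ret void"]))   # identical inputs: no diff output
print(diff_score(["ret void"], ["ret i32 0"]))  # two file headers, one hunk header, "-"/"+" lines
```

Identical sequences score 0, so the cheapest merges (near-duplicate check blocks) are always taken first.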
This script was produced by Claude with the following prompt, plus a lot
of post-processing:
Make a script to post-process the CHECK lines in this file looking for
common sequences: clang/test/OpenMP/target_map_codegen_21.cpp. Write it
in python and put it in llvm/utils near the update_test_check.py files.
Have the script create a NxM matrix of the score of common lines between
each CHECK line group. Read in each file reusing logic from
update_test_checks.py to find the current FileCheck markers being used.
Then filter the file for the set of checks corresponding to each RUN
FileCheck line, stripping the CHECK prefix, keeping only the suffixes
like -DAG and -NEXT. Then compute a difference score based on the line
count: `diff -u a b | wc -l`. Take the lowest scoring pair, and zipper
merge them together like so: `: common; -NEXT: in1; -NEXT: in2; -NEXT:
common`. Remove those two pairs from the score matrix, then replace it
with a new entry that takes the minimum of each old entry. Repeat the
process until only one entry remains. Then iterate over the final result
and compute the set of each FileCheck RUN line that uses that line, and
assign a check prefix name to each unique group of FileCheck line uses.
Prepend that computed CHECK prefix name to each line. Update each
FileCheck RUN line prefix list with the set of applicable prefixes.
Filter out all old CHECK lines in the file, and write the new CHECK
lines to the file.
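The matrix update in the prompt (merge the lowest-scoring pair, then replace its two rows with the element-wise minimum) is single-linkage agglomerative clustering. A minimal sketch of that loop (simplified: the in-tree version also rescores the merged text against each survivor and takes the minimum of the three values):

```python
import difflib

def score(a, b):
    # Number of lines `diff -u a b` would print.
    return sum(1 for _ in difflib.unified_diff(a, b))

def single_linkage_merge(items, merge):
    """Repeatedly merge the lowest-scoring pair of sequences; the merged
    item inherits the minimum of its parents' scores to each survivor."""
    items = list(items)
    sc = {(i, j): score(items[i], items[j])
          for i in range(len(items)) for j in range(i + 1, len(items))}
    active = list(range(len(items)))
    while len(active) > 1:
        i, j = min(sc, key=lambda p: (sc[p], p))  # lowest score, stable tie-break
        items.append(merge(items[i], items[j]))
        new = len(items) - 1
        active.remove(i)
        active.remove(j)
        for k in active:
            si = sc.pop((min(k, i), max(k, i)))
            sj = sc.pop((min(k, j), max(k, j)))
            sc[k, new] = min(si, sj)  # single-linkage: inherit the cheaper score
        del sc[i, j]
        active.append(new)
    return items[active[0]]

# Toy merge for demonstration: append B's lines that A doesn't already contain.
merge = lambda a, b: a + [l for l in b if l not in a]
print(single_linkage_merge([["x", "y"], ["x", "z"], ["q"]], merge))
```

In the real script the merge step is the zipper merge and the sequences are full check blocks, but the clustering skeleton is the same.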
@llvm/pr-subscribers-testing-tools

Author: Jameson Nash (vtjnash)

For a substantial example, I made a branch with a sample change that needed this script (#181845) and then committed the results of running the script on a sample of the tests that needed updates in various different ways: main...vtjnash:llvm-project:jn/merge_check_prefixes-example. The last time this test code needed to be updated, 50k check lines changed into 1000k (https://reviews.llvm.org/D101849), but here there is about a 4x reduction in that check line count.

Patch is 33.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/182546.diff

3 Files Affected:
diff --git a/llvm/utils/merge_check_prefixes.py b/llvm/utils/merge_check_prefixes.py
new file mode 100755
index 0000000000000..664e857cfb2ab
--- /dev/null
+++ b/llvm/utils/merge_check_prefixes.py
@@ -0,0 +1,780 @@
+#!/usr/bin/env python3
+"""merge_check_prefixes.py – Merge redundant FileCheck prefixes.
+
+Given a test file with N RUN lines, this script finds common subsequences between
+those check blocks and merges them into a single block with fewer, semantically-
+shared prefixes, often dramatically reducing file size. This is intended to
+enable auto-fixing code where update_any_test_checks.py complains of conflicting
+RUN lines, by recomputing a preferred new set.
+
+With --split, this undoes the transform, expanding the file into a form that
+update_any_test_checks.py can handle, while leaving behind some metadata so that
+re-doing the transform generally assigns stable names.
+
+Example intended workflow:
+
+ # list of files for which there are conflicting prefixes to fix
+ $ FILES=""
+
+ # annotate each RUN line with a single separate check
+ $ llvm/utils/merge_check_prefixes.py --split $FILES
+
+ # run a tool to update each check
+ $ build/bin/llvm-lit --update-tests -sv $FILES
+ $ llvm/utils/update_any_test_checks.py $FILES
+
+ # re-compute compact diff
+ llvm/utils/merge_check_prefixes.py $FILES
+
+Usage:
+ merge_check_prefixes.py [--base-prefix BASE] [--dry-run] [--split] file [file ...]
+"""
+
+import argparse
+import difflib
+import os
+import re
+import sys
+
+# ---------------------------------------------------------------------------
+# Reuse helpers from UpdateTestChecks/common.py
+# ---------------------------------------------------------------------------
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "UpdateTestChecks"))
+import common # noqa: E402 (after sys.path manipulation)
+
+# ---------------------------------------------------------------------------
+# Regex helpers
+# ---------------------------------------------------------------------------
+
+# Match check-prefix / check-prefixes flags in RUN lines.
+_CHECK_PREFIX_FLAG_RE = re.compile(r"\s+(--?check-prefix(?:es)?[= ])(\S+)")
+
+# Matches plain comment lines that act as separators (e.g. "//" or "//.").
+# suffix=None is used as a sentinel for these in check-line tuples.
+_PLAIN_COMMENT_RE = re.compile(r"^\s*(?://|;|#)\.?\s*$")
+
+# Matches coverage comment lines written by --split mode, e.g.:
+# ; --check-prefix=CHECK-A=1, 2, 5
+_COVERAGE_COMMENT_RE = re.compile(
+ r"^\s*(?://|[;#])\s*--check-prefix=(\S+)=\s*([\d,\s]+)$"
+)
+
+# ---------------------------------------------------------------------------
+# Parsing
+# ---------------------------------------------------------------------------
+
+
+def parse_coverage_comments(lines):
+ """Re-compute prefix_name_map from coverage comment lines embedded in a file.
+
+ Scans *lines* for lines matching the pattern written by --split mode:
+ <comment> --check-prefix=NAME=idx, idx, ...
+ Returns a dict mapping frozenset(rl_indices) -> prefix_name, or None if no
+ such lines are found.
+ """
+ prefix_name_map = {}
+ for line in lines:
+ m = _COVERAGE_COMMENT_RE.match(line)
+ if m:
+ name = m.group(1)
+ indices = frozenset(int(s) for s in m.group(2).split(",") if s.strip())
+ prefix_name_map[indices] = name
+ return prefix_name_map or None
+
+
+def collect_check_lines_for_run(lines, prefixes):
+ """Return (check_lines, line_numbers) for all *prefixes* of one RUN line.
+
+ Iterates the file once in order. For each line, checks it against every
+ prefix regex; if it matches, records it as a check line. Plain comment
+ lines (e.g. "//" or "//.") that immediately follow a check line are also
+ included.
+
+ check_lines : list of (content, is_plain)
+ content is "suffix: body" for check lines, raw text for plain
+ is_plain is False for check lines, True for plain comments
+ line_numbers: set of 0-based indices that belong to this run's check block
+ """
+ prefix_set = frozenset(prefixes)
+
+ check_lines = []
+ line_numbers = set()
+ after_check = False # True while we may still consume trailing plain comments
+
+ for i, line in enumerate(lines):
+ m = common.CHECK_RE.match(line)
+ matched = m is not None and m.group(1) in prefix_set
+ if matched:
+ content = line[m.end(1) :]
+ check_lines.append((content, False, i))
+ line_numbers.add(i)
+ after_check = True
+ if not matched:
+ if after_check and _PLAIN_COMMENT_RE.match(line):
+ check_lines.append((line, True, i))
+ line_numbers.add(i)
+ else:
+ after_check = False
+
+ return check_lines, line_numbers
+
+
+def find_run_lines(test, lines):
+ """Local copy of common.find_run_lines that also returns per-RUN line ranges.
+
+ Returns (run_lines, run_line_ranges) where run_line_ranges is a list of
+ range objects giving the 0-based line indices spanned by each RUN statement
+ (accounting for backslash continuations).
+ """
+ common.debug("Scanning for RUN lines in test file:", test)
+ raw = [
+ (i, m.group(1))
+ for i, l in enumerate(lines)
+ for m in (common.RUN_LINE_RE.match(l),)
+ if m
+ ]
+ if not raw:
+ return [], []
+
+ run_lines = []
+ run_line_ranges = []
+ start_idx, content = raw[0]
+ end_idx = start_idx
+ for i, line_content in raw[1:]:
+ if content.endswith("\\"):
+ content = content.rstrip("\\") + " " + line_content
+ end_idx = i
+ else:
+ run_lines.append(content)
+ run_line_ranges.append(range(start_idx, end_idx + 1))
+ start_idx = end_idx = i
+ content = line_content
+ run_lines.append(content)
+ run_line_ranges.append(range(start_idx, end_idx + 1))
+
+ common.debug("Found {} RUN lines in {}:".format(len(run_lines), test))
+ for l in run_lines:
+ common.debug(" RUN: {}".format(l))
+ return run_lines, run_line_ranges
+
+
+def longest_common_prefix(strings):
+ if not strings:
+ return ""
+ s = strings[0]
+ for t in strings[1:]:
+ while not t.startswith(s):
+ s = s[:-1]
+ if not s:
+ return ""
+ return s
+
+
+def auto_base_prefix(prefixes):
+ """Detect a common leading string from all prefix names, e.g. 'CHECK'."""
+ base = longest_common_prefix(list(prefixes))
+ # Strip any trailing suffixes
+ base = base.rstrip("0123456789-")
+ return base or "CHECK"
+
+
+# ---------------------------------------------------------------------------
+# Score / distance
+# ---------------------------------------------------------------------------
+
+
+def _seq_to_strs(seq):
+ return [(";" if c else "*") + content for content, c, _ in seq]
+
+
+def score(seq_a, seq_b):
+ """Number of diff lines (unified diff hunk lines) between two sequences."""
+ a = _seq_to_strs(seq_a)
+ b = _seq_to_strs(seq_b)
+ return sum(1 for _ in difflib.unified_diff(a, b))
+
+
+# ---------------------------------------------------------------------------
+# Zipper merge
+# ---------------------------------------------------------------------------
+
+
+def zipper_merge(seq_a, seq_b):
+ """Merge two check-line sequences using SequenceMatcher.
+
+ Returns a new sequence of (suffix, content, present_in) triples where
+ *present_in* is the union of the contributing sequences' present_in sets.
+ Equal blocks are emitted once with a unioned present_in; A-only lines
+ keep A's present_in; B-only lines keep B's present_in. Within each
+ "gap" between equal blocks, A-only lines come before B-only lines so
+ that the logical order is: common, then A variant, then B variant.
+ """
+ a_strs = _seq_to_strs(seq_a)
+ b_strs = _seq_to_strs(seq_b)
+ sm = difflib.SequenceMatcher(None, a_strs, b_strs, autojunk=False)
+ result = []
+ for tag, i1, i2, j1, j2 in sm.get_opcodes():
+ if tag == "equal":
+ for ai, bi in zip(range(i1, i2), range(j1, j2)):
+ ca, ia, pa = seq_a[ai]
+ cb, ib, pb = seq_b[bi]
+ result.append((ca, ia, pa | pb))
+ assert ca == cb
+ assert ia == ib
+ elif tag == "replace":
+ for k in range(i1, i2):
+ result.append(seq_a[k])
+ for k in range(j1, j2):
+ result.append(seq_b[k])
+ elif tag == "delete":
+ for k in range(i1, i2):
+ result.append(seq_a[k])
+ elif tag == "insert":
+ for k in range(j1, j2):
+ result.append(seq_b[k])
+ return result
+
+
+# ---------------------------------------------------------------------------
+# Merge strategies
+# ---------------------------------------------------------------------------
+
+
+def concat(items):
+ """Concatenate all sequences in order without merging."""
+ result = []
+ for seq in items:
+ result.extend(seq)
+ return result
+
+
+# ---------------------------------------------------------------------------
+# Hierarchical (single-linkage) merge
+# ---------------------------------------------------------------------------
+
+
+def hierarchical_merge(items, label):
+ """Single-linkage agglomerative merge of all sequences.
+
+    *items* is a list of sequences; each line already carries its own pset.
+ Returns a single merged sequence.
+ """
+ total = len(items) - 1
+ step = 0
+ prefix = f"{label} " if label else ""
+
+ # Pre-compute all pairwise scores before any merging.
+ # sc[(i, j)] with i < j holds the score between items[i] and items[j].
+ # Indices are stable: merged results are appended to items with fresh indices.
+ sc = {}
+ n = len(items)
+ active = list(range(n))
+ for i in range(n):
+ for j in range(i + 1, n):
+ sc[i, j] = score(items[i], items[j])
+
+ while len(active) > 1:
+ # Find the pair with the lowest score; break ties by (i, j) order.
+ best_score = None
+ best_i = best_j = -1
+ for (i, j), s in sc.items():
+ if (
+ best_score is None
+ or s < best_score
+ or (s == best_score and (i, j) < (best_i, best_j))
+ ):
+ best_score = s
+ best_i, best_j = i, j
+
+ merged_seq = zipper_merge(items[best_i], items[best_j])
+ new_idx = len(items)
+ items.append(merged_seq)
+
+ # Remove consumed indices so the loops below visit only survivors.
+ active.remove(best_i)
+ active.remove(best_j)
+
+ # Score the merged item against each remaining active item: take the min
+ # of the two inherited scores and the actual diff of the new merged text.
+ for k in active:
+ si = sc[min(k, best_i), max(k, best_i)]
+ sj = sc[min(k, best_j), max(k, best_j)]
+ sc[k, new_idx] = min(si, sj, score(merged_seq, items[k]))
+
+ # Drop all entries that reference the now-consumed indices.
+ for k in active:
+ sc.pop((min(k, best_i), max(k, best_i)))
+ sc.pop((min(k, best_j), max(k, best_j)))
+ sc.pop((best_i, best_j))
+
+ active.append(new_idx)
+
+ step += 1
+ common.debug(
+ f" {prefix}merge {step}/{total} (score {best_score})", end="\r", flush=True
+ )
+ common.debug()
+
+ return items[-1]
+
+
+# ---------------------------------------------------------------------------
+# Prefix name assignment
+# ---------------------------------------------------------------------------
+
+
+def human_sort_key(s):
+ """Sort key for alphanumeric strings, sorting digit runs by numeric value.
+
+ E.g. "foo10" sorts after "foo9" rather than before it.
+ """
+ return [(0, int(p)) if p.isdigit() else (1, p) for p in re.split(r"(\d+)", s) if p]
+
+
+def assign_prefix_names(
+ block_merges,
+ all_rls_frozenset,
+ base,
+ rl_coverage,
+ run_prefix_map,
+ all_prefixes,
+ split,
+):
+ """Map each unique present_in set found in *block_merges* to a new prefix name.
+
+ Each presence set is a frozenset of run-line strings. *rl_coverage* maps
+ frozenset(run_lines) → original_prefix_name for reuse when possible.
+ *run_prefix_map* supplies the original prefix names used by each run line,
+ for building generated names.
+ """
+ presence_sets = set()
+ for seq in block_merges:
+ for _content, _is_plain, pset in seq:
+ presence_sets.add(pset)
+
+ # Generate or reuse a new name from the distinguishing suffixes of the prefix
+ # names used by the run lines in this set.
+ def _pset_name(pset):
+ if pset == all_rls_frozenset:
+ return base
+ existing = rl_coverage.get(pset)
+ if existing is not None:
+ return existing
+ if split:
+ # Create the name by combining all prefixes
+ def _rl_suffixes_split(rl):
+ """Extract ordered union of all parts from all prefixes for this run line."""
+ result = {}
+ for p in run_prefix_map[rl]:
+ result.update(dict.fromkeys(p.split("-")))
+ result.pop("", None)
+ if not result or (len(result) == 1 and base in result):
+ # if the only key is CHECK, convert to CHECK-1
+ result = dict.fromkeys([base, str(rl + 1)])
+ return result
+
+ suffix_sets = [_rl_suffixes_split(rl) for rl in sorted(pset)]
+ # collect all unique suffixes in an ordered set
+ all_unique = {}
+ for s in suffix_sets:
+ all_unique.update(s)
+ keys = []
+ # move base to the head, if present
+ if base in all_unique:
+ all_unique.pop(base)
+ keys.append(base)
+ keys.extend(all_unique.keys())
+ candidate = "-".join(keys)
+ else:
+ # Create the name by finding any common part in each run line
+ # and appending them together with other lines
+ def _rl_suffixes_merge(rl):
+ """Compute intersection of all parts across all prefixes for this run line."""
+ suffix_sets = []
+ for p in run_prefix_map[rl]:
+ suffix_sets.append(dict.fromkeys(p.split("-")))
+ result = {}
+ if suffix_sets:
+ result = suffix_sets[0]
+ for s in suffix_sets[1:]:
+ result = dict.fromkeys(k for k in result if k in s)
+ result.pop("", None)
+ if not result or (len(result) == 1 and base in result):
+ # if the only key is CHECK, convert to CHECK-1
+ result = dict.fromkeys([base, str(rl + 1)])
+ return result
+
+ keys = {}
+ for s in (_rl_suffixes_merge(rl) for rl in sorted(pset)):
+ keys.update(s)
+ candidate = []
+ # move base to the head, if present
+ if base in keys:
+ keys.pop(base)
+ candidate.append(base)
+ candidate.extend(keys.keys())
+ candidate = "-".join(candidate)
+ name, n = candidate, 2
+ while name in all_prefixes:
+ name = candidate + "-_" + str(n)
+ n += 1
+ all_prefixes.add(name)
+ return name
+
+ return {pset: _pset_name(pset) for pset in presence_sets}
+
+
+# ---------------------------------------------------------------------------
+# RUN line rewriting
+# ---------------------------------------------------------------------------
+
+
+def new_prefixes_for_run(rl, prefix_name_map):
+ """Return the sorted list of new prefix names that run line *rl* needs.
+
+ A run line needs every new prefix whose presence set contains *rl*.
+ """
+ return sorted(name for pset, name in prefix_name_map.items() if rl in pset)
+
+
+def rewrite_run_line(run_line, new_names, any_replaced=False):
+ """Replace all --check-prefix(es) flags in *run_line* with the new list."""
+ if len(new_names) == 1:
+ replacement_flag = " --check-prefix=" + new_names[0]
+ else:
+ replacement_flag = " --check-prefixes=" + ",".join(new_names)
+
+ # Replace the first occurrence; drop any subsequent check-prefix flags
+ replaced = any_replaced
+
+ is_default = replacement_flag == " --check-prefix=CHECK"
+
+ def _replace(m):
+ nonlocal replaced
+ if not replaced:
+ replaced = True
+ return "" if is_default else replacement_flag
+ return ""
+
+ new_line = _CHECK_PREFIX_FLAG_RE.sub(_replace, run_line)
+ return new_line, replaced
+
+
+# ---------------------------------------------------------------------------
+# File rewrite
+# ---------------------------------------------------------------------------
+
+
+def comment_char(lines):
+ """Detect the comment character used in check lines."""
+ for line in lines:
+ stripped = line.lstrip()
+ if stripped.startswith("//"):
+ return "//"
+ if stripped.startswith(";"):
+ return ";"
+ if stripped.startswith("#"):
+ return "#"
+ return "//"
+
+
+def emit_check_block(merged_seq, prefix_name_map, comment):
+ """Render the merged check block as a list of text lines (with newlines)."""
+ out = []
+ for content, is_plain, pset in merged_seq:
+ if is_plain:
+ out.append(content)
+ else:
+ name = prefix_name_map[pset]
+ out.append(
+ "{comment} {name}{content}".format(
+ comment=comment, name=name, content=content
+ )
+ )
+ return out
+
+
+# ---------------------------------------------------------------------------
+# Main processing of a single file
+# ---------------------------------------------------------------------------
+
+
+def process_lines(lines, path, base_prefix_override=None, split=False):
+ """Process *lines* (as returned by readlines()) for *path* and return the
+ rewritten line list, or None if the file should be skipped."""
+
+ # 1. Find RUN lines and extract FileCheck prefixes
+ run_lines_raw, run_line_ranges = find_run_lines(path, lines)
+ _filtered = []
+ _filtered_ranges = []
+ for rl, rl_range in zip(run_lines_raw, run_line_ranges):
+ if "|" not in rl:
+ common.warn("Skipping unparsable RUN line: " + rl)
+ else:
+ _filtered.append(rl)
+ _filtered_ranges.append(rl_range)
+ run_lines_raw = _filtered
+ run_line_ranges = _filtered_ranges
+ if not run_lines_raw:
+ common.warn(f"{path}: no RUN lines found, skipping.")
+ return None
+
+ # Map each run line to its FileCheck prefix(es)
+ run_prefix_list = [] # ordered unique run-line strings
+ _rl_to_idx = {} # run_line_str -> int index into run_prefix_list
+ run_prefix_map = [] # list of prefix lists, indexed by rl_idx
+ all_prefixes = set()
+ for rl in run_lines_raw:
+ _rl_to_idx[rl] = len(run_prefix_list)
+ run_prefix_list.append(rl)
+ prefixes = common.get_check_prefixes(rl)
+ run_prefix_map.append(prefixes)
+ all_prefixes.update(prefixes)
+
+ if not all_prefixes:
+ common.warn(f"{path}: no FileCheck prefixes found, skipping.")
+ return None
+
+ # 2. Collect check lines for each RUN line
+ _seq_lnums = [collect_check_lines_for_run(lines, pfxs) for pfxs in run_prefix_map]
+ run_seqs = [seq for seq, _ in _seq_lnums]
+ check_line_indices = set().union(*(lnums for _, lnums in _seq_lnums))
+ check_line_blocks = []
+ for idx in range(len(lines)):
+ if idx in check_line_indices:
+ if check_line_blocks and check_line_blocks[-1].stop == idx:
+ check_line_blocks[-1] = range(check_line_blocks[-1].start, idx + 1)
+ else:
+ check_line_blocks.append(range(idx, idx + 1))
+ erase_line_indices =...
[truncated]
A simpler feature we could do with is a warning when two RUN lines emit the same codegen for a test but don't share a common check prefix that would allow them to be merged.
That seems reasonable, but it is entirely different from the purpose of this PR, which is just to fix the things that cause warnings and incorrect UTC output. That said, it is intended to be easy to implement as an option in this PR: hierarchical_merge is set up to support a merge-cost threshold for choosing whether to precisely merge or simply concat, and setting it to zero would combine only exactly identical check sequences.
Since there aren't any test cases (should there be?), I'll ask to be sure: does this do structural diffing for captured variables, so that they can be merged even if the names differ? To make what I'm asking more concrete, here's an example: and the second RUN line emits these checks: The actual emitted IR from both RUN lines, even if slightly different, would still match either set of checks. Would this be able to merge them? Separately: should we update the auto-updater in lit so that it automatically invokes the merge script when updating UTC tests?
Happy to add them. The other scripts didn't have any, so I didn't add them initially.
It doesn't, since I hadn't noticed that case myself. The expectation is that UTC is supposed to handle that already. This presently relies on the UTC script making stable choices, to reduce (though not necessarily eliminate) the need for human cleanup afterwards. My initial reaction is that we could fix that example with help from UTC, in conjunction with this script, by having UTC look at the other nearby CHECK prefixes when picking names for CHECK_B. I believe it already looks for names to re-use, so that would be an expansion of that particular logic. Notably, this merge script already computes what "nearby" means (defining it as a unified diff), so UTC could possibly use this script iteratively: generate a tentative reference merge before each RUN line is processed, and use that as a source of names for the next RUN line. At the end, the names will have been picked based on the other nearby options, so the final merge should be more likely to fold entirely.
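To make the limitation being acknowledged here concrete: a purely textual diff sees renamed capture variables as different lines, so blocks that differ only in UTC's chosen names will not fold (made-up check lines for illustration):

```python
import difflib

# Structurally identical checks, but UTC picked different capture names
# for the two prefixes.
a = ["-LABEL: @f(", ": [[TMP0:%.*]] = load i32", "-NEXT: ret i32 [[TMP0]]"]
b = ["-LABEL: @f(", ": [[LD:%.*]] = load i32", "-NEXT: ret i32 [[LD]]"]

sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
equal = sum(i2 - i1 for tag, i1, i2, _, _ in sm.get_opcodes() if tag == "equal")
print(equal, "of", len(a), "lines fold")  # only the LABEL line is textually equal
```

Stable capture-name choices from UTC would make all three lines textually equal and let the merge fold them completely.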
That's already done here (the
I also want to cross-link that this PR has an RFC comment thread now too: https://discourse.llvm.org/t/rfc-use-a-unified-diff-strategy-to-solve-the-conflicting-filecheck-prefix-issue-in-update-test-checks/90434
This is primarily intended to be used via the new --merge option to the UTC scripts. Just two UTC scripts are updated currently, to demonstrate this; updating the others is expected to be similar and simple, but I didn't want those to get mixed up in this review. Internally, that option is essentially equivalent to separately calling each tool.
This separation can be useful when the user doesn't want to re-run the UTC script yet (e.g. because it changes a lot of other unrelated things, either due to past human meddling or tool updates) and wants to first do an NFC compact-merge step, then an NFC update step, and then presumably a functional change afterwards.
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>, because I haven't written Python in several years and forget the names of stdlib functions, so it helps me research those much more quickly.
A question (for later): we could set a threshold below which we are willing to repeat identical lines (to avoid creating as many prefix check sets). I had initially expected this to be useful, so I designed the merge algorithm to use a cost function with this ability in mind, but I didn't see much use for that thresholding in my initial testing. Because all the outputs are generated from the same input, there's almost always a meaningful fraction of lines that will be shared.
This script was initially produced by Claude with the prompt quoted above, and then a lot of human-assisted-AI post-processing.