-
-
Notifications
You must be signed in to change notification settings - Fork 593
dolt/dolthub#1430: Add --filter option for dolt diff
#10030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
dolt/dolthub#1430: Add --filter option for dolt diff
#10030
Conversation
--filter option for dolt diff
--filter option for dolt diff--filter option for dolt diff
096642a to
3bfa4de
Compare
Add Go unit tests for the diff filter feature to provide fast feedback and granular validation of filter logic. Test coverage includes: - diffTypeFilter struct validation (isValid method) - Filter inclusion methods (adds, drops, modifications) - Edge cases (empty strings, typos, case sensitivity) - Consistency checks across all filter types - Constant validation (values, uniqueness, lowercase) - Invalid filter behavior verification Tests added: - 12 test functions - 48+ individual test cases - 100% coverage of diffTypeFilter struct methods These tests complement the existing BATS integration tests and provide unit-level regression protection. Refs: dolthub#1430
Implement lazy table header initialization to fix a bug where empty table headers were printed when all rows were filtered out during data-only diffs. This occurred because BeginTable() was called before row filtering, causing headers to print even when no matching rows existed. The solution introduces a lazyRowWriter that delays the BeginTable() call until the first row is actually written. This wrapper is only used when: - A filter is active (added/modified/removed) - The diff is data-only (no schema changes or table renames) Implementation changes: - Add shouldUseLazyHeader() helper to determine when to use lazy initialization based on filter presence and diff type - Add lazyRowWriter type that wraps SqlRowDiffWriter and delays BeginTable() until first WriteRow() or WriteCombinedRow() call - Modify diffUserTable() to skip BeginTable when using lazy writer - Modify diffRows() to conditionally create lazyRowWriter vs normal rowWriter based on shouldUseLazyHeader() check - Add comprehensive unit tests for shouldUseLazyHeader logic and lazyRowWriter behavior (5 test functions, 8+ test cases) - Add mock implementations of diffWriter and SqlRowDiffWriter interfaces to enable testing without database dependencies - Fix BATS test assertions to match actual SQL output format (lowercase type names, MODIFY COLUMN vs DROP/ADD pattern) Test coverage: - TestShouldUseLazyHeader: validates lazy header logic conditions - TestLazyRowWriter_NoRowsWritten: verifies BeginTable not called when no rows written (core lazy behavior) - TestLazyRowWriter_RowsWritten: verifies BeginTable called on first write - TestLazyRowWriter_CombinedRowsWritten: tests combined row writes - TestLazyRowWriter_InitializedOnlyOnce: ensures BeginTable called exactly once Refs: dolthub#1430
Extract duplicated row filter checking logic into a reusable shouldSkipDiffType helper function. Refs: dolthub#1430
d6e1f73 to
6cd5e0b
Compare
|
I believe the two failing bats tests are unrelated to my changes. I believe a perms issue on my part? also the one test failing in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codeaucafe A couple of comments mainly on integrating your solution with our existing infrastructure under table_deltas.go. Let me know if you have any questions.
| ap.SupportsString(FormatFlag, "r", "result output format", "How to format diff output. Valid values are tabular, sql, json. Defaults to tabular.") | ||
| ap.SupportsString(WhereParam, "", "column", "filters columns based on values in the diff. See {{.EmphasisLeft}}dolt diff --help{{.EmphasisRight}} for details.") | ||
| ap.SupportsInt(LimitParam, "", "record_count", "limits to the first N diffs.") | ||
| ap.SupportsString("filter", "", "diff_type", "filters results based on the type of modification (added, modified, removed).") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should follow the NameParam naming scheme
go/cmd/dolt/commands/diff.go
Outdated
| SQLDiffOutput diffOutput = 2 | ||
| JsonDiffOutput diffOutput = 3 | ||
|
|
||
| FilterAdds = "added" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the same post-tense wording, ie FilterAdded
go/cmd/dolt/commands/diff.go
Outdated
| if filterValue != "" && filterValue != FilterAdds && filterValue != FilterModified && filterValue != FilterRemoved && filterValue != NoFilter { | ||
| return errhand.BuildDError("invalid filter: %s. Valid values are: added, modified, removed", filterValue).Build() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the IsValid func here so we only have to update one place.
go/cmd/dolt/commands/diff.go
Outdated
| return errhand.BuildDError("invalid output format: %s", f).Build() | ||
| } | ||
|
|
||
| filterValue, _ := apr.GetValue("filter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the const that should be added to apr above here for these magic values.
go/cmd/dolt/commands/diff.go
Outdated
| displaySettings.limit, _ = apr.GetInt(cli.LimitParam) | ||
| displaySettings.where = apr.GetValueOrDefault(cli.WhereParam, "") | ||
|
|
||
| filterValue := apr.GetValueOrDefault("filter", "all") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this relates to magic apr vals
| // Apply table-level filtering based on diff type | ||
| if dArgs.filter != nil && dArgs.filter.filterBy != NoFilter { | ||
| shouldIncludeTable := false | ||
|
|
||
| // Check if table was added | ||
| if delta.IsAdd() && dArgs.filter.includeAddsOrAll() { | ||
| shouldIncludeTable = true | ||
| } | ||
|
|
||
| // Check if table was dropped | ||
| if delta.IsDrop() && dArgs.filter.includeDropsOrAll() { | ||
| shouldIncludeTable = true | ||
| } | ||
|
|
||
| // Check if table was modified (schema change or rename) | ||
| if !delta.IsAdd() && !delta.IsDrop() { | ||
| isModified := delta.SchemaChange || delta.IsRename() | ||
| if isModified && dArgs.filter.includeModificationsOrAll() { | ||
| shouldIncludeTable = true | ||
| } | ||
| // If no schema/rename changes but has data changes, let it through for row-level filtering | ||
| if !isModified && delta.DataChange { | ||
| shouldIncludeTable = true | ||
| } | ||
| } | ||
|
|
||
| if !shouldIncludeTable { | ||
| continue // Skip this table but continue processing others | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the TableDeltaSummary object provided. This may require rewriting your impl for the filter struct above, but it keeps the logic in one place. Specifically, there's a DiffType field you can make use of.
Secondly, I think this enables you to keep track of enabled filters in a map, ie str -> bool. This also keeps the diff type strings synchronized in the future. One func can handle the checks, instead of multiple include helper funcs. Of course, impl const vars to avoid the magic vals. You should define the consts in table_deltas.go since it already defines its own names for DiffType.
go/cmd/dolt/commands/diff.go
Outdated
| fromTableName: fromTableName, | ||
| toTableName: toTableName, | ||
| isAdd: isAdd, | ||
| isDrop: isDrop, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information should be in the delta object already, aka TableDeltaSummary. Also, you could alternatively store the beginTable since that's your main priority.
type lazyRowWriter struct {
realWriter diff.SqlRowDiffWriter
beginTable func() error
initialized bool
}
func (l *lazyRowWriter) WriteRow(ctx *sql.Context, row sql.Row, diffType diff.ChangeType, colDiffTypes []diff.ChangeType) error {
if !l.initialized {
if err := l.beginTable(); err != nil {
return err
}
l.initialized = true
}
return l.realWriter.WriteRow(ctx, row, diffType, colDiffTypes)
}
go/cmd/dolt/commands/diff.go
Outdated
| return errhand.VerboseErrorFromError(err) | ||
| var rowWriter diff.SqlRowDiffWriter | ||
| var err error | ||
| if shouldUseLazyHeader(dArgs, tableSummary) { | ||
| // Use lazy writer to delay BeginTable until first row write | ||
| rowWriter = newLazyRowWriter( | ||
| dw, | ||
| tableSummary.FromTableName.Name, | ||
| tableSummary.ToTableName.Name, | ||
| tableSummary.IsAdd(), | ||
| tableSummary.IsDrop(), | ||
| func() (diff.SqlRowDiffWriter, error) { | ||
| return dw.RowWriter(fromTableInfo, toTableInfo, tableSummary, unionSch) | ||
| }, | ||
| ) | ||
| } else { | ||
| // We always instantiate a RowWriter in case the diffWriter needs it to close off any work from schema output | ||
| rowWriter, err = dw.RowWriter(fromTableInfo, toTableInfo, tableSummary, unionSch) | ||
| if err != nil { | ||
| return errhand.VerboseErrorFromError(err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of providing two different branchs to initialize rowWriter differently, wrap the original with your wrapper:
rowWriter, err := dw.RowWriter(fromTableInfo, toTableInfo, tableSummary, unionSch)
if err != nil {
return errhand.VerboseErrorFromError(err)
}
if shouldUseLazyHeader(dArgs, tableSummary) {
rowWriter = &lazyRowWriter{
rowWriter,
func() error {
return dw.BeginTable(
tableSummary.FromTableName.Name,
tableSummary.ToTableName.Name,
tableSummary.IsAdd(),
tableSummary.IsDrop(),
)
},
}
}
go/cmd/dolt/commands/diff.go
Outdated
|
|
||
| // Apply row-level filtering based on diff type | ||
| if dArgs.filter != nil && dArgs.filter.filterBy != NoFilter { | ||
| if shouldSkipDiffType(dArgs.filter, oldRow.RowDiff) || shouldSkipDiffType(dArgs.filter, newRow.RowDiff) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes back to using the existing summary DiffType field and adding consts for the strings. Make sure to update the helper funcs. You could check the map for the necessary fields and flip using !.
|
Thanks @elianddb. Sorry for the delay. I have not been able to take a look at your comments yet due to having to work very late last couple days and helping with some family stuff. I'll be sure to review your comments and ask you any questions I have when I get time to work on addressing your comments this weekend. Thank you! Cheers, |
Constants and Naming: - Add FilterParam constant to flags.go following existing conventions - Move filter type constants (DiffTypeAdded, DiffTypeModified, DiffTypeRemoved, DiffTypeAll) to table_deltas.go for centralization - Update all references throughout codebase to use diff.DiffType* constants for consistency Validation Consolidation: - Consolidate duplicate validation logic into single isValid() method - Introduce newDiffTypeFilter() constructor for proper initialization - Remove redundant validation checks in parseDiffArgs function Map-Based Filter Architecture: - Refactor diffTypeFilter struct to use map[string]bool instead of string field for more extensible filtering - Replace three separate includeXOrAll() methods with single shouldInclude() method that performs map lookup - Update both table-level and row-level filtering to use unified shouldInclude() approach Row-Level Filtering with DiffType: - Add ChangeTypeToDiffType() helper function in table_deltas.go to convert row-level ChangeType enum to table-level DiffType strings - Refactor shouldSkipDiffType() to shouldSkipRow() using the new conversion helper and map-based filtering - Ensure consistent terminology between table and row filtering by using same DiffType string values throughout Refs: dolthub#1430
Refactor lazyRowWriter to use a cleaner callback-based architecture that reduces struct complexity from 8 fields to 2 fields. This addresses PR review feedback requesting simplification of the lazy header implementation. Changes: - Replace parameter storage with closure-based callback pattern that captures BeginTable parameters from outer scope - Eliminate separate ensureInitialized() method in favor of inline initialization check in WriteRow() and WriteCombinedRow() - Remove initialized bool field by using callback presence (nil check) to track initialization state - Always create RowWriter upfront and only delay BeginTable call, eliminating the need for writer factory function - Simplify Close() method to always delegate to wrapped writer Refs: dolthub#1430
Address @elianddb feedback on PR dolthub#10030 by standardizing DiffType usage, simplifying filter logic, and fixing row-level filtering. Changes: - Fix row filtering bug: add early return for ChangeType.None to prevent incorrectly skipping added/removed rows - Standardize DiffType: replace string literals with constants (DiffTypeAdded/Modified/Removed) in table_deltas.go and merge.go - Simplify table filtering: use delta.DiffType directly - Update tests: rewrite to use map-based architecture and newDiffTypeFilter() constructor Refs: dolthub#1430
Add DiffTypeRenamed constant and "removed" as user-friendly alias for "dropped" filter option. This provides more granular filtering and improves user experience with familiar terminology. List of changes: - Add DiffTypeRenamed constant for renamed tables - Revert GetSummary to use DiffTypeRenamed instead of treating renames as modified (table_deltas.go:742) - Add "removed" as alias that maps to "dropped" internally for user convenience - Update filter validation to include renamed option - Update CLI help text to document all filter options including renamed and removed alias - Handle renamed tables in merge stats alongside modified - Update getDiffSummariesBetweenRefs and getSchemaDiffSummariesBetweenRefs to handle renamed diff type - Update all tests to use new constants and test renamed filtering Filter options now available: added, modified, renamed, dropped, removed (alias for dropped). Refs: dolthub#1430
Add tests for the --filter=renamed option and the --filter=removed alias that maps to dropped. Go tests: - Tests for filter=renamed checking all diff types - Tests for "removed" alias mapping to dropped internally - Verify renamed filter only includes renamed tables BATS tests: - Test --filter=renamed with table rename scenario - Test --filter=dropped with table drop scenario - Verify --filter=removed alias works same as dropped - Verify other filters correctly exclude renamed/dropped tables Refs: dolthub#1430
…e/1430/add-filter-opt-for-diff
Update the short help text for the --filter parameter to document all valid filter options (added, modified, renamed, dropped) and mention that 'removed' is accepted as an alias for 'dropped'. Refs: dolthub#1430
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elianddb I updated the code based on your comments. Let me know if I misunderstood something and implemented it wrong.
I believe the failing bats test are unrelated and failing for same reasons those other unrelated bats tests failed for
Thank you!
| continue | ||
| } | ||
| if summary.DiffType == "added" { | ||
| if summary.DiffType == diff.DiffTypeAdded { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, used the new diff type constants added here so we aren't used string literals here
| // lazyRowWriter wraps a SqlRowDiffWriter and delays calling BeginTable | ||
| // until the first row is actually written. This prevents empty table headers | ||
| // when all rows are filtered out. | ||
| type lazyRowWriter struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this looking @elianddb?
Also, instead of having initialized just added nil check on onFirstWrite
| To filter which data rows are displayed, use {{.EmphasisLeft}}--where <SQL expression>{{.EmphasisRight}}. Table column names in the filter expression must be prefixed with {{.EmphasisLeft}}from_{{.EmphasisRight}} or {{.EmphasisLeft}}to_{{.EmphasisRight}}, e.g. {{.EmphasisLeft}}to_COLUMN_NAME > 100{{.EmphasisRight}} or {{.EmphasisLeft}}from_COLUMN_NAME + to_COLUMN_NAME = 0{{.EmphasisRight}}. | ||
| To filter diff output by change type, use {{.EmphasisLeft}}--filter <type>{{.EmphasisRight}} where {{.EmphasisLeft}}<type>{{.EmphasisRight}} is one of {{.EmphasisLeft}}added{{.EmphasisRight}}, {{.EmphasisLeft}}modified{{.EmphasisRight}}, {{.EmphasisLeft}}renamed{{.EmphasisRight}}, or {{.EmphasisLeft}}dropped{{.EmphasisRight}}. The {{.EmphasisLeft}}added{{.EmphasisRight}} filter shows only additions (new tables or rows), {{.EmphasisLeft}}modified{{.EmphasisRight}} shows only schema modifications or row updates, {{.EmphasisLeft}}renamed{{.EmphasisRight}} shows only renamed tables, and {{.EmphasisLeft}}dropped{{.EmphasisRight}} shows only deletions (dropped tables or deleted rows). You can also use {{.EmphasisLeft}}removed{{.EmphasisRight}} as an alias for {{.EmphasisLeft}}dropped{{.EmphasisRight}}. For example, {{.EmphasisLeft}}dolt diff --filter=dropped{{.EmphasisRight}} shows only deleted rows and dropped tables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to include renamed filter, as well as using removed for alias for dropped DiffTypeDropped
|
|
||
| // Map "removed" to "dropped" (alias for user convenience) | ||
| internalFilterType := filterType | ||
| if filterType == "removed" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed filter value alias for DiffTypeDropped. Figure for filtering removed is better terminology but shouldn't be specific diff type value.
Summary
TL;DR
There was no action on the original #3499 for issue #1430; the PR was closed ~3 years ago. This PR fixes the open PR comments and updates the implementation details a bit for the RowWriting of filtered rows
Description
Adds
--filteroption todolt difffor filtering by change type (added,modified,renamed,dropped, orremovedas alias fordropped); please note I added renamed and dropped alias to this filter list. Completes PR #3499 by fixing the issues that blocked it from merging, plus fixes an additional bug where empty table headers wereprinted when all rows were filtered out.
Users reviewing large diffs often need to focus on specific change types - deletes may need extra scrutiny while inserts are routine. With diffs spanning thousands of rows across multiple tables, grep isn't enough since updates show
both additions and deletions.
Implementation
Core changes:
diffTypeFilterstruct with validation methods indiff.go--filterargument inarg_parser_helpers.godelta.SchemaChangeanddelta.IsRename()for modification detectiondiff.ChangeTypeinwriteDiffResults()continueinstead ofreturn nilso all tables get processedEmpty header bug fix:
lazyRowWriterthat delaysBeginTable()until first row writeshouldUseLazyHeader()helper to determine when lazy initialization is neededTests:
go/cmd/dolt/commands/diff_filter_test.go- comprehensive unit tests:TestShouldUseLazyHeader- validates lazy header logic (8 test cases)TestLazyRowWriter_*- tests lazy initialization behavior (4 test functions)diffWriterandSqlRowDiffWriterinterfacesWhat Was Broken in PR #3499
The original PR had three blocking issues from code review comments that I fixed here:
No row-level filtering - only filtered tables, not individual rows
writeDiffResults()Incomplete modification detection - didn't check for schema changes properly
delta.SchemaChange || delta.IsRename()as requestedLoop exits early -
return nilprevented processing remaining tablescontinueinsteadAdditional Bug (bug from my initial filter changes; not existing prior) Fixed
Empty table headers when filtering:
When using
--filterwith data-only changes, if all rows were filtered out, dolt would still print the table diff header with no content:This occurred because
BeginTable()was called before row filtering. ThelazyRowWriterpattern delays header output until we confirm at least one row will be written.Usage/Examples
Fixes #1430