Skip to content

Conversation

@codeaucafe
Copy link
Contributor

@codeaucafe codeaucafe commented Nov 5, 2025

Summary

TL;DR

There was no action on the original #3499 for issue #1430; the PR was closed ~3 years ago. This PR fixes the open PR comments and updates the implementation details a bit for the RowWriting of filtered rows

Description

Adds --filter option to dolt diff for filtering by change type (added, modified, renamed, dropped, or removed as alias for dropped); please note I added renamed and dropped alias to this filter list. Completes PR #3499 by fixing the issues that blocked it from merging, plus fixes an additional bug where empty table headers were
printed when all rows were filtered out.

Users reviewing large diffs often need to focus on specific change types - deletes may need extra scrutiny while inserts are routine. With diffs spanning thousands of rows across multiple tables, grep isn't enough since updates show
both additions and deletions.

Implementation

Core changes:

  • Added diffTypeFilter struct with validation methods in diff.go
  • Added --filter argument in arg_parser_helpers.go
  • Table filtering uses delta.SchemaChange and delta.IsRename() for modification detection
  • Row filtering checks diff.ChangeType in writeDiffResults()
  • Fixed control flow - uses continue instead of return nil so all tables get processed

Empty header bug fix:

  • Implemented lazyRowWriter that delays BeginTable() until first row write
  • Added shouldUseLazyHeader() helper to determine when lazy initialization is needed
  • Only applies to filtered data-only diffs (no schema changes or renames)
  • Prevents empty table headers from printing when all rows are filtered out

Tests:

  • go/cmd/dolt/commands/diff_filter_test.go - comprehensive unit tests:
    • TestShouldUseLazyHeader - validates lazy header logic (8 test cases)
    • TestLazyRowWriter_* - tests lazy initialization behavior (4 test functions)
    • Mock implementations of diffWriter and SqlRowDiffWriter interfaces
  • BATS integration tests for all filter types with tables, rows, and schema changes
  • Edge cases: invalid values, typos, case sensitivity

What Was Broken in PR #3499

The original PR had three blocking issues from code review comments that I fixed here:

  1. No row-level filtering - only filtered tables, not individual rows

    • Fixed by adding filtering in writeDiffResults()
  2. Incomplete modification detection - didn't check for schema changes properly

    • Fixed by using delta.SchemaChange || delta.IsRename() as requested
  3. Loop exits early - return nil prevented processing remaining tables

    • Fixed by using continue instead

Additional Bug (bug from my initial filter changes; not existing prior) Fixed

Empty table headers when filtering:
When using --filter with data-only changes, if all rows were filtered out, dolt would still print the table diff header with no content:

# Before fix:
$ dolt diff HEAD~1 --filter=modified  # (table has only added rows)
diff --dolt a/t b/t
--- a/t
+++ b/t
(empty - no rows printed)

# After fix:
$ dolt diff HEAD~1 --filter=modified
(completely empty - no output at all)

This occurred because BeginTable() was called before row filtering. The lazyRowWriter pattern delays header output until we confirm at least one row will be written.

Usage/Examples

dolt diff --filter=added      # new tables/rows
dolt diff --filter=modified   # schema changes, row updates
dolt diff --filter=renamed    # renamed tables
dolt diff --filter=dropped    # dropped tables, deleted rows
dolt diff --filter=removed    # alias for dropped
dolt diff HEAD~1 --filter=dropped -r sql

Fixes #1430

@codeaucafe codeaucafe changed the title test(diff): add comprehensive unit tests for filter functionality feat: add --filter option for dolt diff Nov 5, 2025
@codeaucafe codeaucafe changed the title feat: add --filter option for dolt diff dolt/dolthub#1430: Add --filter option for dolt diff Nov 5, 2025
@codeaucafe codeaucafe force-pushed the codeaucafe/1430/add-filter-opt-for-diff branch 4 times, most recently from 096642a to 3bfa4de Compare November 11, 2025 01:06
Add Go unit tests for the diff filter feature to provide fast
feedback and granular validation of filter logic.

Test coverage includes:
- diffTypeFilter struct validation (isValid method)
- Filter inclusion methods (adds, drops, modifications)
- Edge cases (empty strings, typos, case sensitivity)
- Consistency checks across all filter types
- Constant validation (values, uniqueness, lowercase)
- Invalid filter behavior verification

Tests added:
- 12 test functions
- 48+ individual test cases
- 100% coverage of diffTypeFilter struct methods

These tests complement the existing BATS integration tests
and provide unit-level regression protection.

Refs: dolthub#1430
Implement lazy table header initialization to fix a bug where
empty table headers were printed when all rows were filtered out
during data-only diffs. This occurred because BeginTable() was
called before row filtering, causing headers to print even when
no matching rows existed.

The solution introduces a lazyRowWriter that delays the BeginTable()
call until the first row is actually written. This wrapper is only
used when:
- A filter is active (added/modified/removed)
- The diff is data-only (no schema changes or table renames)

Implementation changes:
- Add shouldUseLazyHeader() helper to determine when to use lazy
initialization based on filter presence and diff type
- Add lazyRowWriter type that wraps SqlRowDiffWriter and delays
BeginTable() until first WriteRow() or WriteCombinedRow() call
- Modify diffUserTable() to skip BeginTable when using lazy writer
- Modify diffRows() to conditionally create lazyRowWriter vs normal
rowWriter based on shouldUseLazyHeader() check
- Add comprehensive unit tests for shouldUseLazyHeader logic and
lazyRowWriter behavior (5 test functions, 8+ test cases)
- Add mock implementations of diffWriter and SqlRowDiffWriter
interfaces to enable testing without database dependencies
- Fix BATS test assertions to match actual SQL output format
(lowercase type names, MODIFY COLUMN vs DROP/ADD pattern)

Test coverage:
- TestShouldUseLazyHeader: validates lazy header logic conditions
- TestLazyRowWriter_NoRowsWritten: verifies BeginTable not called
when no rows written (core lazy behavior)
- TestLazyRowWriter_RowsWritten: verifies BeginTable called on
first write
- TestLazyRowWriter_CombinedRowsWritten: tests combined row writes
- TestLazyRowWriter_InitializedOnlyOnce: ensures BeginTable called
exactly once

Refs: dolthub#1430
Extract duplicated row filter checking logic into a reusable
shouldSkipDiffType helper function.

Refs: dolthub#1430
@codeaucafe codeaucafe force-pushed the codeaucafe/1430/add-filter-opt-for-diff branch from d6e1f73 to 6cd5e0b Compare November 11, 2025 01:08
@codeaucafe codeaucafe marked this pull request as ready for review November 11, 2025 01:38
@codeaucafe
Copy link
Contributor Author

codeaucafe commented Nov 11, 2025

I believe the two failing bats tests are unrelated to my changes. I believe a perms issue on my part?

also the one test failing in Test Go/Go tests (ubuntu-22.04) is for TestMergeConcurrency which fails for ubuntu with race detected error but passes for mac and windows. Is this a known flaky test? If its not I can look into this later when I get back from short vacation this week

Copy link
Contributor

@elianddb elianddb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codeaucafe A couple of comments mainly on integrating your solution with our existing infrastructure under table_deltas.go. Let me know if you have any questions.

ap.SupportsString(FormatFlag, "r", "result output format", "How to format diff output. Valid values are tabular, sql, json. Defaults to tabular.")
ap.SupportsString(WhereParam, "", "column", "filters columns based on values in the diff. See {{.EmphasisLeft}}dolt diff --help{{.EmphasisRight}} for details.")
ap.SupportsInt(LimitParam, "", "record_count", "limits to the first N diffs.")
ap.SupportsString("filter", "", "diff_type", "filters results based on the type of modification (added, modified, removed).")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should follow the NameParam naming scheme

SQLDiffOutput diffOutput = 2
JsonDiffOutput diffOutput = 3

FilterAdds = "added"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the same post-tense wording, ie FilterAdded

Comment on lines 362 to 364
if filterValue != "" && filterValue != FilterAdds && filterValue != FilterModified && filterValue != FilterRemoved && filterValue != NoFilter {
return errhand.BuildDError("invalid filter: %s. Valid values are: added, modified, removed", filterValue).Build()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the IsValid func here so we only have to update one place.

return errhand.BuildDError("invalid output format: %s", f).Build()
}

filterValue, _ := apr.GetValue("filter")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the const that should be added to apr above here for these magic values.

displaySettings.limit, _ = apr.GetInt(cli.LimitParam)
displaySettings.where = apr.GetValueOrDefault(cli.WhereParam, "")

filterValue := apr.GetValueOrDefault("filter", "all")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this relates to magic apr vals

Comment on lines 965 to 995
// Apply table-level filtering based on diff type
if dArgs.filter != nil && dArgs.filter.filterBy != NoFilter {
shouldIncludeTable := false

// Check if table was added
if delta.IsAdd() && dArgs.filter.includeAddsOrAll() {
shouldIncludeTable = true
}

// Check if table was dropped
if delta.IsDrop() && dArgs.filter.includeDropsOrAll() {
shouldIncludeTable = true
}

// Check if table was modified (schema change or rename)
if !delta.IsAdd() && !delta.IsDrop() {
isModified := delta.SchemaChange || delta.IsRename()
if isModified && dArgs.filter.includeModificationsOrAll() {
shouldIncludeTable = true
}
// If no schema/rename changes but has data changes, let it through for row-level filtering
if !isModified && delta.DataChange {
shouldIncludeTable = true
}
}

if !shouldIncludeTable {
continue // Skip this table but continue processing others
}
}

Copy link
Contributor

@elianddb elianddb Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the TableDeltaSummary object provided. This may require rewriting your impl for the filter struct above, but it keeps the logic in one place. Specifically, there's a DiffType field you can make use of.

Secondly, I think this enables you to keep track of enabled filters in a map, ie str -> bool. This also keeps the diff type strings synchronized in the future. One func can handle the checks, instead of multiple include helper funcs. Of course, impl const vars to avoid the magic vals. You should define the consts in table_deltas.go since it already defines its own names for DiffType.

Comment on lines 215 to 218
fromTableName: fromTableName,
toTableName: toTableName,
isAdd: isAdd,
isDrop: isDrop,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information should be in the delta object already, aka TableDeltaSummary. Also, you could alternatively store the beginTable since that's your main priority.

type lazyRowWriter struct {
    realWriter  diff.SqlRowDiffWriter
    beginTable  func() error
    initialized bool
}

func (l *lazyRowWriter) WriteRow(ctx *sql.Context, row sql.Row, diffType diff.ChangeType, colDiffTypes []diff.ChangeType) error {
    if !l.initialized {
        if err := l.beginTable(); err != nil {
            return err
        }
        l.initialized = true
    }
    return l.realWriter.WriteRow(ctx, row, diffType, colDiffTypes)
}

Comment on lines 1451 to 1644
return errhand.VerboseErrorFromError(err)
var rowWriter diff.SqlRowDiffWriter
var err error
if shouldUseLazyHeader(dArgs, tableSummary) {
// Use lazy writer to delay BeginTable until first row write
rowWriter = newLazyRowWriter(
dw,
tableSummary.FromTableName.Name,
tableSummary.ToTableName.Name,
tableSummary.IsAdd(),
tableSummary.IsDrop(),
func() (diff.SqlRowDiffWriter, error) {
return dw.RowWriter(fromTableInfo, toTableInfo, tableSummary, unionSch)
},
)
} else {
// We always instantiate a RowWriter in case the diffWriter needs it to close off any work from schema output
rowWriter, err = dw.RowWriter(fromTableInfo, toTableInfo, tableSummary, unionSch)
if err != nil {
return errhand.VerboseErrorFromError(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of providing two different branchs to initialize rowWriter differently, wrap the original with your wrapper:

rowWriter, err := dw.RowWriter(fromTableInfo, toTableInfo, tableSummary, unionSch)
if err != nil {
    return errhand.VerboseErrorFromError(err)
}

if shouldUseLazyHeader(dArgs, tableSummary) {
    rowWriter = &lazyRowWriter{
        rowWriter,
        func() error {
            return dw.BeginTable(
                tableSummary.FromTableName.Name,
                tableSummary.ToTableName.Name,
                tableSummary.IsAdd(),
                tableSummary.IsDrop(),
            )
        },
    }
}


// Apply row-level filtering based on diff type
if dArgs.filter != nil && dArgs.filter.filterBy != NoFilter {
if shouldSkipDiffType(dArgs.filter, oldRow.RowDiff) || shouldSkipDiffType(dArgs.filter, newRow.RowDiff) {
Copy link
Contributor

@elianddb elianddb Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes back to using the existing summary DiffType field and adding consts for the strings. Make sure to update the helper funcs. You could check the map for the necessary fields and flip using !.

@codeaucafe
Copy link
Contributor Author

Thanks @elianddb. Sorry for the delay. I have not been able to take a look at your comments yet due to having to work very late last couple days and helping with some family stuff. I'll be sure to review your comments and ask you any questions I have when I get time to work on addressing your comments this weekend.

Thank you!

Cheers,
David

Constants and Naming:
- Add FilterParam constant to flags.go following existing conventions
- Move filter type constants (DiffTypeAdded, DiffTypeModified,
  DiffTypeRemoved, DiffTypeAll) to table_deltas.go for centralization
- Update all references throughout codebase to use diff.DiffType*
  constants for consistency

Validation Consolidation:
- Consolidate duplicate validation logic into single isValid() method
- Introduce newDiffTypeFilter() constructor for proper initialization
- Remove redundant validation checks in parseDiffArgs function

Map-Based Filter Architecture:
- Refactor diffTypeFilter struct to use map[string]bool instead of
  string field for more extensible filtering
- Replace three separate includeXOrAll() methods with single
  shouldInclude() method that performs map lookup
- Update both table-level and row-level filtering to use unified
  shouldInclude() approach

Row-Level Filtering with DiffType:
- Add ChangeTypeToDiffType() helper function in table_deltas.go to
  convert row-level ChangeType enum to table-level DiffType strings
- Refactor shouldSkipDiffType() to shouldSkipRow() using the new
  conversion helper and map-based filtering
- Ensure consistent terminology between table and row filtering by
  using same DiffType string values throughout

Refs: dolthub#1430
Refactor lazyRowWriter to use a cleaner callback-based
architecture that reduces struct complexity from 8 fields to 2
fields. This addresses PR review feedback requesting
simplification of the lazy header implementation.

Changes:
- Replace parameter storage with closure-based callback pattern
that captures BeginTable parameters from outer scope
- Eliminate separate ensureInitialized() method in favor of
inline initialization check in WriteRow() and
WriteCombinedRow()
- Remove initialized bool field by using callback presence (nil
check) to track initialization state
- Always create RowWriter upfront and only delay BeginTable
call, eliminating the need for writer factory function
- Simplify Close() method to always delegate to wrapped writer

Refs: dolthub#1430
Address @elianddb feedback on PR dolthub#10030 by standardizing DiffType
usage, simplifying filter logic, and fixing row-level filtering.

Changes:
- Fix row filtering bug: add early return for ChangeType.None to
prevent incorrectly skipping added/removed rows
- Standardize DiffType: replace string literals with constants
(DiffTypeAdded/Modified/Removed) in table_deltas.go and merge.go
- Simplify table filtering: use delta.DiffType directly
- Update tests: rewrite to use map-based architecture and
newDiffTypeFilter() constructor

Refs: dolthub#1430
Add DiffTypeRenamed constant and "removed" as user-friendly alias
for "dropped" filter option. This provides more granular filtering
and improves user experience with familiar terminology.

List of changes:
- Add DiffTypeRenamed constant for renamed tables
- Revert GetSummary to use DiffTypeRenamed instead of treating
renames as modified (table_deltas.go:742)
- Add "removed" as alias that maps to "dropped" internally for
user convenience
- Update filter validation to include renamed option
- Update CLI help text to document all filter options including
renamed and removed alias
- Handle renamed tables in merge stats alongside modified
- Update getDiffSummariesBetweenRefs and
getSchemaDiffSummariesBetweenRefs to handle renamed diff type
- Update all tests to use new constants and test renamed
filtering

Filter options now available: added, modified, renamed, dropped,
removed (alias for dropped).

Refs: dolthub#1430
Add tests for the --filter=renamed option and the --filter=removed alias
that maps to dropped.

Go tests:
- Tests for filter=renamed checking all diff types
- Tests for "removed" alias mapping to dropped internally
- Verify renamed filter only includes renamed tables

BATS tests:
- Test --filter=renamed with table rename scenario
- Test --filter=dropped with table drop scenario
- Verify --filter=removed alias works same as dropped
- Verify other filters correctly exclude renamed/dropped tables

Refs: dolthub#1430
Update the short help text for the --filter parameter to document
all valid filter options (added, modified, renamed, dropped) and
mention that 'removed' is accepted as an alias for 'dropped'.

Refs: dolthub#1430
Copy link
Contributor Author

@codeaucafe codeaucafe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elianddb I updated the code based on your comments. Let me know if I misunderstood something and implemented it wrong.

I believe the failing bats test are unrelated and failing for same reasons those other unrelated bats tests failed for

Thank you!

continue
}
if summary.DiffType == "added" {
if summary.DiffType == diff.DiffTypeAdded {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, used the new diff type constants added here so we aren't used string literals here

// lazyRowWriter wraps a SqlRowDiffWriter and delays calling BeginTable
// until the first row is actually written. This prevents empty table headers
// when all rows are filtered out.
type lazyRowWriter struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this looking @elianddb?

Also, instead of having initialized just added nil check on onFirstWrite

To filter which data rows are displayed, use {{.EmphasisLeft}}--where <SQL expression>{{.EmphasisRight}}. Table column names in the filter expression must be prefixed with {{.EmphasisLeft}}from_{{.EmphasisRight}} or {{.EmphasisLeft}}to_{{.EmphasisRight}}, e.g. {{.EmphasisLeft}}to_COLUMN_NAME > 100{{.EmphasisRight}} or {{.EmphasisLeft}}from_COLUMN_NAME + to_COLUMN_NAME = 0{{.EmphasisRight}}.
To filter diff output by change type, use {{.EmphasisLeft}}--filter <type>{{.EmphasisRight}} where {{.EmphasisLeft}}<type>{{.EmphasisRight}} is one of {{.EmphasisLeft}}added{{.EmphasisRight}}, {{.EmphasisLeft}}modified{{.EmphasisRight}}, {{.EmphasisLeft}}renamed{{.EmphasisRight}}, or {{.EmphasisLeft}}dropped{{.EmphasisRight}}. The {{.EmphasisLeft}}added{{.EmphasisRight}} filter shows only additions (new tables or rows), {{.EmphasisLeft}}modified{{.EmphasisRight}} shows only schema modifications or row updates, {{.EmphasisLeft}}renamed{{.EmphasisRight}} shows only renamed tables, and {{.EmphasisLeft}}dropped{{.EmphasisRight}} shows only deletions (dropped tables or deleted rows). You can also use {{.EmphasisLeft}}removed{{.EmphasisRight}} as an alias for {{.EmphasisLeft}}dropped{{.EmphasisRight}}. For example, {{.EmphasisLeft}}dolt diff --filter=dropped{{.EmphasisRight}} shows only deleted rows and dropped tables.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to include renamed filter, as well as using removed for alias for dropped DiffTypeDropped


// Map "removed" to "dropped" (alias for user convenience)
internalFilterType := filterType
if filterType == "removed" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed filter value alias for DiffTypeDropped. Figure for filtering removed is better terminology but shouldn't be specific diff type value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dolt diff should support --filter option

3 participants