Skip to content

Rewrite the "sync_local" query #78

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 28, 2025
Merged

Rewrite the "sync_local" query #78

merged 9 commits into from
May 28, 2025

Conversation

rkistner
Copy link
Contributor

Supersedes #56. This is the same basic optimization as in that PR, but now also supports bucket priorities, and updated with the latest changes.

This now tracks filesystem operations (pages written/read) as the main metric to optimize for this query. It is not the only metric that matters, but it is consistent, and a good indication of real-world performance with large amounts of data on mobile and web platforms.

For the base test case, 500k rows:

# Data setup stats (for reference only)
init stats: Reads: 1348044 + 0 | Writes: 1188543 + 0 # (main db page operations) + (temp storage page operations)
# Before
4283ms Reads: 918018 + 531168 | Writes: 68904 + 521079
# After:
3800ms Reads: 1289503 + 5518 | Writes: 69023 + 5518

There is some increase in reads on the data db, but a massive reduction in temporary storage operations.
The above is with the default SQLite cache_size. If we set say PRAGMA cache_size=-50000 (50MB), we can do 500k rows without using any temporary storage with the changes here. With the previous query, temporary storage was already used after 30-40k rows.

There are further optimizations to be made for the initial sync or when we're re-syncing most of the data, but the gains here already significant.

@rkistner rkistner requested a review from Copilot May 26, 2025 12:25
@rkistner rkistner mentioned this pull request May 26, 2025
2 tasks
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR rewrites the "sync_local" query to optimize filesystem operations by reducing temporary storage usage and adds support for bucket priorities. Key changes include:

  • Refactoring the Virtual FileSystem tracking and test utilities to use unique VFS names for concurrency safety.
  • Updating SQL queries in the Rust implementation to replace bucket count checks with NULL checks and simplify query logic.
  • Upgrading the sqlite3 dependency version.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
dart/test/utils/tracking_vfs.dart Implements a tracking VFS to count filesystem reads and writes.
dart/test/utils/native_test_utils.dart Updates database open function to support unique file names.
dart/test/sync_test.dart Adjusts test setup to avoid concurrency issues with unique VFS names.
dart/test/perf_test.dart Adds performance tests for tracking filesystem operations.
dart/test/js_key_encoding_test.dart Updates test configurations to use unique file system names.
dart/pubspec.yaml Upgrades the sqlite3 dependency to a newer version.
crates/core/src/sync_local.rs Rewrites the sync_local query and optimizes its SQL logic.
Comments suppressed due to low confidence (1)

crates/core/src/sync_local.rs:197

  • Confirm that the switch from DISTINCT to UNION ALL in the 'updated_rows' CTE is intentional and that the subsequent GROUP BY clause correctly handles any potential duplicate rows.
UNION ALL SELECT row_type, row_id FROM ps_updated_rows

Copy link
Contributor

@simolus3 simolus3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of using a VFS for instrumentation 👍 The query change also looks good to me from a quick look (apart from one question).

Copy link
Contributor

@simolus3 simolus3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

@rkistner rkistner merged commit fee756b into main May 28, 2025
21 checks passed
@rkistner rkistner deleted the optimize-sync-local branch May 28, 2025 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants