-
Notifications
You must be signed in to change notification settings - Fork 7
Rewrite the "sync_local" query #78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR rewrites the "sync_local" query to optimize filesystem operations by reducing temporary storage usage and adds support for bucket priorities. Key changes include:
- Refactoring the Virtual FileSystem tracking and test utilities to use unique VFS names for concurrency safety.
- Updating SQL queries in the Rust implementation to replace bucket count checks with NULL checks and simplify query logic.
- Upgrading the sqlite3 dependency version.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
dart/test/utils/tracking_vfs.dart | Implements a tracking VFS to count filesystem reads and writes. |
dart/test/utils/native_test_utils.dart | Updates database open function to support unique file names. |
dart/test/sync_test.dart | Adjusts test setup to avoid concurrency issues with unique VFS names. |
dart/test/perf_test.dart | Adds performance tests for tracking filesystem operations. |
dart/test/js_key_encoding_test.dart | Updates test configurations to use unique file system names. |
dart/pubspec.yaml | Upgrades the sqlite3 dependency to a newer version. |
crates/core/src/sync_local.rs | Rewrites the sync_local query and optimizes its SQL logic. |
Comments suppressed due to low confidence (1)
crates/core/src/sync_local.rs:197
- Confirm that the switch from DISTINCT to UNION ALL in the 'updated_rows' CTE is intentional and that the subsequent GROUP BY clause correctly handles any potential duplicate rows.
UNION ALL SELECT row_type, row_id FROM ps_updated_rows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the idea of using a VFS for instrumentation 👍 The query change also looks good to me from a quick look (apart from one question).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍
Supersedes #56. This is the same basic optimization as in that PR, but now also supports bucket priorities, and updated with the latest changes.
This now tracks filesystem operations (pages written/read) as the main metric to optimize for this query. It is not the only metric that matters, but it is consistent, and a good indication of real-world performance with large amounts of data on mobile and web platforms.
For the base test case, 500k rows:
There is some increase in reads on the data db, but a massive reduction in temporary storage operations.
The above is with the default SQLite cache_size. If we set say
PRAGMA cache_size=-50000
(50MB), we can do 500k rows without using any temporary storage with the changes here. With the previous query, temporary storage was already used after 30-40k rows.There are further optimizations to be made for the initial sync or when we're re-syncing most of the data, but the gains here already significant.