Scan Delete Support Part 4: Delete File Loading; Skeleton for Processing #982

Open: wants to merge 9 commits into main

Contributor

@sdd sdd commented Feb 21, 2025

Extends the DeleteFileManager introduced in #950 to include loading of delete files, storage and retrieval of parsed delete files from shared state, and the outline for how parsing will connect up to this new work.

Issue: #630

@sdd sdd force-pushed the feat/delete-fila-manager-loading branch 5 times, most recently from edb1d27 to 8e90bdd Compare February 23, 2025 14:55
@sdd sdd marked this pull request as ready for review February 26, 2025 09:20
@sdd sdd force-pushed the feat/delete-fila-manager-loading branch 4 times, most recently from ec8e7c1 to 06f0df5 Compare March 5, 2025 19:53
Contributor

@jonathanc-n jonathanc-n left a comment

This is nice, will look at the parsed records next.

@sdd sdd force-pushed the feat/delete-fila-manager-loading branch 6 times, most recently from 5530bc3 to e997fc6 Compare March 31, 2025 17:27
Contributor Author

sdd commented Apr 3, 2025

@liurenjie1024, @Xuanwo, @Fokko - this is ready for re-review, if you could take a look that would be great!

@sdd sdd force-pushed the feat/delete-fila-manager-loading branch from e997fc6 to 056e73f Compare April 3, 2025 07:28
Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @sdd for this PR. There are some points missing from the current design. Also, I would suggest not putting too much into DeleteFileManager. I think DeleteFileManager should act more like a delete loader, which manages the I/O and caching of record batches. The actual filtering could be delegated to a DeleteFilter, WDYT? I think a good reference implementation is Java's DeleteFilter, see https://github.com/apache/iceberg/blob/af8e3f5a40f4f36bbe1d868146749e2341471586/data/src/main/java/org/apache/iceberg/data/DeleteFilter.java#L50
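
As a rough illustration of the split suggested here (not code from this PR), the sketch below separates a loader that owns reading and caching from a filter that only answers questions about parsed deletes. All names (`DeleteLoader`, `DeleteFilter`, `load`, `is_deleted`, `read_delete_file`) are hypothetical, and the "file contents" are passed in as a slice so the example needs no I/O.

```rust
use std::collections::{HashMap, HashSet};

/// Parsed positional deletes for one data file: the row positions to drop.
/// Hypothetical type for illustration only.
#[derive(Debug, Default, Clone)]
pub struct DeleteFilter {
    deleted_positions: HashSet<u64>,
}

impl DeleteFilter {
    /// Returns true if the row at file position `pos` has been deleted.
    pub fn is_deleted(&self, pos: u64) -> bool {
        self.deleted_positions.contains(&pos)
    }
}

/// Owns loading and caching; knows nothing about how rows get filtered.
#[derive(Default)]
pub struct DeleteLoader {
    cache: HashMap<String, DeleteFilter>,
    loads: usize,
}

impl DeleteLoader {
    /// Stand-in for reading and parsing a delete file (assumption: real code
    /// would read from object storage instead of taking a slice).
    fn read_delete_file(&mut self, raw_positions: &[u64]) -> DeleteFilter {
        self.loads += 1;
        DeleteFilter {
            deleted_positions: raw_positions.iter().copied().collect(),
        }
    }

    /// Return the cached filter for `path`, loading it on first use.
    pub fn load(&mut self, path: &str, raw_positions: &[u64]) -> DeleteFilter {
        if let Some(hit) = self.cache.get(path) {
            return hit.clone();
        }
        let filter = self.read_delete_file(raw_positions);
        self.cache.insert(path.to_string(), filter.clone());
        filter
    }

    /// Number of times a delete file was actually read (cache misses).
    pub fn load_count(&self) -> usize {
        self.loads
    }
}
```

With this shape, a second `load` call for the same path returns the cached filter without re-reading, and the reader only ever talks to `DeleteFilter`.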

Contributor Author

sdd commented Apr 14, 2025

Thanks for the review @liurenjie1024 - much appreciated. Will come back with a revised design.

@sdd sdd force-pushed the feat/delete-fila-manager-loading branch 2 times, most recently from bd33aa5 to 39a26ab Compare April 17, 2025 06:39
@sdd sdd force-pushed the feat/delete-fila-manager-loading branch 3 times, most recently from 5739a46 to 52cf8b9 Compare April 23, 2025 21:07
@liurenjie1024
Contributor

> Actually @liurenjie1024 I'll go ahead with the structural changes to split this into separate loader and filter structs and update this PR.

Sorry for the late reply. I'm fine with deferring performance improvements, but I have concerns about the correctness problem.

@liurenjie1024
Contributor

Hi @sdd, I saw you opened a series of PRs for handling reading of deletions. I have a suggestion: instead of opening a series of PRs, you could have one large draft PR containing all your changes, and pick one component at a time to open as a small PR for review. This way reviewers can understand your whole design by walking through the large PR, and carefully review the small PR. Also, when reviewers have comments, you only need to change one large PR instead of several small ones. WDYT?

@sdd sdd force-pushed the feat/delete-fila-manager-loading branch from a227fa7 to e1edc95 Compare April 30, 2025 19:08
Contributor Author

sdd commented Apr 30, 2025

> Thanks @sdd for this PR. There are some points missing from the current design. Also, I would suggest not putting too much into DeleteFileManager. I think DeleteFileManager should act more like a delete loader, which manages the I/O and caching of record batches. The actual filtering could be delegated to a DeleteFilter, WDYT? I think a good reference implementation is Java's DeleteFilter, see https://github.com/apache/iceberg/blob/af8e3f5a40f4f36bbe1d868146749e2341471586/data/src/main/java/org/apache/iceberg/data/DeleteFilter.java#L50

I've refactored as you suggested and you're right, it is a better design.

@sdd sdd force-pushed the feat/delete-fila-manager-loading branch from e1edc95 to 324a872 Compare April 30, 2025 19:15
Contributor Author

sdd commented Apr 30, 2025

> Hi @sdd, I saw you opened a series of PRs for handling reading of deletions. I have a suggestion: instead of opening a series of PRs, you could have one large draft PR containing all your changes, and pick one component at a time to open as a small PR for review. This way reviewers can understand your whole design by walking through the large PR, and carefully review the small PR. Also, when reviewers have comments, you only need to change one large PR instead of several small ones. WDYT?

I could do, if you think it's worth it - the other two remaining PRs after this one are much smaller, and it feels like just as much work to merge those into a single PR and then break PRs out of that again as it does to simply keep rebasing those two PRs on top of this one.

I'll be very glad once this delete file read support is done - it's been a long, hard slog to be honest and I'm struggling to stay motivated with it, but we're not far off now, hopefully.

@sdd sdd force-pushed the feat/delete-fila-manager-loading branch 3 times, most recently from 69b4da9 to 3804bda Compare May 1, 2025 06:20
@sdd sdd force-pushed the feat/delete-fila-manager-loading branch from af14046 to 5f0b073 Compare May 10, 2025 11:24
Contributor Author

sdd commented May 10, 2025

I had a bug in here that was causing the tests to deadlock in the follow-up PRs: I was missing a waker for my custom futures. That's been rectified now, and this PR plus the two follow-ups are ready for review once more.
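
For readers unfamiliar with this failure mode, here is a generic, std-only sketch of a hand-written future that waits on shared state. It is not the code from this PR; `Slot`, `WaitForValue`, `complete`, and `block_on` are hypothetical names. The point is that `poll` must store the waker before returning `Pending`, and whoever fills the slot must call it. If either half is missing, the executor never re-polls and the task hangs.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Shared slot that a producer fills in and a waiting future reads from.
#[derive(Default)]
struct Slot {
    value: Option<u64>,
    waker: Option<Waker>,
}

/// Hypothetical custom future that resolves once the slot has a value.
struct WaitForValue {
    slot: Arc<Mutex<Slot>>,
}

impl Future for WaitForValue {
    type Output = u64;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        let mut slot = self.slot.lock().unwrap();
        if let Some(v) = slot.value {
            return Poll::Ready(v);
        }
        // Store the waker so the producer can ask the executor to poll again.
        // Omitting this line is the "missing waker" bug: Pending forever.
        slot.waker = Some(cx.waker().clone());
        Poll::Pending
    }
}

/// Fill the slot and wake whichever task is waiting on it.
fn complete(slot: &Arc<Mutex<Slot>>, v: u64) {
    let waker = {
        let mut guard = slot.lock().unwrap();
        guard.value = Some(v);
        guard.waker.take()
    };
    // Wake outside the lock so the woken task can take it immediately.
    if let Some(w) = waker {
        w.wake();
    }
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny single-future executor: poll, then park until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

Driving `WaitForValue` with `block_on` while another thread calls `complete` returns the completed value; with the waker line removed, `block_on` would stay parked.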

Contributor Author

sdd commented May 14, 2025

@liurenjie1024, @Xuanwo - could do with a review again when you get chance! Thanks :-)

Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @sdd , it looks better now, left some questions to refine.

Comment on lines 63 to 66
CachingDeleteFileLoader::parquet_to_batch_stream(&task.file_path, self.file_io.clone())
.await?;

Self::evolve_schema(raw_batch_stream, schema).await
Contributor

We need a reader API so that we don't need to repeat this part everywhere, but this can be left until later.

     /// Build the ArrowReader.
     pub fn build(self) -> ArrowReader {
         ArrowReader {
             batch_size: self.batch_size,
             file_io: self.file_io.clone(),
-            delete_file_manager: CachingDeleteFileManager::new(
+            delete_file_loader: CachingDeleteFileLoader::new(
Contributor

We should not store a loader; instead we should store a DeleteFilter. What is called in ArrowReader should be something like the following:

deleteFilter.filter(recordBatchStream)

Contributor Author

We're not making best use of the features that ArrowRecordBatchReader provides us if we do that. We can't implement the delete filter as a simple filter on a RecordBatchStream unless we ditch using parquet's ParquetRecordBatchStream.

We're using parquet's ParquetRecordBatchStream to do the predicate filtering before we even get access to the RecordBatchStream. So by the time we have a stream of RecordBatches, we can't apply positional deletes or delete vectors, because we no longer know which row number in the original file a record batch's row corresponds to, as some rows may have been filtered out.
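
A toy illustration of this ordering problem, using plain slices rather than Arrow record batches (the function names are hypothetical and nothing here is from the PR): once a predicate has dropped rows, an index into the filtered output is no longer a position in the original file, so applying positional deletes afterwards removes the wrong rows.

```rust
/// Apply positional deletes first, while each row's index is still its
/// position in the file, then apply the scan predicate.
fn delete_then_filter(rows: &[i32], deleted: &[usize], pred: impl Fn(i32) -> bool) -> Vec<i32> {
    rows.iter()
        .enumerate()
        .filter(|(pos, _)| !deleted.contains(pos))
        .map(|(_, v)| *v)
        .filter(|v| pred(*v))
        .collect()
}

/// Apply the predicate first, then (incorrectly) treat indices into the
/// filtered output as if they were file positions.
fn filter_then_delete(rows: &[i32], deleted: &[usize], pred: impl Fn(i32) -> bool) -> Vec<i32> {
    let filtered: Vec<i32> = rows.iter().copied().filter(|v| pred(*v)).collect();
    filtered
        .into_iter()
        .enumerate()
        .filter(|(idx, _)| !deleted.contains(idx))
        .map(|(_, v)| v)
        .collect()
}
```

For rows `[10, 20, 30, 40, 50]`, a predicate `v > 15`, and a positional delete of file position 1 (the value 20), the first function yields `[30, 40, 50]`, while the second yields `[20, 40, 50]`: the deleted row survives and a live row is dropped.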

Contributor Author

I may be able to update ArrowReader to store a DeleteFileFilter rather than a DeleteFileLoader, but not to change the semantics of how the filter itself is used to match deleteFilter.filter(recordBatchStream). That would require rewriting the ArrowReader entirely to not use ParquetRecordBatchStream and, as a consequence, reimplementing all the predicate filtering logic, row selection, projection, page skipping, and row group skipping that it gives us.
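
To make the "row selection" point concrete, here is a minimal sketch of how positional deletes could be turned into alternating select/skip runs that a reader applies before any predicate runs. The parquet crate has its own row selection types for this purpose; the `Run` enum and `runs_from_deletes` function below are hypothetical stand-ins so the example stays std-only and makes no claims about that crate's exact API.

```rust
/// One run of consecutive rows to either read or skip.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Run {
    Select(usize),
    Skip(usize),
}

/// Turn sorted deleted file positions into select/skip runs covering
/// `total_rows` rows. Duplicate and out-of-range positions are ignored.
fn runs_from_deletes(deleted: &[usize], total_rows: usize) -> Vec<Run> {
    let mut runs = Vec::new();
    let mut cursor = 0; // next file position not yet covered by a run
    for &pos in deleted {
        if pos >= total_rows {
            break;
        }
        if pos < cursor {
            continue; // duplicate of a position already skipped
        }
        if pos > cursor {
            runs.push(Run::Select(pos - cursor));
        }
        // Merge adjacent deleted rows into a single skip run.
        match runs.last_mut() {
            Some(Run::Skip(n)) => *n += 1,
            _ => runs.push(Run::Skip(1)),
        }
        cursor = pos + 1;
    }
    if cursor < total_rows {
        runs.push(Run::Select(total_rows - cursor));
    }
    runs
}
```

Because the runs are expressed in original file positions, they can be handed to the reader up front, which is why the delete information has to be available before the record batch stream is built rather than applied to its output.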

Contributor

I think I get your point, and you are right: maybe Java's interface is not the best fit for the Rust implementation. I think it's fine to move forward first and refactor later, as long as we don't expose public APIs.

@@ -207,7 +233,8 @@ impl ArrowReader {
             record_batch_stream_builder = record_batch_stream_builder.with_batch_size(batch_size);
         }

-        let delete_predicate = delete_file_manager.build_delete_predicate(task.schema.clone())?;
+        let delete_filter = delete_filter_rx.await.unwrap()?;
+        let delete_predicate = delete_filter.build_equality_delete_predicate(&task).await?;
Contributor

We should hide the building of these filters inside DeleteFilter, rather than calling them directly in ArrowReader.

Contributor Author

Please can you explain a bit more? I don't understand what you're asking for here. As per the previous comment, we're using ParquetRecordBatchStream to do most of the heavy lifting for us on a number of features. This means that the design of the DeleteFilter can't be as simple as just exposing a method that filters a record batch stream.


impl DeleteFilter {
/// Retrieve a delete vector for the data file associated with a given file scan task
pub fn get_delete_vector(
Contributor

It's odd to expose a public API to construct filters. For DeleteFilter, we should expose only one public API: pub fn filter(input: ArrowRecordBatchStream)

Contributor Author

I've made all of DeleteFilter pub(crate) now, in that case. As per previous replies, pub fn filter(input: ArrowRecordBatchStream) does not make sense for usage with ArrowReader. If we also want to expose a public API for filtering that does implement that interface, we can add that in a follow-up PR.

@liurenjie1024
Contributor

> I could do, if you think it's worth it - the other two remaining PRs after this one are much smaller, and it feels like just as much work to merge those into a single PR and then break PRs out of that again as it does to simply keep rebasing those two PRs on top of this one.

It's just a minor suggestion; go with whichever approach you prefer.

Contributor Author

sdd commented May 15, 2025

@liurenjie1024 back to you. I've addressed your suggestions that I think make sense to immediately change. I look forward to hearing back from you on the other points! :-)

Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @sdd for this PR. Rust's reader is somewhat different from Java's, so we should not blindly follow Java's API design. I think it's hard to define the best API before we have a complete implementation, so I prefer to move forward first. cc @Xuanwo WDYT?

@sdd sdd force-pushed the feat/delete-fila-manager-loading branch from b0eaa66 to 94d9307 Compare May 15, 2025 18:49
/// as per the other delete file types - only this time it is accompanied by a one-shot
/// channel sender that we will eventually use to resolve the shared future that we stored
/// in the state.
/// * When this gets updated to add support for delete vectors, the load phase will return
Contributor

Looking forward to the puffin / deletion vector support!

Contributor Author

Me too! 😁

Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @sdd for this pr!

@liurenjie1024
Contributor

Let's wait until after the 0.5.0 release to merge it.

4 participants