Skip to content

Commit 643f47a

Browse files
committed
feat: changes suggested in review
1 parent 9bacbee commit 643f47a

File tree

1 file changed

+0
-65
lines changed

1 file changed

+0
-65
lines changed

crates/iceberg/src/arrow/delete_file_manager.rs

Lines changed: 0 additions & 65 deletions
Original file line numberDiff line numberDiff line change
@@ -139,71 +139,6 @@ impl CachingDeleteFileManager {
139139
}
140140
}
141141

142-
/// Load the deletes for all the specified tasks
143-
///
144-
/// Returned future completes once all loading has finished.
145-
///
146-
/// * Create a single stream of all delete file tasks irrespective of type,
147-
/// so that we can respect the combined concurrency limit
148-
/// * We then process each in two phases: load and parse.
149-
/// * for positional deletes the load phase instantiates an ArrowRecordBatchStream to
150-
/// stream the file contents out
151-
/// * for eq deletes, we first check if the EQ delete is already loaded or being loaded by
152-
/// another concurrently processing data file scan task. If it is, we return a future
153-
/// for the pre-existing task from the load phase. If not, we create such a future
154-
/// and store it in the state to prevent other data file tasks from starting to load
155-
/// the same equality delete file, and return a record batch stream from the load phase
156-
/// as per the other delete file types - only this time it is accompanied by a one-shot
157-
/// channel sender that we will eventually use to resolve the shared future that we stored
158-
/// in the state.
159-
/// * When this gets updated to add support for delete vectors, the load phase will return
160-
/// a PuffinReader for them.
161-
/// * The parse phase parses each record batch stream according to its associated data type.
162-
/// The result of this is a map of data file paths to delete vectors for the positional
163-
/// delete tasks (and in future for the delete vector tasks). For equality delete
164-
/// file tasks, this results in an unbound Predicate.
165-
/// * The unbound Predicates resulting from equality deletes are sent to their associated oneshot
166-
/// channel to store them in the right place in the delete file managers state.
167-
/// * The results of all of these futures are awaited on in parallel with the specified
168-
/// level of concurrency and collected into a vec. We then combine all of the delete
169-
/// vector maps that resulted from any positional delete or delete vector files into a
170-
/// single map and persist it in the state.
171-
///
172-
///
173-
/// Conceptually, the data flow is like this:
174-
///
175-
/// FileScanTaskDeleteFile
176-
/// |
177-
/// Already-loading EQ Delete | Everything Else
178-
/// +---------------------------------------------------+
179-
/// | |
180-
/// [get existing future] [load recordbatch stream / puffin]
181-
/// DeleteFileContext::InProgEqDel DeleteFileContext
182-
/// | |
183-
/// | |
184-
/// | +-----------------------------+--------------------------+
185-
/// | Pos Del Del Vec (Not yet Implemented) EQ Del
186-
/// | | | |
187-
/// | [parse pos del stream] [parse del vec puffin] [parse eq del]
188-
/// | HashMap<String, RoaringTreeMap> HashMap<String, RoaringTreeMap> (Predicate, Sender)
189-
/// | | | |
190-
/// | | | [persist to state]
191-
/// | | | ()
192-
/// | | | |
193-
/// | +-----------------------------+--------------------------+
194-
/// | |
195-
/// | [buffer unordered]
196-
/// | |
197-
/// | [combine del vectors]
198-
/// | HashMap<String, RoaringTreeMap>
199-
/// | |
200-
/// | [persist del vectors to state]
201-
/// | ()
202-
/// | |
203-
/// +-------------------------+-------------------------+
204-
/// |
205-
/// [join!]
206-
/// */
207142
pub(crate) async fn load_deletes(
208143
&self,
209144
delete_file_entries: &[FileScanTaskDeleteFile],

0 commit comments

Comments
 (0)