Commit 0cd930c

add log msg
1 parent 7c8d84a commit 0cd930c

1 file changed: +1 -0 lines changed

src/datatrove/pipeline/dedup/minhash.py

Lines changed: 1 addition & 0 deletions
@@ -584,6 +584,7 @@ def union(v_a, v_b):
                 set_size.pop(root_b, None)  # clear up space
 
         with self.track_time():
+            logger.info("Loading dup files...")
             for dup_file in tqdm(dup_files, desc="Reading dup files"):
                 with self.input_folder.open(dup_file, "rb") as dupf:
                     for f1, d1, f2, d2 in read_tuples_from_file(dupf, "4I", lines_to_buffer=self.lines_to_buffer):
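
For context on the loop that the new log line precedes: each dup file is read as a stream of records in `"4I"` format, which in Python's `struct` notation is four unsigned 32-bit integers (16 bytes per record with standard sizes). The following is only a rough sketch of how such records could be read in buffered chunks. The helper `read_fixed_tuples` is hypothetical and is not datatrove's `read_tuples_from_file`; the little-endian byte order is also an assumption.

```python
import io
import struct


def read_fixed_tuples(stream, fmt="4I", lines_to_buffer=5):
    """Hypothetical stand-in for illustration only (not datatrove's reader).

    Yields one tuple per fixed-size record, reading `lines_to_buffer`
    records from the stream at a time.
    """
    reader = struct.Struct("<" + fmt)  # "<" (little-endian) is an assumption
    chunk_size = reader.size * lines_to_buffer
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Drop a trailing partial record, if any, so iter_unpack does not raise.
        usable = len(chunk) - len(chunk) % reader.size
        yield from reader.iter_unpack(chunk[:usable])


# Two records of (f1, d1, f2, d2), written to an in-memory "file".
buf = io.BytesIO(struct.pack("<8I", 1, 2, 3, 4, 5, 6, 7, 8))
records = list(read_fixed_tuples(buf, "4I", lines_to_buffer=1))
```

Because a loop like this can run for a long time over many files, a log message before it (as this commit adds) makes it easier to tell from the logs which phase a job is in.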
