Fix 200% untar progress issue by avoiding a second read of file #17708
base: develop2
Conversation
Looks great! Seems like performance-wise, both methods will end up calling extract_one in the tarfile for each member, so there would be no extra overhead there, we'll just need to address my comment
Please share real timings of unzipping some large files, with and without this change.
UPDATE: Sorry I didn't see the OP clearly, now it is clear, thanks!
Then @AbrilRBS, regarding your comment:
Looks great! Seems like performance-wise, both methods will end up calling extract_one in the tarfile for each member, so there would be no extra overhead there, we'll just need to address my comment
That part is not fully clear to me; what do you mean?
if pattern:
    members = list(filter(lambda m: fnmatch(m.name, pattern),
                          tarredgzippedFile.getmembers()))
Doesn't this case read the file a third time, potentially causing 300%?
If getmembers() does file read IO, that seems to be the case?
Nope, getmembers has a caching mechanism.
The maximum number of reads tarfile can perform on a tar.gz is two!
See code:
def getmembers(self):
"""Return the members of the archive as a list of TarInfo objects. The
list has the same order as the members in the archive.
"""
self._check()
if not self._loaded: # if we want to obtain a list of
self._load() # all members, we first have to
# scan the whole archive.
return self.members
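The caching is easy to observe: once `_load()` has run, `getmembers()` returns the same cached list on every call, with no further scan. A small sketch against an in-memory archive (built only for illustration):

```python
import io
import tarfile

# Build a one-member in-memory tar.gz for the demonstration.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    info = tarfile.TarInfo("x.txt")
    info.size = 0
    tf.addfile(info, io.BytesIO(b""))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tf:
    first = tf.getmembers()   # triggers the full archive scan (_load)
    second = tf.getmembers()  # served from the cache; no rescan
    print(first is second)    # True: the very same cached list object
```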
I did not check the code, but if I understand correctly, the uncompress status is updated every second. I just tried this branch and it indeed works up to 99% (Qt is huge); then each step is repeated, but that's fine IMO:
Yes... I think I can add a final log which explicitly marks the 100% just for the TOC
That is a very good point, the progress printer is managed by a
@perseoGI thank you for clarifying the feature. It looks good to me now.
Changelog: Fix: Improve untar performance
Docs: omit
Calling TarFile.getmembers() forces a complete IO read of the tar file. The same happens the first time TarFile.extractall(members=members) is called.
This PR fixes the reported issue of the untar progress exceeding 100%, caused by calling seek(0) twice: once for the first read querying all files, and a second time for the actual uncompression.
Details:
Perform the decompression in a single IO loop by iterating over the compressed files without actually reading them, filtering and uncompressing them one at a time when needed.
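The single-pass approach described above can be sketched as follows. This is an illustrative outline, not Conan's actual implementation: the function name `untar_single_pass` and the progress handling are hypothetical. The key point is that iterating a `TarFile` object streams members in order without a prior `getmembers()` scan, so the archive is only read once:

```python
import io
import tarfile
import tempfile
from fnmatch import fnmatch

def untar_single_pass(fileobj, dest, pattern=None):
    """Illustrative sketch: extract in one pass, filtering on the fly."""
    extracted = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tf:
        for member in tf:  # streams members; no pre-scan of the archive
            if pattern and not fnmatch(member.name, pattern):
                continue
            tf.extract(member, path=dest)
            extracted.append(member.name)  # a progress update would go here
    return extracted

# Demo: build a tiny in-memory archive and extract only *.txt members.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    for name in ("a.txt", "b.log"):
        info = tarfile.TarInfo(name)
        info.size = 5
        tf.addfile(info, io.BytesIO(b"hello"))
buf.seek(0)
with tempfile.TemporaryDirectory() as dest:
    names = untar_single_pass(buf, dest, pattern="*.txt")
    print(names)  # ['a.txt']
```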
Benchmarks:
develop2 untar code
New changes
The results show roughly a 25% improvement in uncompression time.
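For anyone who wants to reproduce timings locally, here is a minimal, illustrative harness (not the benchmark used for the numbers above) comparing the two-pass approach (getmembers() followed by extractall) with single-pass iteration over the same in-memory archive. Absolute numbers will vary by archive size and machine:

```python
import io
import tarfile
import tempfile
import time

def make_archive(n=200):
    """Build a small in-memory tar.gz with n 1 KiB files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for i in range(n):
            data = b"x" * 1024
            info = tarfile.TarInfo(f"f{i}.txt")
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

raw = make_archive()

def two_pass(dest):
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tf:
        members = tf.getmembers()                   # first full read
        tf.extractall(path=dest, members=members)   # second read

def single_pass(dest):
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tf:
        for m in tf:                                # one read only
            tf.extract(m, path=dest)

for fn in (two_pass, single_pass):
    with tempfile.TemporaryDirectory() as d:
        t0 = time.perf_counter()
        fn(d)
        print(fn.__name__, round(time.perf_counter() - t0, 4))
```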
Closes #17697
develop branch, documenting this one.