Fix 200% untar progress issue by avoiding a second read of file #17708

Open: wants to merge 2 commits into base: develop2


perseoGI
Contributor

@perseoGI perseoGI commented Feb 5, 2025

Changelog: Fix: Improve untar performance
Docs: omit

Calling TarFile.getmembers() forces a complete IO read of the tar file.
The same happens when calling TarFile.extractall(members=members) for the first time.

This PR aims to fix the reported issue of untar progress exceeding 100%. It is caused by reading the archive from the start (seek(0)) twice: once in the first pass that queries all files, and once more for the actual decompression.
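
The double read can be reproduced outside Conan with the standard library alone. The following is a minimal sketch, not the Conan code: `CountingBytesIO`, `build_targz` and `two_pass_read_ratio` are hypothetical helper names, and the archive is a tiny in-memory `.tar.gz` rather than a real source tarball.

```python
import io
import tarfile
import tempfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that tallies how many bytes have been read from it."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_targz(files):
    """Return the bytes of a .tar.gz holding the given {name: content} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def two_pass_read_ratio(archive_bytes):
    """List all members, then extract them; return bytes read / archive size."""
    source = CountingBytesIO(archive_bytes)
    # The "filter" argument only exists on Python versions with extraction filters.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tempfile.TemporaryDirectory() as destination:
        with tarfile.open(fileobj=source, mode="r:gz") as tar:
            members = tar.getmembers()  # first pass over the compressed stream
            # Extraction seeks backwards, so the stream is rewound and read again.
            tar.extractall(destination, members=members, **extra)
    return source.bytes_read / len(archive_bytes)


if __name__ == "__main__":
    archive = build_targz({"a.txt": b"x" * 5000, "b.txt": b"y" * 5000})
    print(f"bytes read / archive size: {two_pass_read_ratio(archive):.2f}")
```

A progress indicator that divides the bytes read by the archive size will therefore go past 100% whenever the member list is loaded before extraction.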

Details:

Perform the decompression in a single IO loop: iterate over the archive members without loading the whole member list first, then filter and extract them one at a time as they are reached.
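
A rough sketch of that single-pass approach, using only the standard library, could look like the following. The function name `extract_single_pass` and its parameters are assumptions made for illustration; this is not the code changed in conan/tools/files/files.py.

```python
import fnmatch
import tarfile


def extract_single_pass(archive_path, destination, pattern=None):
    """Extract matching members while walking the archive only once.

    Hypothetical helper: iterating a TarFile yields members as their headers
    are read, so no separate getmembers() scan is needed.
    """
    # The "filter" argument only exists on Python versions with extraction filters.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    with tarfile.open(archive_path, mode="r:*") as tar:
        for member in tar:
            if pattern and not fnmatch.fnmatch(member.name, pattern):
                continue  # skipped members are never written to disk
            tar.extract(member, path=destination, **extra)
            extracted.append(member.name)
    return extracted
```

Note that `extractall()` also defers setting directory attributes until all files are written; a per-member loop like this one does not, which may matter for archives that rely on directory timestamps or permissions.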

Benchmarks:

develop2 untar code

$ time conan source recipes/boost/all --version 1.86.0                                                                   
conanfile.py (boost/1.86.0): Calling source() in /Users/perseo/sources/conan-center-index/recipes/boost/all/src
conanfile.py (boost/1.86.0): Source ['https://archives.boost.io/release/1.86.0/source/boost_1_86_0.tar.bz2', 'https://sourceforge.net/projects/boost/files/boost/1.86.0/boost_1_86_0.tar.bz2'] retrieved from local download cache
conanfile.py (boost/1.86.0): Unzipping boost_1_86_0.tar.bz2 to /Users/perseo/sources/conan-center-index/recipes/boost/all/src
Uncompressing boost_1_86_0.tar.bz2: 5%
Uncompressing boost_1_86_0.tar.bz2: 10%
Uncompressing boost_1_86_0.tar.bz2: 22%
Uncompressing boost_1_86_0.tar.bz2: 30%
Uncompressing boost_1_86_0.tar.bz2: 42%
Uncompressing boost_1_86_0.tar.bz2: 54%
Uncompressing boost_1_86_0.tar.bz2: 62%
Uncompressing boost_1_86_0.tar.bz2: 71%
Uncompressing boost_1_86_0.tar.bz2: 79%
Uncompressing boost_1_86_0.tar.bz2: 89%
Uncompressing boost_1_86_0.tar.bz2: 95%
Uncompressing boost_1_86_0.tar.bz2: 100%
Uncompressing boost_1_86_0.tar.bz2: 103%
Uncompressing boost_1_86_0.tar.bz2: 105%
Uncompressing boost_1_86_0.tar.bz2: 106%
Uncompressing boost_1_86_0.tar.bz2: 108%
Uncompressing boost_1_86_0.tar.bz2: 112%
Uncompressing boost_1_86_0.tar.bz2: 121%
Uncompressing boost_1_86_0.tar.bz2: 124%
Uncompressing boost_1_86_0.tar.bz2: 127%
Uncompressing boost_1_86_0.tar.bz2: 130%
Uncompressing boost_1_86_0.tar.bz2: 139%
Uncompressing boost_1_86_0.tar.bz2: 146%
Uncompressing boost_1_86_0.tar.bz2: 152%
Uncompressing boost_1_86_0.tar.bz2: 156%
Uncompressing boost_1_86_0.tar.bz2: 161%
Uncompressing boost_1_86_0.tar.bz2: 165%
Uncompressing boost_1_86_0.tar.bz2: 170%
Uncompressing boost_1_86_0.tar.bz2: 172%
Uncompressing boost_1_86_0.tar.bz2: 176%
Uncompressing boost_1_86_0.tar.bz2: 179%
Uncompressing boost_1_86_0.tar.bz2: 184%
Uncompressing boost_1_86_0.tar.bz2: 189%
Uncompressing boost_1_86_0.tar.bz2: 193%
Uncompressing boost_1_86_0.tar.bz2: 195%
Uncompressing boost_1_86_0.tar.bz2: 196%
Uncompressing boost_1_86_0.tar.bz2: 199%
conanfile.py (boost/1.86.0): Apply patch (conan): Optional flag to specify iconv from either libc of libiconv
conan source recipes/boost/all    25,42s user 10,91s system 94% cpu 38,446 total

New changes

$  time conan source recipes/boost/all --version 1.86.0
conanfile.py (boost/1.86.0): Calling source() in /Users/perseo/sources/conan-center-index/recipes/boost/all/src
conanfile.py (boost/1.86.0): Source ['https://archives.boost.io/release/1.86.0/source/boost_1_86_0.tar.bz2', 'https://sourceforge.net/projects/boost/files/boost/1.86.0/boost_1_86_0.tar.bz2'] retrieved from local download cache
conanfile.py (boost/1.86.0): Unzipping boost_1_86_0.tar.bz2 to /Users/perseo/sources/conan-center-index/recipes/boost/all/src
Uncompressing boost_1_86_0.tar.bz2: 1%
Uncompressing boost_1_86_0.tar.bz2: 4%
Uncompressing boost_1_86_0.tar.bz2: 5%
Uncompressing boost_1_86_0.tar.bz2: 7%
Uncompressing boost_1_86_0.tar.bz2: 9%
Uncompressing boost_1_86_0.tar.bz2: 11%
Uncompressing boost_1_86_0.tar.bz2: 18%
Uncompressing boost_1_86_0.tar.bz2: 21%
Uncompressing boost_1_86_0.tar.bz2: 24%
Uncompressing boost_1_86_0.tar.bz2: 28%
Uncompressing boost_1_86_0.tar.bz2: 31%
Uncompressing boost_1_86_0.tar.bz2: 41%
Uncompressing boost_1_86_0.tar.bz2: 47%
Uncompressing boost_1_86_0.tar.bz2: 52%
Uncompressing boost_1_86_0.tar.bz2: 56%
Uncompressing boost_1_86_0.tar.bz2: 61%
Uncompressing boost_1_86_0.tar.bz2: 64%
Uncompressing boost_1_86_0.tar.bz2: 69%
Uncompressing boost_1_86_0.tar.bz2: 72%
Uncompressing boost_1_86_0.tar.bz2: 74%
Uncompressing boost_1_86_0.tar.bz2: 77%
Uncompressing boost_1_86_0.tar.bz2: 82%
Uncompressing boost_1_86_0.tar.bz2: 87%
Uncompressing boost_1_86_0.tar.bz2: 92%
Uncompressing boost_1_86_0.tar.bz2: 94%
Uncompressing boost_1_86_0.tar.bz2: 95%
Uncompressing boost_1_86_0.tar.bz2: 96%
conanfile.py (boost/1.86.0): Apply patch (conan): Optional flag to specify iconv from either libc of libiconv
conan source recipes/boost/all    15,30s user 10,32s system 89% cpu 28,726 total

The results show roughly a 25% reduction in total wall-clock time (38,446 s down to 28,726 s) for this run.
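
For reference, the percentage can be recomputed from the two `time` summaries above. This is plain arithmetic on a single run of each variant, so it is indicative only; `percent_reduction` is a hypothetical helper name.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Values copied from the `time` output of the two runs (seconds).
wall = percent_reduction(38.446, 28.726)  # total elapsed time
user = percent_reduction(25.42, 15.30)    # user CPU time

print(f"wall-clock: {wall:.1f}% less, user CPU: {user:.1f}% less")
```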

Closes #17697


  • Refer to the issue that supports this Pull Request.
  • If the issue has missing info, explain the purpose/use case/pain/need that covers this Pull Request.
  • I've read the Contributing guide.
  • I've followed the PEP8 style guides for Python code.
  • I've opened another PR in the Conan docs repo to the develop branch, documenting this one.

@AbrilRBS AbrilRBS added this to the 2.13.0 milestone Feb 5, 2025
@perseoGI perseoGI requested a review from uilianries February 5, 2025 12:11
@memsharded memsharded self-assigned this Feb 5, 2025
Member

@AbrilRBS AbrilRBS left a comment

Looks great! Seems like performance-wise, both methods will end up calling extract_one in the tarfile for each member, so there would be no extra overhead there, we'll just need to address my comment

conan/tools/files/files.py
Member

@memsharded memsharded left a comment

Please share real timings of unzipping some large files, with and without this change.

UPDATE: Sorry I didn't see the OP clearly, now it is clear, thanks!

Then @AbrilRBS your comment

Looks great! Seems like performance-wise, both methods will end up calling extract_one in the tarfile for each member, so there would be no extra overhead there, we'll just need to address my comment

Not fully clear, what do you mean?

Comment on lines -367 to -369

    if pattern:
        members = list(filter(lambda m: fnmatch(m.name, pattern),
                              tarredgzippedFile.getmembers()))

Member

Doesn't this case read the file a third time, potentially causing 300%?
If getmembers() performs a file read IO, that seems to be the case?

Contributor Author

No, getmembers has a cache mechanism: the member list is stored on the TarFile object after the first scan.
The maximum number of reads that tarfile can perform on a tar.gz is two!

See code:

    def getmembers(self):
        """Return the members of the archive as a list of TarInfo objects. The
           list has the same order as the members in the archive.
        """
        self._check()
        if not self._loaded:    # if we want to obtain a list of
            self._load()        # all members, we first have to
                                # scan the whole archive.
        return self.members
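
The caching can be checked with a small experiment. This is a sketch under stated assumptions: `CountingBytesIO`, `build_tar` and `getmembers_read_counts` are hypothetical helpers, and the archive is an uncompressed in-memory tar so that every read goes straight through the counting wrapper.

```python
import io
import tarfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that tallies how many bytes have been read from it."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_tar(names):
    """Return the bytes of an uncompressed tar with one small file per name."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def getmembers_read_counts(archive_bytes):
    """Call getmembers() twice and report the bytes read after each call."""
    source = CountingBytesIO(archive_bytes)
    with tarfile.open(fileobj=source, mode="r:") as tar:
        first = tar.getmembers()
        after_first = source.bytes_read
        second = tar.getmembers()  # served from the already loaded list
        after_second = source.bytes_read
    return first is second, after_first, after_second


if __name__ == "__main__":
    same_list, after_first, after_second = getmembers_read_counts(
        build_tar(["a.txt", "b.txt", "c.txt"])
    )
    print(same_list, after_first, after_second)
```

The second call returns the same list object and triggers no further reads, which is consistent with the `_loaded` check in the code quoted above.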

@uilianries
Member

I did not check the code, but if I understand correctly, the uncompress status is updated every second. I just tried this branch and it indeed works up to 99%. Qt is huge, so some percentages are printed several times, but that's fine IMO:

qt/6.7.3: Calling source() in /home/uilian/.conan2/p/qte6ce713a78304/s/src
qt/6.7.3: Source ['https://download.qt.io/official_releases/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://download.qt.io/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://mirrors.ukfast.co.uk/sites/qt.io/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://mirrors.20i.com/pub/qt.io/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://ftp.nluug.nl/languages/qt/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://mirror.netcologne.de/qtproject/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://qt-mirror.dannhauer.de/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://ftp.fau.de/qtproject/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://mirrors.dotsrc.org/qtproject/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://ftp.icm.edu.pl/packages/qt/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://ftp.acc.umu.se/mirror/qt.io/qtproject/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://www.nic.funet.fi/pub/mirrors/download.qt-project.org/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://qt.mirror.constant.com/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://mirrors.sau.edu.cn/qt/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://mirrors.cloud.tencent.com/qt/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://mirror.bjtu.edu.cn/qt/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://mirrors.sjtug.sjtu.edu.cn/qt/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://ftp.jaist.ac.jp/pub/qtproject/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz', 'https://ftp.yz.yamagata-u.ac.jp/pub/qtproject/archive/qt/6.7/6.7.3/single/qt-everywhere-src-6.7.3.tar.xz'] retrieved from local download cache
qt/6.7.3: Unzipping qt-everywhere-src-6.7.3.tar.xz to /home/uilian/.conan2/p/qte6ce713a78304/s/src
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 6%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 12%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 16%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 17%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 18%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 19%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 20%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 20%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 21%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 22%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 23%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 25%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 25%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 25%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 25%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 25%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 26%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 26%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 26%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 26%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 27%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 28%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 29%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 30%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 31%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 33%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 35%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 37%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 38%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 39%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 40%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 41%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 41%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 41%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 42%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 43%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 44%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 44%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 45%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 45%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 45%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 45%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 45%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 46%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 47%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 47%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 48%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 50%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 50%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 51%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 51%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 52%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 52%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 53%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 53%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 54%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 55%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 55%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 56%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 57%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 58%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 59%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 60%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 60%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 61%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 63%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 65%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 66%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 67%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 69%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 70%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 72%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 72%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 72%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 72%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 73%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 73%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 73%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 74%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 75%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 75%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 76%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 78%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 78%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 79%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 79%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 80%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 81%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 82%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 82%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 83%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 84%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 85%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 85%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 86%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 86%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 86%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 87%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 87%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 87%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 87%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 89%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 90%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 91%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 92%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 92%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 93%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 94%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 95%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 96%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 96%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 96%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 97%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 97%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 98%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 98%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 98%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 99%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 99%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 99%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 99%
Uncompressing qt-everywhere-src-6.7.3.tar.xz: 99%
qt/6.7.3: Apply patch (bugfix): Workaround for too long .rps file name

-------- Installing package qt/6.7.3 (36 of 36) --------
qt/6.7.3: Building from source

@perseoGI
Contributor Author

perseoGI commented Feb 5, 2025

@uilianries

indeed it works until 99%

Yes... I think I can add a final log which explicitly marks the 100% just for the TOC

but Qt is huge, then, each step is repeated, but it's fine IMO:

That is a very good point. The progress printer is managed by a TimedOutput, which prints at most once per time interval. We could opt for printing the progress only when the percentage has changed since the last print, but that may require some changes. Also, as we do not have a progress bar, printing the percentage regularly may be good feedback for users, telling them that Conan is still working and has not gotten stuck.
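
The "print only when the percentage has changed" alternative could be sketched as below. This is a standalone illustration, not Conan's TimedOutput: the class name `PercentChangePrinter`, its `emit` parameter and the `update` method are all hypothetical.

```python
class PercentChangePrinter:
    """Emit a progress line only when the integer percentage changes."""

    def __init__(self, emit=print):
        self._emit = emit   # callable that receives each progress line
        self._last = None   # last percentage that was emitted

    def update(self, done, total, label):
        """Report progress; return True if a line was emitted."""
        percent = int(done * 100 / total) if total else 100
        if percent == self._last:
            return False    # same percentage as before: stay silent
        self._last = percent
        self._emit(f"{label}: {percent}%")
        return True


if __name__ == "__main__":
    printer = PercentChangePrinter()
    for done in (10, 14, 15, 15, 1000):
        printer.update(done, 1000, "Uncompressing demo.tar")
```

The trade-off is the one raised in the comment above: for a very large archive a given percentage can last a long time, so a change-only printer stays silent exactly when users may want reassurance that extraction is still running.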

Member

@uilianries uilianries left a comment

@perseoGI thank you for clarifying the feature. It looks good to me now.

Successfully merging this pull request may close these issues.

[bug] Uncompressing source package reaches 200%