Add read cache for zip archive entries in XLSX reader #4825
Draft
kemo wants to merge 5 commits into
Contributor
Author
Benchmark Results (xlsx-zip-read-cache)
Targeted benchmark: read XLSX files with varying zip archive complexity (single and multi-sheet).
No meaningful difference. The zip read cache doesn't help when each archive entry is read only once during a load. The benefit would show in scenarios where the same zip entries are accessed multiple times (e.g. repeated partial reads or multi-pass processing).
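To illustrate why a single-pass load sees no gain, here is a minimal sketch that counts underlying reads with and without memoization. The `CountingReader` class, the `countReads()` helper, and the entry names are hypothetical stand-ins rather than PhpSpreadsheet code; the real reader works on a `ZipArchive`.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the zip layer: counts simulated decompressions.
final class CountingReader
{
    public int $reads = 0;

    /** @var array<string, string> */
    private array $cache = [];

    public function __construct(private bool $useCache)
    {
    }

    public function get(string $entry): string
    {
        if ($this->useCache && isset($this->cache[$entry])) {
            return $this->cache[$entry]; // cache hit: no extra read
        }

        ++$this->reads; // simulated decompression of the entry
        $contents = "contents of {$entry}";
        if ($this->useCache) {
            $this->cache[$entry] = $contents;
        }

        return $contents;
    }
}

/** Replays an access pattern and returns the number of underlying reads. */
function countReads(array $accessPattern, bool $useCache): int
{
    $reader = new CountingReader($useCache);
    foreach ($accessPattern as $entry) {
        $reader->get($entry);
    }

    return $reader->reads;
}

// Each entry touched once, as in a plain load.
$singlePass = ['xl/workbook.xml', 'xl/styles.xml', 'xl/worksheets/sheet1.xml'];
// The same entries touched twice, as in multi-pass processing.
$multiPass = array_merge($singlePass, $singlePass);

printf("single pass: %d uncached vs %d cached reads\n", countReads($singlePass, false), countReads($singlePass, true));
printf("multi pass:  %d uncached vs %d cached reads\n", countReads($multiPass, false), countReads($multiPass, true));
```

In this toy model the cached and uncached counts are identical for the single-pass pattern and differ only once entries repeat, which matches the benchmark outcome reported here.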
Collaborator
Thank you for the summaries. For purposes of prioritization, I am going to put the PRs with less demonstrable improvement, including this one, in draft status for now. This does not mean that I will not return to them.
Summary
- Cache `getFromZipArchive()` results in the XLSX reader to avoid re-reading the same zip entries
- Lookups can make multiple `getFromName()` attempts each; files like `styles.xml` and `theme.xml` may be decompressed multiple times
- Cache is cleared at the entry points (`canRead`, `listWorksheetNames`, `listWorksheetInfo`, `loadSpreadsheetFromFile`)

Changes
- `src/PhpSpreadsheet/Reader/Xlsx.php`: Added `$zipCache` array, cache check in `getFromZipArchive()`, `clearZipCache()` calls at entry points
- `tests/PhpSpreadsheetTests/Reader/Xlsx/ZipReadCacheTest.php`: 5 tests covering correctness, repeated loads, different files, list operations
- `tests/PhpSpreadsheetTests/Benchmark/XlsxZipReadCacheBenchmark.php`: Benchmark with complex multi-sheet XLSX

Test plan
- `listWorksheetNames()` and `listWorksheetInfo()` work correctly
- Benchmark: `vendor/bin/phpunit --group benchmark --filter XlsxZipReadCacheBenchmark --stderr`
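
For readers skimming without the diff, the caching idea might look roughly like the sketch below. It is not the PR's actual code: the class `CachedZipReader` is hypothetical, only the names `$zipCache`, `getFromZipArchive()` and `clearZipCache()` are taken from the description, and the sketch does a single `getFromName()` lookup instead of the reader's fallback attempts. It assumes the `zip` extension is installed.

```php
<?php

declare(strict_types=1);

// Hypothetical wrapper illustrating a per-reader cache keyed by entry name.
final class CachedZipReader
{
    /** @var array<string, string> entry name => decompressed contents */
    private array $zipCache = [];

    /** Intended to be called at each entry point so one file's entries never leak into another load. */
    public function clearZipCache(): void
    {
        $this->zipCache = [];
    }

    public function getFromZipArchive(ZipArchive $archive, string $fileName): string
    {
        // Return early on a hit, before the archive is touched at all.
        if (array_key_exists($fileName, $this->zipCache)) {
            return $this->zipCache[$fileName];
        }

        $contents = $archive->getFromName($fileName, 0, ZipArchive::FL_NOCASE);
        $contents = ($contents === false) ? '' : $contents;
        $this->zipCache[$fileName] = $contents;

        return $contents;
    }
}

// Build a throwaway archive with one entry.
$path = tempnam(sys_get_temp_dir(), 'zip');
$zip = new ZipArchive();
$zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE);
$zip->addFromString('xl/styles.xml', '<styleSheet/>');
$zip->close();

$reader = new CachedZipReader();
$reader->clearZipCache(); // start of a new load

$zip->open($path);
$first = $reader->getFromZipArchive($zip, 'xl/styles.xml'); // decompresses the entry
$zip->close();

// The archive is closed, so this result can only come from the cache.
$second = $reader->getFromZipArchive($zip, 'xl/styles.xml');

unlink($path);
```

Because the key is only the entry name, the cache would return stale data if a different archive were passed in without a reset, which is presumably why the description clears it at every entry point.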