Commit c296099

read_bytes(): use previous implementation again for small reads

For small reads, the new code that tries to avoid unnecessary reads is noticeably slower than the previous code that reads unconditionally. In the worst case (1-byte reads), the new code is 13 times as slow as the previous implementation. The potential memory/IO savings only become worth it for larger reads, where the performance difference disappears.

Co-authored-by: Petr Pucil <[email protected]>
1 parent 255f5b7 commit c296099
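As a standalone illustration of the trade-off described in the commit message, here is a minimal sketch of a read_bytes function over a plain Python 3 binary stream. It is not the kaitaistruct implementation: LARGE_READ_THRESHOLD and _eof_error are hypothetical names, and the remaining-size check calls tell()/seek() directly instead of the library's size() and pos() helpers.

```python
import io

# Hypothetical constant mirroring the 8 MiB cut-off in the diff
# (8 * 1024 * 1024 = 8388608 bytes).
LARGE_READ_THRESHOLD = 8 * 1024 * 1024


def _eof_error(requested, available):
    # Hypothetical helper that builds the error raised on a short read.
    return EOFError(
        "requested %d bytes, but only %d bytes available" % (requested, available)
    )


def read_bytes(stream, n):
    """Read exactly n bytes from a binary stream, or raise EOFError.

    Sketch of the strategy in this commit; not kaitaistruct's own code.
    """
    if n < 0:
        raise ValueError("requested a negative number of bytes: %d" % (n,))

    # Large reads only: if the stream is seekable, compare n with the bytes
    # left *before* reading, so a read that is bound to fail never pulls up
    # to n bytes into memory. The getattr() guard skips file-like objects
    # that have no seekable() method at all (e.g. Python 2 'file').
    if (
        n >= LARGE_READ_THRESHOLD
        and callable(getattr(stream, "seekable", None))
        and stream.seekable()
    ):
        pos = stream.tell()
        size = stream.seek(0, io.SEEK_END)  # seek() returns the new position
        stream.seek(pos)                    # restore the original position
        if n > size - pos:
            raise _eof_error(n, size - pos)

    # Small reads (and the actual read for large ones): read unconditionally,
    # then check how many bytes actually came back.
    data = stream.read(n)
    if len(data) < n:
        raise _eof_error(n, len(data))
    return data
```

For example, on io.BytesIO(b"\x01\x02\x03") a first read_bytes(buf, 2) returns b'\x01\x02', and a second one raises EOFError after having consumed the last byte. In this sketch, consuming the remaining bytes before failing is the cost of the fast read-first path; a large request that fails the pre-check leaves the stream position untouched.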

1 file changed (+12, -3 lines)

kaitaistruct.py

Lines changed: 12 additions & 3 deletions
@@ -301,9 +301,18 @@ def read_bytes(self, n):
             )
 
         is_satisfiable = True
-        # in Python 2, there is a common error ['file' object has no
-        # attribute 'seekable'], so we need to make sure that seekable() exists
-        if callable(getattr(self._io, 'seekable', None)) and self._io.seekable():
+        # When a large number of bytes is requested, try to check first
+        # that there is indeed enough data left in the stream.
+        # This avoids reading large amounts of data only to notice afterwards
+        # that it's not long enough. For smaller amounts of data, it's faster to
+        # first read the data unconditionally and check the length afterwards.
+        if (
+            n >= 8*1024*1024 # = 8 MiB
+            # in Python 2, there is a common error ['file' object has no
+            # attribute 'seekable'], so we need to make sure that seekable() exists
+            and callable(getattr(self._io, 'seekable', None))
+            and self._io.seekable()
+        ):
             num_bytes_available = self.size() - self.pos()
             is_satisfiable = (n <= num_bytes_available)

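The 13x slowdown for 1-byte reads quoted in the commit message is the authors' measurement. A rough micro-benchmark along the following lines is one way to observe the effect locally. It is a sketch under stated assumptions: read_check_first and read_then_check are hypothetical stand-ins for the two strategies, the stream is an io.BytesIO, and kaitaistruct's size() and pos() are replaced by direct tell()/seek() calls, so absolute timings and the ratio will vary by machine and Python version.

```python
import io
import timeit


def read_check_first(stream, n):
    """Always compare n with the remaining size before reading."""
    pos = stream.tell()
    size = stream.seek(0, io.SEEK_END)  # seek() returns the new position
    stream.seek(pos)                    # restore the original position
    if n > size - pos:
        raise EOFError("requested %d bytes, but only %d bytes available" % (n, size - pos))
    return stream.read(n)


def read_then_check(stream, n):
    """Read unconditionally, then check how many bytes came back."""
    data = stream.read(n)
    if len(data) < n:
        raise EOFError("requested %d bytes, but only %d bytes available" % (n, len(data)))
    return data


def bench(reader, n=1, count=10000, repeat=5):
    """Best-of-`repeat` time in seconds for `count` consecutive n-byte reads."""
    stream = io.BytesIO(b"\x00" * (n * count))

    def run():
        stream.seek(0)
        for _ in range(count):
            reader(stream, n)

    return min(timeit.repeat(run, number=1, repeat=repeat))


if __name__ == "__main__":
    for reader in (read_check_first, read_then_check):
        print("%-16s %.4f s" % (reader.__name__, bench(reader)))
```

Raising n (for example to several MiB, with a correspondingly smaller count) should shrink the gap, since the extra tell()/seek() calls become negligible next to the cost of copying the data.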