read_bytes(): use previous implementation again for small reads

dgelessus · generalmimon · generalmimon · commit c29609940a4c · 2022-07-07T01:32:01.000+02:00
For small reads, the new code that tries to avoid unnecessary reads is
noticeably slower than the previous code that reads unconditionally. In
the worst case (1-byte reads), the new code is 13 times as slow as the
previous implementation. The potential memory/IO savings only become
worth it for larger reads, where the performance difference disappears.

Co-authored-by: Petr Pucil <petr.pucil@seznam.cz>
diff --git a/kaitaistruct.py b/kaitaistruct.py
@@ -301,9 +301,18 @@ def read_bytes(self, n):
             )
 
         is_satisfiable = True
-        # in Python 2, there is a common error ['file' object has no
-        # attribute 'seekable'], so we need to make sure that seekable() exists
-        if callable(getattr(self._io, 'seekable', None)) and self._io.seekable():
+        # When a large number of bytes is requested, try to check first
+        # that there is indeed enough data left in the stream.
+        # This avoids reading large amounts of data only to notice afterwards
+        # that it's not long enough. For smaller amounts of data, it's faster to
+        # first read the data unconditionally and check the length afterwards.
+        if (
+            n >= 8*1024*1024  # = 8 MiB
+            # in Python 2, there is a common error ['file' object has no
+            # attribute 'seekable'], so we need to make sure that seekable() exists
+            and callable(getattr(self._io, 'seekable', None))
+            and self._io.seekable()
+        ):
             num_bytes_available = self.size() - self.pos()
             is_satisfiable = (n <= num_bytes_available)
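
For illustration only, the strategy in this hunk can be sketched as a standalone function: check the remaining length up front for large reads on seekable streams, otherwise read first and check the length afterwards. The names read_exact and LARGE_READ_THRESHOLD, the threshold parameter, and the use of EOFError are assumptions made for this sketch; the actual kaitaistruct.py code uses self.size() and self.pos() and is not reproduced here.

```python
import io

# Hypothetical constant mirroring the 8 MiB cutoff used in the diff above.
LARGE_READ_THRESHOLD = 8 * 1024 * 1024


def read_exact(stream, n, threshold=LARGE_READ_THRESHOLD):
    """Read exactly n bytes from stream, or raise EOFError.

    Illustrative helper only; it is not part of kaitaistruct.py.
    """
    if n < 0:
        raise ValueError("cannot read a negative number of bytes: %d" % n)

    # Python 2 file objects have no seekable(), so look it up defensively.
    seekable = getattr(stream, "seekable", None)
    if n >= threshold and callable(seekable) and seekable():
        # Large read: measure what is left before reading anything,
        # so an unsatisfiable request never allocates or reads the data.
        pos = stream.tell()
        stream.seek(0, io.SEEK_END)
        end = stream.tell()
        stream.seek(pos)
        if n > end - pos:
            raise EOFError(
                "requested %d bytes, but only %d bytes available" % (n, end - pos)
            )

    # Small read (or non-seekable stream): read unconditionally and
    # check the length afterwards, which skips the extra seek/tell calls.
    data = stream.read(n)
    if len(data) < n:
        raise EOFError(
            "requested %d bytes, but only %d bytes available" % (n, len(data))
        )
    return data
```

Passing a small threshold (for example threshold=4) makes both paths easy to exercise against an io.BytesIO. On the check-first path a failed request leaves the stream position untouched, while on the read-first path the short data has already been consumed when the error is raised.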