2️⃣ Read IO Improvements #429
Replies: 6 comments 4 replies
-
|
Watching videos on Postgres approach, they appear to be implementing async IO (AIO) for storage, citing cloud storages highly parallel read as a factor in that approach. This is in concert with a threading approach which is unlikely to be able to be replicated with Python due to the GIL. Will need multi processing and the costs associated with that, but with AIO. |
Beta Was this translation helpful? Give feedback.
-
|
Create a truer implementation of the buffer pool, refer to the existing implementation as an external buffer (even in memory) Use plasma (?) To reserve a chunk of memory, populate the buffer on read, keep a directory of pages in the buffer. When we have a scan plan, read the pages from the pool first. Prefetch pages from remote storage into the pool. |
Beta Was this translation helpful? Give feedback.
-
|
GCS supports the S3 API, I recommend this is the focus of proving optimisations. Where possible these should be ported to the GCS functionality. This may be the rust S3 reader. |
Beta Was this translation helpful? Give feedback.
-
|
Using The MinIo reader was able to saturate my 2.5gbps connection (2.3gbps sustained), so may be faster on a faster network. This is lab testing and will not quite map to real implementation, the lab approach downloads the files as fast as possible, which if was a large dataset, would blow memory allocations. An approach is needed to buffer results and to wait when there is no room in the buffer. |
Beta Was this translation helpful? Give feedback.
-
|
I wonder what writing a sidecar system which handles IO would do for performance. Running as a separate process, using multiprocessing, the new part could be optimised for getting data, handling caching, buffering and decoding data, for the query engine component to consume from. |
Beta Was this translation helpful? Give feedback.
-
|
Given the massive disproportionate execution time for read and process (let's say 90% of execution time is reading - different factors affect this so there's no hard number), we cannot read as fast as we process without significant changes. Doing work like parallel reading, takes a lot of effort but at most will make the download the critical path rather than read/process/read/process, so we're saving about 10%. Caching and buffer pool have helped considerably, and we've seen that projection pushdown saves a lot of the read time, but selection pushdown less so, it may even be slower. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Current deployments using GCS as the main store regularly see the cost of read (planning and scanning) occupy over 90% (99.9% observed) of the query time. Using Memcache reduces read time, usually in the region of 50%, but this can still mean over 90% of the query time is read (planning and scanning).
The current implementation of planning is broadly:
The current implementation of reading is serial, read a blob, send it for processing, wait for the next blob to be requested. Entire blobs are read at a time, regardless of the amount of data needed (for example, even if using projection pushdown on Parquet, the entire blob is downloaded at a time).
Some experimentation has been done with multi-processing, this is immature but has not shown good improvements to read times (reduction of in the region of 10% execution time - note read time is the same, but in parallel so measuring elapsed time)
Beta Was this translation helpful? Give feedback.
All reactions