Improving scalability #112
joelfiddes
started this conversation in
Ideas
Replies: 1 comment
-
Good to know! Combined with Zarr files we'll make TPS go faster :)
-
Compute time seems to increase non-linearly (exponentially?) with the number of clusters, which points to a scalability issue. Here is an interesting comparison between libraries (multiprocessing is what we currently use):
ThreadPoolExecutor:
Best for I/O-bound tasks (e.g., file downloads, API calls, disk operations).
Threads can effectively handle multiple I/O operations since the GIL is released during I/O waits.
Threads share memory space, making it easier to share resources between tasks without serialization.
multiprocessing:
Best for CPU-bound tasks (e.g., heavy computations, data processing).
Each process has its own GIL and memory space, avoiding contention but adding inter-process communication (IPC) overhead.
For I/O-bound tasks, multiprocessing can be overkill, as creating separate processes and managing their memory incurs additional overhead.
Advantage of ThreadPoolExecutor: Threads perform better for I/O-bound tasks like downloads because they can overlap I/O waits without the overhead of process creation.
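To make the point about overlapping I/O waits concrete, here is a minimal sketch that runs the same simulated I/O task serially and through a ThreadPoolExecutor. The names `fake_download`, `run_serial`, `run_threaded` and `timed` are hypothetical, and `time.sleep` only stands in for a real download or disk read; it is not the project's actual workload.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item, delay=0.05):
    """Hypothetical stand-in for an I/O-bound task (download, disk read).

    time.sleep releases the GIL, much like a real blocking I/O wait.
    """
    time.sleep(delay)
    return item * 2


def run_serial(items):
    # One task after another: total time is roughly len(items) * delay.
    return [fake_download(i) for i in items]


def run_threaded(items, max_workers=8):
    # Threads overlap the waits; pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, items))


def timed(func, *args):
    """Return (result, elapsed wall-clock seconds) for func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling `timed(run_serial, items)` and `timed(run_threaded, items)` on the same list gives a rough idea of how much of the wall time is recoverable by overlapping waits. For a CPU-bound task the threaded version would not be expected to show the same gain, since the GIL serializes the Python bytecode.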
Does this suggest that ThreadPoolExecutor might be more suitable for our tasks, assuming they are indeed I/O-bound?
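One rough way to check whether a task is I/O-bound is to compare the CPU time it consumes with its wall-clock time: a ratio near 0 suggests the task mostly waits, a ratio near 1 suggests it mostly computes. The sketch below assumes a single-process task; `cpu_fraction`, `simulated_io` and `simulated_cpu` are hypothetical helpers for illustration, not functions from the project.

```python
import time


def cpu_fraction(func, *args, **kwargs):
    """Return CPU time / wall time for one call of func.

    time.process_time counts CPU time of the current process only and
    excludes time spent sleeping or blocked on I/O.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def simulated_io(seconds=0.2):
    # Stand-in for a blocking read or download: almost no CPU is used.
    time.sleep(seconds)


def simulated_cpu(n=2_000_000):
    # Stand-in for heavy computation: the CPU is busy the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total
```

Wrapping one real per-cluster task in `cpu_fraction` would give a first indication of which pool type is worth trying. The measurement ignores work done in child processes, so it should be applied to the task function itself rather than to the code that launches the pool.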