[server] Allocated a dedicated thread pool for ingestion DB lookup #1525

Conversation

gaojieliu
Contributor

Based on a prod experiment, the SSD being used does not perform well when parallel processing is enabled for the AA/WC workload: the database lookup concurrency ends up roughly the same as the parallel processing concurrency, and we saw higher lookup latency with high CPU wait.
This PR introduces a separate thread pool for ingestion database lookups (value and RMD) so that lookup concurrency can be controlled independently of the parallel processing concurrency. Based on a canary test, a lower DB lookup concurrency reduces CPU wait while delivering slightly better overall throughput.

New Config:
server.aa.wc.ingestion.storage.lookup.thread.pool.size: default 4
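
As a rough illustration of the change (a minimal sketch only; the class and method names below are hypothetical, not the actual Venice implementation), value/RMD lookups are routed through their own fixed-size pool whose size comes from the new config:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch: a dedicated fixed-size pool for ingestion DB lookups,
// sized by server.aa.wc.ingestion.storage.lookup.thread.pool.size (default 4).
// Names are illustrative; this is not the actual Venice code.
public class IngestionLookupPoolSketch {
  private final ExecutorService lookupPool;

  public IngestionLookupPoolSketch(int lookupThreadPoolSize) {
    // Lookup concurrency is bounded by this pool, independently of the
    // parallel-processing pool that drives AA/WC message handling.
    this.lookupPool = Executors.newFixedThreadPool(lookupThreadPoolSize);
  }

  public Future<byte[]> lookupValueAsync(byte[] key) {
    return lookupPool.submit(() -> readFromRocksDB(key));
  }

  private byte[] readFromRocksDB(byte[] key) {
    return new byte[0]; // placeholder for the actual value/RMD lookup
  }
}
```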

How was this PR tested?

CI

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

gaojieliu force-pushed the make_ingestion_lookup_concurrency_configurable branch from 1e89de6 to a23ae99 on February 12, 2025 17:25
@sushantmane
Collaborator

I haven’t reviewed this PR, but I want to share an idea we might want to consider. IMO, we should have a single common thread pool for RocksDB reads, used for both the read and write paths. This would allow better concurrency control: reads for writes can be made to have lower priority than user reads. This way, we can manage read concurrency more effectively without impacting regular reads.

@gaojieliu
Contributor Author

> I haven’t reviewed this PR, but I want to share an idea we might want to consider. IMO, we should have a single common thread pool for RocksDB reads, used for both the read and write paths. This would allow better concurrency control: reads for writes can be made to have lower priority than user reads. This way, we can manage read concurrency more effectively without impacting regular reads.

The main goal of this PR is not to solve the conflict between reads and writes, but the write path only.
There is another initiative, called dynamic throttling, to address the priority issue, since the read path can be affected by the whole ingestion pipeline, not just ingestion reads.
Even without this enhancement, the read latency with batch processing is comparable to or slightly better than that of other nodes; this PR is mainly to address the high CPU wait issue.

@sushantmane
Collaborator

> The main goal of this PR is not to solve the conflict between reads and writes, but the write path only.

I understand that. However, these concerns are not mutually exclusive. We need to take a comprehensive look at the storage layer's IO impact. If we focus solely on the write path without considering the read path, the overall improvement may still be suboptimal.

These are just my thoughts; this isn't a blocker for this PR.

@gaojieliu
Contributor Author

> > The main goal of this PR is not to solve the conflict between reads and writes, but the write path only.
>
> I understand that. However, these concerns are not mutually exclusive. We need to take a comprehensive look at the storage layer's IO impact. If we focus solely on the write path without considering the read path, the overall improvement may still be suboptimal.
>
> These are just my thoughts; this isn't a blocker for this PR.

Certainly, but another important aspect is separation of concerns: the different layers need to be relatively independent, and there will be a coordination layer that orchestrates reads and writes, which is the dynamic throttling I was referring to in the previous comment.

@sushantmane
Collaborator

> Certainly, but another important aspect is separation of concerns: the different layers need to be relatively independent, and there will be a coordination layer that orchestrates reads and writes, which is the dynamic throttling I was referring to in the previous comment.

We don't need a coordination layer for the idea I was referring to.

@gaojieliu
Contributor Author

> > Certainly, but another important aspect is separation of concerns: the different layers need to be relatively independent, and there will be a coordination layer that orchestrates reads and writes, which is the dynamic throttling I was referring to in the previous comment.
>
> We don't need a coordination layer for the idea I was referring to.

I think we do, since the read/write paths compete not just for IO resources but for all others as well, such as CPU, GC, and network, and that is the holistic view we should take instead of this narrow scope.
One example is that re-bootstrapping can affect read perf even when all the writes are full PUTs.

@sixpluszero
Contributor

> I haven’t reviewed this PR, but I want to share an idea we might want to consider. IMO, we should have a single common thread pool for RocksDB reads, used for both the read and write paths. This would allow better concurrency control: reads for writes can be made to have lower priority than user reads. This way, we can manage read concurrency more effectively without impacting regular reads.

Can you share how you are going to control the priority of write-path reads and regular reads within a single pool? My understanding is that unless you control/throttle the issuer of the read (the consumer thread), you cannot directly control that, right? And if that is the case, then it can also be achieved with a separate pool.

@sushantmane
Collaborator

sushantmane commented Feb 12, 2025

> Can you share how you are going to control the priority of write-path reads and regular reads within a single pool? My understanding is that unless you control/throttle the issuer of the read (the consumer thread), you cannot directly control that, right? And if that is the case, then it can also be achieved with a separate pool.

All read requests to RocksDB will be routed through the storage executor thread pool, which will be updated to use a priority-based queue for scheduling tasks. Requests from the read path will be assigned a higher priority than those triggered by ingestion, with additional safeguards to prevent ingestion reads from being indefinitely starved. This approach implicitly throttles writes, as ingestion will be blocked until its queued reads finish.
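
To make the scheme above concrete, here is a rough sketch of one way it could work (all names, the pool size, and the 100 ms cap are assumptions for illustration, not actual Venice code): each task gets a deadline fixed at enqueue time, user reads are due immediately, and ingestion reads are due after a max delay, so ordering by earliest deadline gives user reads priority while the cap bounds how long an ingestion read can starve.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: one shared pool for all RocksDB reads, with
// user-read priority and a max-delay cap so ingestion reads cannot starve.
final class PrioritizedRead implements Runnable {
  static final long INGESTION_MAX_DELAY_MS = 100; // assumed starvation cap

  final long deadlineMs;
  private final Runnable work;

  PrioritizedRead(boolean userRead, Runnable work) {
    long now = System.currentTimeMillis();
    // User reads are due immediately; ingestion reads may wait, but only
    // up to the cap, after which they outrank newly arrived user reads.
    this.deadlineMs = userRead ? now : now + INGESTION_MAX_DELAY_MS;
    this.work = work;
  }

  @Override
  public void run() {
    work.run();
  }
}

class SharedStorageReadExecutor {
  // Earliest-deadline-first ordering: plain priority for fresh tasks,
  // time-based capping for aged ingestion reads.
  private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
      8, 8, 0L, TimeUnit.MILLISECONDS,
      new PriorityBlockingQueue<>(
          11, Comparator.comparingLong((Runnable r) -> ((PrioritizedRead) r).deadlineMs)));

  void submitUserRead(Runnable work) {
    pool.execute(new PrioritizedRead(true, work));
  }

  void submitIngestionRead(Runnable work) {
    pool.execute(new PrioritizedRead(false, work));
  }
}
```

Because each task's deadline is fixed when it is enqueued, the ordering stays consistent over time, which a PriorityBlockingQueue requires.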

@sixpluszero
Contributor

> All read requests to RocksDB will be routed through the storage executor thread pool, which will be updated to use a priority-based queue for scheduling tasks. Requests from the read path will be assigned a higher priority than those triggered by ingestion, with additional safeguards to prevent ingestion reads from being indefinitely starved. This approach implicitly throttles writes, as ingestion will be blocked until its queued reads finish.

It can be promising, but I have a concern about starving write-path reads. Maybe if we configure the thread pool to be large enough, we can avoid that.

@sushantmane
Collaborator

sushantmane commented Feb 12, 2025

> It can be promising, but I have a concern about starving write-path reads. Maybe if we configure the thread pool to be large enough, we can avoid that.

Could you please elaborate? Do you think time-based capping (or a max delay) won't prevent starvation?

gaojieliu merged commit bb092b9 into linkedin:main on Feb 12, 2025
58 checks passed