Enable async ASIC_DB writes for Southbound-ZMQ #1819

Open
inder-nexthop wants to merge 1 commit into sonic-net:master from nexthop-ai:async_southbound_zmq_writes_asic_db

@inder-nexthop inder-nexthop commented Mar 27, 2026

Why I did it

We are extending southbound ZMQ support (currently used only for DPU switches) to regular NPU switches to improve route offloading performance. However, PR #1694 introduced DisabledRedisClient, which prevents syncd from writing ASIC state to Redis when ZMQ is enabled.

How I did it

Added DPU detection logic in syncd by reading switch_type from the CONFIG_DB DEVICE_METADATA table.

Implemented logic for Redis client selection:

  • DPU + ZMQ: Use DisabledRedisClient (no Redis writes; preserves the behavior of PR #1694, "[syncd] Remove syncd redis objects if using ZMQ notifications")
  • NPU + ZMQ (southbound ZMQ): Use RedisClient in async mode to persist ASIC state without blocking the ZMQ data path
  • No ZMQ: Use RedisClient in synchronous mode (traditional behavior)

Extended RedisClient to support an asynchronous mode that uses swss::AsyncDBUpdater for non-blocking Redis writes when ZMQ is enabled.
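
The three cases above can be sketched as a small decision function. This is only an illustration, not the code in this PR: the enum, the function names, and the assumption that a DPU is identified by switch_type being exactly "dpu" are all hypothetical.

```cpp
#include <string>

// Hypothetical names for illustration; not the identifiers used in syncd.
enum class RedisClientMode { Disabled, Async, Sync };

// switchType stands for the value read from CONFIG_DB DEVICE_METADATA "switch_type".
// Treating the literal "dpu" as the DPU marker is an assumption of this sketch.
inline bool isDpu(const std::string& switchType)
{
    return switchType == "dpu";
}

inline RedisClientMode selectRedisClientMode(bool zmqEnabled, const std::string& switchType)
{
    if (!zmqEnabled)
        return RedisClientMode::Sync;      // No ZMQ: traditional synchronous writes
    if (isDpu(switchType))
        return RedisClientMode::Disabled;  // DPU + ZMQ: no Redis writes (PR #1694 behavior)
    return RedisClientMode::Async;         // NPU + ZMQ: non-blocking ASIC_DB writes
}
```

Checking ZMQ first keeps the non-ZMQ path identical for DPU and NPU switches, which matches the "traditional behavior" case in the list.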

How to verify it

Ran the route benchmark script at sonic-mgmt/tests/scripts/route_programming_benchmark.py.
Also verified that the routes are written to the ASIC_STATE table at a scale of 500k routes:

redis-cli -n 1 --scan --pattern "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY*" | wc -l
503255

**Test Policies**:

- Policy 1: Northbound ZMQ enabled
- Policy 2: Northbound ZMQ + Ring Buffer + Multi-DB enabled
- Policy 3: Northbound ZMQ + Southbound ZMQ enabled
- Policy 4: Northbound ZMQ + Southbound ZMQ + Multi-DB enabled

**Note: All tests were run with a SAI bulk size of 10k in Orchagent (-k 10000).**

| Route Scale | Policy 1 | Policy 2 | Policy 3 | Policy 4 |
| :---- | :---- | :---- | :---- | :---- |
| 100k | 13.713s | 14.649s | 14.633s | 12.962s |
| 250k | 54.563s | 46.197s | 50.120s | 41.602s |
| 500k | 187.923s | 131.901s | 129.681s | 102.961s |

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@inder-nexthop inder-nexthop force-pushed the async_southbound_zmq_writes_asic_db branch from 2711b27 to 18b7670 Compare March 30, 2026 18:21
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vganesan-nokia
Contributor

@inder-nexthop, this PR appears to be doing the same thing as done in the PR #1801. Would you please check?

@inder-nexthop
Author

inder-nexthop commented Apr 1, 2026

@vganesan-nokia Yes, this PR achieves the same goal. The change is made in syncd: it extends the existing RedisClient to write to ASIC_DB asynchronously, so the caller does not block waiting for the acknowledgement. It also accounts for the DPU vs. NPU distinction to preserve PR #1694's behavior on DPU switches.

CC: @venkit-nexthop
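
To illustrate what "not blocking on the ack" means here, a minimal sketch of a queue-backed writer follows. It is a stand-in built on the C++ standard library, not swss::AsyncDBUpdater and not the code in this PR; the class name, the Sink callback, and the key/value shape are assumptions made for the example.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Illustrative stand-in for an asynchronous DB updater (hypothetical class).
class AsyncWriter
{
public:
    // Sink performs the actual (slow) write, e.g. a Redis SET in a real system.
    using Sink = std::function<void(const std::string& key, const std::string& value)>;

    explicit AsyncWriter(Sink sink)
        : m_sink(std::move(sink)), m_worker([this] { run(); }) {}

    ~AsyncWriter()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_one();
        m_worker.join();  // the worker drains any queued entries before exiting
    }

    // Called from the hot path: enqueue and return without waiting for the write.
    void set(std::string key, std::string value)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.emplace(std::move(key), std::move(value));
        }
        m_cv.notify_one();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;)
        {
            m_cv.wait(lock, [this] { return m_stop || !m_queue.empty(); });
            if (m_queue.empty())
                return;  // reached only when stop was requested and the queue is drained
            auto entry = std::move(m_queue.front());
            m_queue.pop();
            lock.unlock();
            m_sink(entry.first, entry.second);  // slow write happens off the hot path
            lock.lock();
        }
    }

    Sink m_sink;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::pair<std::string, std::string>> m_queue;
    bool m_stop = false;
    std::thread m_worker;  // declared last so it starts after the other members exist
};
```

A single worker thread preserves the order in which entries were enqueued, which matters if later writes to the same key must win. The trade-off is that the database lags the caller by whatever is still queued.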

Signed-off-by: Inder Pooni <inder@nexthop.ai>
@inder-nexthop inder-nexthop force-pushed the async_southbound_zmq_writes_asic_db branch from 18b7670 to d02b003 Compare April 2, 2026 05:18
@inder-nexthop inder-nexthop marked this pull request as ready for review April 2, 2026 06:03
@vganesan-nokia
Contributor

vganesan-nokia commented Apr 6, 2026

> @vganesan-nokia This PR is achieving the same goal, yes. This change is being made in syncd, extending the existing RedisClient to perform asynchronous ASIC_DB writes, while not blocking on the ack back and performing the write in an asynchronous fashion. Also accounting for the DPU vs NPU distinction to preserve PR #1694's behavior on DPU switches.
>
> CC: @venkit-nexthop

Thanks for the response. Currently, the ZmqRedisClient writes only ASIC_STATE asynchronously. Other information, such as VIDTORID, RIDTOVID, COUNTERS, and LANES, is still written to ASIC_DB synchronously in the hot path of route programming. In our testing, PR #1819 performs slightly worse than PR #1801 (about 4.5% degradation). We can try making every write asynchronous, not just ASIC_STATE, and see whether that improves performance.

@inder-nexthop
Author

inder-nexthop commented Apr 6, 2026

@vganesan-nokia Thank you for testing this and bringing it to our attention. I will do some profiling and look into making the other operations that are on the hot path (VIDTORID, RIDTOVID, etc.) async as well. It would be great if you could help us test and compare once that change is in place.
