Commit 7a0813a

vvolamoleksandrivantsivgpunathilellashwnsrimlok-nokia
authored
[SmartSwitch] Enhance PCIe device check to skip the warning log, if device is in detaching mode (#546)
* Skip logging the warning, if device is in detaching mode
* Add detach_info table and unittests
* Fix unit tests
* Increase code coverage
* Remove unused header import
* Fix dict get values
* Increase code coverage
* Increase test coverage
* [SmartSwitch] Extend implementation of the DPU chassis daemon. (#563)
* Addition of DPU Chassis for thermalctld (#564)
* [stormond] Added new dynamic field 'last_sync_time' to STATE_DB (#535)
  * Added new dynamic field 'last_sync_time' that shows when STORAGE_INFO for disk was last synced to STATE_DB
  * Moved 'start' message to actual starting point of the daemon
  * Added functions for formatted and epoch time for user friendly time display
  * Made changes per prgeor review comments
  * Pivot to SysLogger for all logging
  * Increased log level so that they are seen in syslogs
  * Code coverage improvement
* [lag_id] Add lagid to free_list when LC absent for 30 minutes (#542)
  When LC is absent for 30 minutes, the database cleanup kicks in. When LagId is released, it needs to be appended to the SYSTEM_LAG_IDS_FREE_LIST.
  This PR works with the following 2 PRs: sonic-net/sonic-swss#3303 sonic-net/sonic-buildimage#20369
  Signed-off-by: mlok <[email protected]>
* Fixed bug in chassisd causing incorrect number of ASICs in CHASSIS_STATE_DB (#560)
  Fixed the bug in chassisd due to which an incorrect number of ASICs was being pushed to CHASSIS_STATE_DB.
* thermalctld: Add support for fans on non-CPU modules (#555)
  * thermalctld: Add support for fans on non-CPU modules
  * Add module fan to unit tests
* Advanced Azure pipeline to Bookworm (#572)
  Description
  This PR advances the azure pipeline on sonic_platform_daemons from bullseye to bookworm. This fixes the issue where sonic-platform-daemons azp is having some issues due to upgrade to bookworm. See Pipelines - Run 20241210.8 logs for details.
* Take non-CMIS xcvrs out of lpmode in SFF Manager (#565)
  Description
  Fix non-CMIS transceivers in down state by bringing them out of low power mode in the SFF Manager Task. This is intended to work together with the change in sonic-net/sonic-buildimage#20886.
  Motivation and Context
  Non-CMIS transceivers were not functioning correctly when put into Low Power mode. So XCVRD now brings them out of lpmode.
  How Has This Been Tested?
  Loaded an image containing this change alongside the change from sonic-net/sonic-buildimage#20886 on an Arista chassis containing a Clearwater2 linecard. Verified that without this image some interfaces were in a down state but with the image all interfaces came up as expected.
* Added SmartSwitch support in chassisd and enabling chassisd (#467)
  Added SmartSwitch support in chassisd and enabling chassisd
* [chassis][psud] Move the PSU parent information generation to the loop run function from the initialization function (#576)
  Description
  Move the PSU parent information generation to the loop run function from the initialization function
  Motivation and Context
  Fixes #575
  How Has This Been Tested?
  Tested on Cisco chassis, the PHYSICAL_ENTITY_INFO|PSU * can be re-inserted after thermalctld restart. And monitored the state db for memory for hours, works well:
* [chassisd] Address the chassisd crash issue and add UT for it (#573)
  Description
  On Nokia platform, the slot name of the Supervisor is the string "A" instead of a number. Using "int" to convert it could cause a backtrace. We should use the slot value for any checking without any conversion. This fixes sonic-net/sonic-buildimage#21131
  Motivation and Context
  Modify the _get_module_info not to convert "slot" to a string value. Also modify the code not to convert the slot value to an int to do any checking. Just directly use the returned value of get_slot(). Also add UT test_moduleupdater_check_slot_string() to validate it.
  How Has This Been Tested?
  Tested on 202405 branch
  Signed-off-by: mlok <[email protected]>
* Fix a comment

---------

Signed-off-by: mlok <[email protected]>
Co-authored-by: Oleksandr Ivantsiv <[email protected]>
Co-authored-by: Gagan Punathil Ellath <[email protected]>
Co-authored-by: Ashwin Srinivasan <[email protected]>
Co-authored-by: Marty Y. Lok <[email protected]>
Co-authored-by: Vivek Verma <[email protected]>
Co-authored-by: Patrick MacArthur <[email protected]>
Co-authored-by: Peter Bailey <[email protected]>
Co-authored-by: rameshraghupathy <[email protected]>
Co-authored-by: Jianquan Ye <[email protected]>
1 parent c61323f commit 7a0813a
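
The headline change (skip the warning when a failed device belongs to a DPU that is detaching) can be sketched outside the daemon. This is a minimal illustration, not the daemon's code: the `DETACH_INFO` dictionary is a hypothetical stand-in for the `PCIE_DETACH_INFO` table, and `is_detaching` and `count_missing` are names invented for this example.

```python
import logging

logger = logging.getLogger("pcied-sketch")

# Hypothetical stand-in for the PCIE_DETACH_INFO table: key -> field dict.
DETACH_INFO = {
    "DPU_0": {"bus_info": "0000:03:00.1", "dpu_state": "detaching"},
    "DPU_1": {"bus_info": "0000:03:00.2", "dpu_state": "attached"},
}


def is_detaching(detach_info, pcie_dev):
    """Return True if some entry names pcie_dev and marks it 'detaching'."""
    return any(
        entry.get("bus_info") == pcie_dev and entry.get("dpu_state") == "detaching"
        for entry in (detach_info or {}).values()
    )


def count_missing(failed_devs, detach_info, smartswitch):
    """Count failed devices that should still produce a warning."""
    err = 0
    for dev in failed_devs:
        # Only SmartSwitch platforms consult the detach table.
        if smartswitch and is_detaching(detach_info, dev):
            logger.debug("PCIe Device: %s is in detaching mode, skipping warning.", dev)
            continue
        logger.warning("PCIe Device: %s Not Found", dev)
        err += 1
    return err
```

With the sample data, only `0000:03:00.2` is counted on a SmartSwitch, while both devices are counted when `smartswitch` is false.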

2 files changed

+110
-2
lines changed

sonic-pcied/scripts/pcied

Lines changed: 39 additions & 0 deletions
@@ -27,6 +27,10 @@ SYSLOG_IDENTIFIER = "pcied"
 PCIE_RESULT_REGEX = "PCIe Device Checking All Test"
 PCIE_DEVICE_TABLE_NAME = "PCIE_DEVICE"
 PCIE_STATUS_TABLE_NAME = "PCIE_DEVICES"
+PCIE_DETACH_INFO_TABLE = "PCIE_DETACH_INFO"
+
+PCIE_DETACH_BUS_INFO_FIELD = "bus_info"
+PCIE_DETACH_DPU_STATE_FIELD = "dpu_state"

 PCIED_MAIN_THREAD_SLEEP_SECS = 60

@@ -92,6 +96,7 @@ class DaemonPcied(daemon_base.DaemonBase):
         self.state_db = daemon_base.db_connect("STATE_DB")
         self.device_table = swsscommon.Table(self.state_db, PCIE_DEVICE_TABLE_NAME)
         self.status_table = swsscommon.Table(self.state_db, PCIE_STATUS_TABLE_NAME)
+        self.detach_info = swsscommon.Table(self.state_db, PCIE_DETACH_INFO_TABLE)

     def __del__(self):
         if self.device_table:
@@ -102,6 +107,10 @@ class DaemonPcied(daemon_base.DaemonBase):
             stable_keys = self.status_table.getKeys()
             for stk in stable_keys:
                 self.status_table._del(stk)
+        if self.detach_info:
+            detach_info_keys = self.detach_info.getKeys()
+            for dk in detach_info_keys:
+                self.detach_info._del(dk)

     # load aer-fields into statedb
     def update_aer_to_statedb(self):
@@ -151,6 +160,28 @@ class DaemonPcied(daemon_base.DaemonBase):

         self.status_table.set("status", fvs)

+    # Check if any PCI interface is in detaching mode by querying the state_db
+    def is_dpu_in_detaching_mode(self, pcie_dev):
+        # Ensure detach_info is not None
+        if self.detach_info is None:
+            self.log_debug("detach_info is None")
+            return False
+
+        # Query the state_db for the device detaching status
+        detach_info_keys = list(self.detach_info.getKeys())
+        if not detach_info_keys:
+            return False
+
+        for key in detach_info_keys:
+            dpu_info = self.detach_info.get(key)
+            if dpu_info:
+                bus_info = dpu_info.get(PCIE_DETACH_BUS_INFO_FIELD)
+                dpu_state = dpu_info.get(PCIE_DETACH_DPU_STATE_FIELD)
+                if bus_info == pcie_dev and dpu_state == "detaching":
+                    return True
+
+        return False
+
     # Check the PCIe devices
     def check_pcie_devices(self):
         self.resultInfo = platform_pcieutil.get_pcie_check()
@@ -160,6 +191,14 @@ class DaemonPcied(daemon_base.DaemonBase):

         for result in self.resultInfo:
             if result["result"] == "Failed":
+                # Convert bus, device, and function to a bus_info format like "0000:03:00.0"
+                pcie_dev = f"0000:{int(result['bus'], 16):02x}:{int(result['dev'], 16):02x}.{int(result['fn'], 16)}"
+
+                # Check if the device is in detaching mode
+                if device_info.is_smartswitch() and self.is_dpu_in_detaching_mode(pcie_dev):
+                    self.log_debug("PCIe Device: {} is in detaching mode, skipping warning.".format(pcie_dev))
+                    continue
+
                 self.log_warning("PCIe Device: " + result["name"] + " Not Found")
                 err += 1
             else:
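
The `bus_info` string compared against the detach table can be sketched as a standalone helper. This assumes the `bus`, `dev`, and `fn` fields are hexadecimal strings, as the test data in this commit suggests; `format_bus_info` is a name invented for this example, and the fixed `0000` domain follows the comment in the hunk above.

```python
def format_bus_info(bus, dev, fn, domain="0000"):
    """Build a 'dddd:bb:dd.f' string from hex-string bus/dev/fn fields."""
    # The f prefix matters: without it the braces are kept as literal text
    # and the result can never equal a real bus_info value.
    return f"{domain}:{int(bus, 16):02x}:{int(dev, 16):02x}.{int(fn, 16)}"


# Field names mirror the result dictionaries used in the unit tests.
result = {"bus": "03", "dev": "00", "fn": "1", "name": "PCIe Device 1"}
pcie_dev = format_bus_info(result["bus"], result["dev"], result["fn"])
print(pcie_dev)  # 0000:03:00.1
```

Parsing each field with `int(..., 16)` and re-formatting with `02x` normalizes inputs such as `"3"` and `"03"` to the same two-digit form.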

sonic-pcied/tests/test_DaemonPcied.py

Lines changed: 71 additions & 2 deletions
@@ -143,17 +143,86 @@ def test_run(self):
         daemon_pcied.run()
         assert daemon_pcied.check_pcie_devices.call_count == 1

+    @mock.patch('pcied.load_platform_pcieutil', mock.MagicMock())
+    def test_is_dpu_in_detaching_mode(self):
+        daemon_pcied = pcied.DaemonPcied(SYSLOG_IDENTIFIER)
+        daemon_pcied.detach_info = mock.MagicMock()
+        daemon_pcied.detach_info.getKeys = mock.MagicMock(return_value=['DPU_0', 'DPU_1'])
+        daemon_pcied.detach_info.get = mock.MagicMock(
+            side_effect=lambda key: {
+                'DPU_0': {'bus_info': '0000:03:00.1', 'dpu_state': 'detaching'},
+                'DPU_1': {'bus_info': '0000:03:00.2', 'dpu_state': 'attached'}
+            }.get(key, None)
+        )
+
+        # Test when the device is in detaching mode
+        assert daemon_pcied.is_dpu_in_detaching_mode('0000:03:00.1') == True
+
+        # Test when the device is not in detaching mode
+        assert daemon_pcied.is_dpu_in_detaching_mode('0000:03:00.2') == False
+
+        # Test when the device does not exist in detach_info
+        assert daemon_pcied.is_dpu_in_detaching_mode('0000:03:00.3') == False
+
+        # Test when detach_info is None
+        daemon_pcied.detach_info = None
+        assert daemon_pcied.is_dpu_in_detaching_mode('0000:03:00.1') == False
+
+        # Test when detach_info has no keys
+        daemon_pcied.detach_info = mock.MagicMock()
+        daemon_pcied.detach_info.getKeys.return_value = []
+        assert daemon_pcied.is_dpu_in_detaching_mode('0000:03:00.1') == False
+
+    @mock.patch('pcied.device_info.is_smartswitch', mock.MagicMock(return_value=False))
+    @mock.patch('pcied.DaemonPcied.is_dpu_in_detaching_mode', mock.MagicMock(return_value=False))
     @mock.patch('pcied.load_platform_pcieutil', mock.MagicMock())
     def test_check_pcie_devices(self):
         daemon_pcied = pcied.DaemonPcied(SYSLOG_IDENTIFIER)
         daemon_pcied.update_pcie_devices_status_db = mock.MagicMock()
         daemon_pcied.check_n_update_pcie_aer_stats = mock.MagicMock()
-        pcied.platform_pcieutil.get_pcie_check = mock.MagicMock()
+        pcied.platform_pcieutil.get_pcie_check = mock.MagicMock(
+            return_value=[
+                {"result": "Failed", "bus": "03", "dev": "00", "fn": "1", "name": "PCIe Device 1"},
+            ]
+        )

         daemon_pcied.check_pcie_devices()
         assert daemon_pcied.update_pcie_devices_status_db.call_count == 1
         assert daemon_pcied.check_n_update_pcie_aer_stats.call_count == 0

+    @mock.patch('pcied.device_info.is_smartswitch', mock.MagicMock(return_value=False))
+    @mock.patch('pcied.DaemonPcied.is_dpu_in_detaching_mode', mock.MagicMock(return_value=False))
+    @mock.patch('pcied.load_platform_pcieutil', mock.MagicMock())
+    def test_check_pcie_devices_update_aer(self):
+        daemon_pcied = pcied.DaemonPcied(SYSLOG_IDENTIFIER)
+        daemon_pcied.update_pcie_devices_status_db = mock.MagicMock()
+        daemon_pcied.check_n_update_pcie_aer_stats = mock.MagicMock()
+        pcied.platform_pcieutil.get_pcie_check = mock.MagicMock(
+            return_value=[
+                {"result": "Passed", "bus": "03", "dev": "00", "fn": "1", "name": "PCIe Device 1"},
+            ]
+        )
+
+        daemon_pcied.check_pcie_devices()
+        assert daemon_pcied.update_pcie_devices_status_db.call_count == 1
+        assert daemon_pcied.check_n_update_pcie_aer_stats.call_count == 1
+
+    @mock.patch('pcied.device_info.is_smartswitch', mock.MagicMock(return_value=True))
+    @mock.patch('pcied.DaemonPcied.is_dpu_in_detaching_mode', mock.MagicMock(return_value=True))
+    @mock.patch('pcied.load_platform_pcieutil', mock.MagicMock())
+    def test_check_pcie_devices_detaching(self):
+        daemon_pcied = pcied.DaemonPcied(SYSLOG_IDENTIFIER)
+        daemon_pcied.update_pcie_devices_status_db = mock.MagicMock()
+        daemon_pcied.check_n_update_pcie_aer_stats = mock.MagicMock()
+        pcied.platform_pcieutil.get_pcie_check = mock.MagicMock(
+            return_value=[
+                {"result": "Failed", "bus": "03", "dev": "00", "fn": "1", "name": "PCIe Device 1"},
+            ]
+        )
+
+        daemon_pcied.check_pcie_devices()
+        assert daemon_pcied.update_pcie_devices_status_db.call_count == 1
+        assert daemon_pcied.check_n_update_pcie_aer_stats.call_count == 0

     @mock.patch('pcied.load_platform_pcieutil', mock.MagicMock())
     def test_update_pcie_devices_status_db(self):
@@ -210,5 +279,5 @@ def test_update_aer_to_statedb(self):
         ])
         """

-        daemon_pcied.update_aer_to_statedb()
+        daemon_pcied.update_aer_to_statedb()
         assert daemon_pcied.log_debug.call_count == 0
