[vslib] Add High Frequency Telemetry (HFT) support for virtual SAI #1812

Draft
50n1c-rnsft wants to merge 20 commits into sonic-net:master from 50n1c-rnsft:feature/vslib-hft-tam-ipfix


@50n1c-rnsft 50n1c-rnsft commented Mar 25, 2026

Description

Add High Frequency Telemetry (HFT) / Stream Telemetry support to the virtual SAI (vslib), enabling the full HFT pipeline to run in virtual/test environments.

What this PR does

Task 1 — TAM object initialization:

  • Add SAI_SWITCH_ATTR_TAM_ST_REPORT_CHUNK_SIZE (default: 65535) and SAI_SWITCH_ATTR_TAM_ST_CHUNK_COUNT (default: 0) to set_initial_tam_objects()
  • Add SAI_OBJECT_TYPE_TAM_TEL_TYPE and SAI_OBJECT_TYPE_TAM_COUNTER_SUBSCRIPTION to supported object types
  • TAM_COUNTER_SUBSCRIPTION CRUD works through the existing generic create_internal() path

Task 2 — IPFIX template generation:

  • Replace placeholder random data in refresh_tam_tel_ipfix_templates() with real IPFIX template binary per HLD 7.2.2
  • Enterprise number encoding: (stat_id << 16) | object_type
  • Element ID: label | 0x8000 (enterprise bit set)
  • First field: observationTimeNanoseconds (Element ID=325, 8 bytes)
  • Label range validation with SWSS_LOG_WARN for values exceeding 15-bit range
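
The encoding rules above can be sketched as a small, self-contained builder. This is a minimal illustration and not the code in `refresh_tam_tel_ipfix_templates()`: the `CounterField` struct, the `build_template_set` name, and the 8-byte length for counter fields are assumptions made for the example, and it emits only a Template Set (Set ID 2) without an IPFIX message header.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one subscribed counter (not a vslib type).
struct CounterField
{
    uint16_t label;        // must fit in 15 bits
    uint16_t object_type;  // low 16 bits of the enterprise number
    uint16_t stat_id;      // high 16 bits of the enterprise number
};

static void write_u16_be(std::vector<uint8_t>& out, uint16_t v)
{
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v & 0xFF));
}

static void write_u32_be(std::vector<uint8_t>& out, uint32_t v)
{
    write_u16_be(out, static_cast<uint16_t>(v >> 16));
    write_u16_be(out, static_cast<uint16_t>(v & 0xFFFF));
}

std::vector<uint8_t> build_template_set(uint16_t template_id,
                                        const std::vector<CounterField>& fields)
{
    assert(template_id >= 256);  // RFC 7011 reserves lower values for Set IDs

    std::vector<uint8_t> out;
    write_u16_be(out, 2);            // Set ID 2 = Template Set
    write_u16_be(out, 0);            // Set length, patched at the end
    write_u16_be(out, template_id);
    write_u16_be(out, static_cast<uint16_t>(fields.size() + 1));  // field count

    // First field: observationTimeNanoseconds (Element ID 325, 8 bytes).
    write_u16_be(out, 325);
    write_u16_be(out, 8);

    for (const auto& f : fields)
    {
        assert(f.label <= 0x7FFF);   // label must leave room for the enterprise bit
        write_u16_be(out, static_cast<uint16_t>(f.label | 0x8000));
        write_u16_be(out, 8);        // assumed counter width; not stated in the PR
        write_u32_be(out, (static_cast<uint32_t>(f.stat_id) << 16) | f.object_type);
    }

    const auto len = static_cast<uint16_t>(out.size());
    out[2] = static_cast<uint8_t>(len >> 8);
    out[3] = static_cast<uint8_t>(len & 0xFF);
    return out;
}
```

With one subscription, the set is 20 bytes: a 4-byte set header, a 4-byte template record header, a 4-byte field specifier for the timestamp, and an 8-byte enterprise-specific field specifier.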

Task 3 — STEL genetlink sender:

  • SwitchStateBaseStel.cpp: RAII GenlConnection wrapper using libnl-genl (genl_connect(), genl_ctrl_resolve(), genlmsg_put(), nla_put(), nl_send_auto())
  • Worker thread generates fake IPFIX data records at configurable intervals
  • Wire setTamTelType() to start/stop STEL stream on START_STREAM/STOP_STREAM state changes
  • Proper cleanup in ~SwitchStateBase() destructor
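
The start/stop lifecycle of such a worker can be sketched with the standard library alone. This is an assumed shape, not the contents of `SwitchStateBaseStel.cpp`: `PeriodicWorker` and its `Tick` callback are hypothetical names, and the callback stands in for building a record and sending it over genetlink.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical helper: runs a callback at a fixed interval until stopped.
class PeriodicWorker
{
public:
    using Tick = std::function<void(uint64_t seq)>;

    ~PeriodicWorker() { stop(); }  // join the thread before members are destroyed

    void start(std::chrono::microseconds interval, Tick tick)
    {
        assert(interval.count() > 0);
        stop();  // restart cleanly if a stream is already running
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = false;
        }
        m_thread = std::thread([this, interval, tick]() {
            uint64_t seq = 0;
            std::unique_lock<std::mutex> lock(m_mutex);
            while (!m_stop)
            {
                lock.unlock();
                tick(seq++);  // e.g. build and send one fake data record
                lock.lock();
                // Returns early when stop() sets m_stop and notifies.
                m_cv.wait_for(lock, interval, [this] { return m_stop; });
            }
        });
    }

    void stop()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        if (m_thread.joinable())
        {
            m_thread.join();
        }
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_stop = true;
    std::thread m_thread;
};
```

Waiting on a condition variable instead of sleeping lets `stop()` return promptly even with a long poll interval, which matters when the destructor has to join the thread.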

Task 4 — Stats capability extension:

  • Add queryBufferPoolStatsCapability() (15 stats) and queryIngressPriorityGroupStatsCapability() (9 stats)
  • Extend queryStatsCapability() to dispatch BUFFER_POOL and INGRESS_PRIORITY_GROUP object types
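
A rough sketch of the dispatch, using stand-in types so it compiles without the SAI headers. Everything here is hypothetical except the two stat counts (15 and 9) taken from the list above: the enums, structs, and placeholder stat IDs are invented for illustration, and the "report the required count on overflow" behaviour is assumed to follow the usual SAI list-query convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for SAI types (assumptions, not the real definitions).
enum class ObjectType { Port, Queue, BufferPool, IngressPriorityGroup };
enum class Status { Success, BufferOverflow, NotSupported };

struct StatCapability { int32_t stat_enum; uint32_t stat_modes; };
struct StatCapabilityList { uint32_t count; StatCapability* list; };

// Placeholder stat IDs 0..n-1; the real code would list actual stat enums.
static std::vector<StatCapability> placeholder_stats(uint32_t n)
{
    assert(n > 0);
    std::vector<StatCapability> v;
    for (uint32_t i = 0; i < n; ++i)
    {
        v.push_back({static_cast<int32_t>(i), 1u});
    }
    return v;
}

static Status fill(const std::vector<StatCapability>& supported, StatCapabilityList& out)
{
    const auto needed = static_cast<uint32_t>(supported.size());
    if (out.list == nullptr || out.count < needed)
    {
        out.count = needed;  // tell the caller how much room is required
        return Status::BufferOverflow;
    }
    for (uint32_t i = 0; i < needed; ++i)
    {
        out.list[i] = supported[i];
    }
    out.count = needed;
    return Status::Success;
}

Status queryStatsCapability(ObjectType type, StatCapabilityList& out)
{
    switch (type)
    {
        case ObjectType::BufferPool:
            return fill(placeholder_stats(15), out);
        case ObjectType::IngressPriorityGroup:
            return fill(placeholder_stats(9), out);
        default:
            return Status::NotSupported;  // other types are outside this sketch
    }
}
```

A caller would typically probe with an empty list to learn the count, allocate, and query again.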

Files changed (14 files, +1089/-156 lines)

| File | Description |
|---|---|
| vslib/SwitchStateBase.cpp | Task 1 + Task 2 + Task 4 + destructor fix |
| vslib/SwitchStateBase.h | New method declarations + STEL thread members |
| vslib/SwitchStateBaseStel.cpp | New: STEL genetlink sender + worker thread |
| vslib/sonic_stel_uapi.h | New: UAPI header (shared with kernel module) |
| vslib/Makefile.am | Add -lnl-genl-3 -lnl-3 link flags |
| vslib/tests.cpp | Update expected supported object type count (8→10) |
| syncd/Makefile.am | Add -lnl-genl-3 -lnl-3 to SAILIB |
| syncd/tests/Makefile.am | Add -lnl-genl-3 -lnl-3 link flags |
| saidiscovery/Makefile.am | Add -lnl-genl-3 -lnl-3 to SAILIB |
| saisdkdump/Makefile.am | Add -lnl-genl-3 -lnl-3 to SAILIB |
| tests/Makefile.am | Add -lnl-genl-3 -lnl-3 to SAILIB |
| tests/aspell.en.pws | Add new words (libnl, syscall, templatehdr) |
| unittest/vslib/TestTAMIpfixTemplate.cpp | New: 4 unit tests for IPFIX template generation |
| unittest/vslib/Makefile.am | Add test file |

Kernel module

The sonic_stel kernel module (genetlink family registration) is in a separate PR:

Testing

  • Build: dpkg-buildpackage -Psyncd,vs,nopython2 passes in sonic-slave-bookworm:master-amd64 CI container
  • Unit tests: 203/203 passed (including 4 new TAMIpfixTemplateTest cases)
    • EmptySubscription_GeneratesMinimalTemplate
    • SingleSubscription_CorrectBinaryFormat
    • MultipleSubscriptions_CorrectFieldCountAndEncoding
    • UnmatchedSubscription_Filtered

References

@mssonicbld
Collaborator

/azp run

linux-foundation-easycla bot commented Mar 25, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

50n1c-rnsft force-pushed the feature/vslib-hft-tam-ipfix branch from 9fb21e0 to f73d1ed on March 25, 2026 01:25

50n1c-rnsft force-pushed the feature/vslib-hft-tam-ipfix branch from 7621c8f to 76dc23e on March 25, 2026 03:16

50n1c-rnsft changed the title from "[vslib] Add HFT TAM object support and IPFIX template generation" to "[vslib] Add High Frequency Telemetry (HFT) support for virtual SAI" on Mar 25, 2026
- Add 'libnl', 'syscall', 'templatehdr' to tests/aspell.en.pws
- Add SWSS_LOG_ENTER() to test fixture methods in TestTAMIpfixTemplate.cpp
  (SetUp, getIpfixTemplates, createCounterSubscription, validateTemplateHeader)
- Add swss/logger.h include to test file


Signed-off-by: Ze Gan agent <ganze_12345@qq.com>

50n1c-rnsft force-pushed the feature/vslib-hft-tam-ipfix branch from 6c00dcd to 17862f1 on March 25, 2026 14:55

50n1c-rnsft added a commit to 50n1c-rnsft/sonic-buildimage that referenced this pull request Mar 25, 2026
Add a minimal generic netlink kernel module (sonic_stel) for the VS
platform that registers the 'sonic_stel' family with an 'ipfix'
multicast group.

This module acts as a relay: vslib (virtual SAI) sends IPFIX data
records via SONIC_STEL_CMD_SEND_IPFIX, and the module multicasts
them to countersyncd via genlmsg_multicast.

Files:
- platform/vs/sonic-stel-module/ - kernel module source + debian packaging
- platform/vs/sonic-stel-ko.mk - buildimage rule

Related PR: sonic-net/sonic-sairedis#1812

Signed-off-by: Ze Gan agent <ganze_12345@qq.com>
50n1c-rnsft force-pushed the feature/vslib-hft-tam-ipfix branch from 17862f1 to 64d2196 on March 25, 2026 15:01

50n1c-rnsft force-pushed the feature/vslib-hft-tam-ipfix branch from 64d2196 to 5527bb2 on March 25, 2026 15:03

- Add 'genl' and 'ko' to tests/aspell.en.pws
- Add '// SWSS_LOG_ENTER() omitted' comments to static helper functions:
  - SwitchStateBaseStel.cpp: put_u16_be, put_u32_be, put_u64_be, build_ipfix_data_message
  - SwitchStateBase.cpp: write_u16_be, write_u32_be
  - TestTAMIpfixTemplate.cpp: read_u16_be, read_u32_be

Signed-off-by: Ze Gan agent <ganze_12345@qq.com>
50n1c-rnsft force-pushed the feature/vslib-hft-tam-ipfix branch from 186e94e to 2c80d87 on March 25, 2026 15:46

Comment on lines +572 to +624
    switch (attr->value.s32)
    {
        case SAI_TAM_TEL_TYPE_STATE_CREATE_CONFIG:
            send_tam_tel_type_config_change(tam_tel_type_id);
            break;

        case SAI_TAM_TEL_TYPE_STATE_START_STREAM:
        {
            // Count counter subscriptions for this tel_type to determine num_counters
            size_t num_counters = 0;
            auto it = m_objectHash.find(SAI_OBJECT_TYPE_TAM_COUNTER_SUBSCRIPTION);
            if (it != m_objectHash.end())
            {
                for (auto &kv : it->second)
                {
                    auto tel_it = kv.second.find(sai_serialize_attr_id(
                                *sai_metadata_get_attr_metadata(
                                    SAI_OBJECT_TYPE_TAM_COUNTER_SUBSCRIPTION,
                                    SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_TEL_TYPE)));

                    if (tel_it != kv.second.end() &&
                            tel_it->second->getAttr()->value.oid == tam_tel_type_id)
                    {
                        num_counters++;
                    }
                }
            }

            // Default poll interval: 100ms (100000 us)
            // In real HW this comes from TAM report config; for vslib use a reasonable default
            uint32_t poll_interval_us = 100000;
            uint16_t template_id = 256;

            SWSS_LOG_NOTICE("Starting STEL stream for tel_type %s: %zu counters, %u us interval",
                    sai_serialize_object_id(tam_tel_type_id).c_str(),
                    num_counters, poll_interval_us);

            if (num_counters > 0)
            {
                startStelStream(poll_interval_us, template_id, num_counters);
            }
            break;
        }

        case SAI_TAM_TEL_TYPE_STATE_STOP_STREAM:
            SWSS_LOG_NOTICE("Stopping STEL stream for tel_type %s",
                    sai_serialize_object_id(tam_tel_type_id).c_str());
            stopStelStream();
            break;

        default:
            break;
    }

Check notice

Code scanning / CodeQL: Long switch case (Note)

Switch has at least one case that is too long: SAI_TAM_TEL_TYPE_STATE_START_STREAM (37 lines).

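
One way to address a notice like this is to move the subscription-counting loop out of the `START_STREAM` case into a helper. The sketch below shows the idea on simplified stand-in containers: `ObjectId`, `AttrMap`, `ObjectMap`, and `countSubscriptionsFor` are hypothetical names, and the real `m_objectHash` stores attribute wrappers rather than plain object IDs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-ins: serialized object id -> (attribute name -> referenced object id).
using ObjectId = uint64_t;
using AttrMap = std::map<std::string, ObjectId>;
using ObjectMap = std::map<std::string, AttrMap>;

// Counts subscriptions whose tel-type attribute points at telTypeId.
// Subscriptions without the attribute are skipped.
static std::size_t countSubscriptionsFor(const ObjectMap& subs,
                                         const std::string& telTypeAttr,
                                         ObjectId telTypeId)
{
    assert(!telTypeAttr.empty());
    return static_cast<std::size_t>(std::count_if(
        subs.begin(), subs.end(),
        [&](const ObjectMap::value_type& kv) {
            auto it = kv.second.find(telTypeAttr);
            return it != kv.second.end() && it->second == telTypeId;
        }));
}
```

With a helper of this shape, the case body would reduce to computing the count, logging, and calling `startStelStream()` when the count is non-zero.
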
Per RFC 7011, an IPFIX Exporting Process must send Template Records
before any Data Records. The STEL worker thread was only sending
Data Records (Set ID=256), causing countersyncd to fail with:
  'Failed to parse IPFIX data message: Error: Missing Template at 0x14'

Fix:
- Add build_ipfix_template_message() to wrap template set in IPFIX header
- Read IPFIX template from SAI_TAM_TEL_TYPE_ATTR_IPFIX_TEMPLATES on
  START_STREAM and pass to worker thread
- Send Template Record before first Data Record
- Re-send Template Record every ~30 seconds per RFC 7011 recommendation
- Update startStelStream/stelWorkerThread signatures to accept template

Tested: 203/203 unit tests pass, aspell/swsslogenter clean.
Bug found by msft-internal-linux during VS testbed integration testing.

Signed-off-by: Ze Gan agent <ganze_12345@qq.com>

Revert commit af43e00. Per Ze's clarification, IPFIX templates
are NOT sent via the data plane (genetlink). Instead:
  1. orchagent queries SAI_TAM_TEL_TYPE_ATTR_IPFIX_TEMPLATES
  2. orchagent writes template to STATE_DB (session_config field)
  3. countersyncd reads template from STATE_DB

The genetlink channel only carries IPFIX Data Records.
The 'Missing Template' error from countersyncd testing was because
orchagent had not yet written the template to STATE_DB, not because
vslib needed to send it.

Signed-off-by: Ze Gan agent <ganze_12345@qq.com>
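
For reference, a data-only IPFIX message of the kind the genetlink channel carries could look like the sketch below. This is an illustration under stated assumptions, not `build_ipfix_data_message()` from the PR: the function signature, the 8-byte counter width, and the raw 64-bit timestamp value are assumed, and the helper names only echo those mentioned in the commit messages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static void put_u16_be(std::vector<uint8_t>& out, uint16_t v)
{
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v & 0xFF));
}

static void put_u32_be(std::vector<uint8_t>& out, uint32_t v)
{
    put_u16_be(out, static_cast<uint16_t>(v >> 16));
    put_u16_be(out, static_cast<uint16_t>(v & 0xFFFF));
}

static void put_u64_be(std::vector<uint8_t>& out, uint64_t v)
{
    put_u32_be(out, static_cast<uint32_t>(v >> 32));
    put_u32_be(out, static_cast<uint32_t>(v & 0xFFFFFFFFu));
}

static void patch_u16_be(std::vector<uint8_t>& out, std::size_t pos, std::size_t v)
{
    assert(v <= 0xFFFF);
    out[pos] = static_cast<uint8_t>(v >> 8);
    out[pos + 1] = static_cast<uint8_t>(v & 0xFF);
}

// Builds one IPFIX message holding a single Data Set with a single record.
std::vector<uint8_t> build_data_message(uint16_t template_id,
                                        uint32_t export_time_s,
                                        uint32_t sequence,
                                        uint32_t domain_id,
                                        uint64_t observation_time,
                                        const std::vector<uint64_t>& counters)
{
    assert(template_id >= 256);  // Data Set IDs equal the template ID

    std::vector<uint8_t> out;
    put_u16_be(out, 10);             // IPFIX version
    put_u16_be(out, 0);              // message length, patched below
    put_u32_be(out, export_time_s);
    put_u32_be(out, sequence);
    put_u32_be(out, domain_id);      // observation domain ID

    const std::size_t set_start = out.size();  // 16: end of message header
    put_u16_be(out, template_id);    // Set ID
    put_u16_be(out, 0);              // set length, patched below

    put_u64_be(out, observation_time);  // matches the template's first field
    for (uint64_t c : counters)
    {
        put_u64_be(out, c);          // assumed 8 bytes per counter
    }

    patch_u16_be(out, set_start + 2, out.size() - set_start);
    patch_u16_be(out, 2, out.size());
    return out;
}
```

The first record begins at offset 0x14 (16-byte message header plus 4-byte set header), which is the position a parser would report if it reached the data without a matching template.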

Signed-off-by: Ze Gan agent <ganze_12345@qq.com>
