-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CHAIN_DMA: fix memory leaks #8767
Conversation
The CHAIN_DMA code relies on the component code unfortunately without a symmetry between memory allocation and release. ipc4_chain_manager_create() relies on the ipc4_add_comp_dev() helper, which internally allocate 20 bytes for an icd item included in a list. ipc4_chain_dma_state() directly releases the component, without removing the icd item from the list. This slow memory leak, combined with alignment requirements, gradually prevents a DMA buffer from being allocated. This patch adds a search in the list and frees the item. With this modification the CHAIN_DMA stress tests can loop forever. Closes: thesofproject#8751 Signed-off-by: Pierre-Louis Bossart <[email protected]>
CONFIG_IPC4_XRUN_NOTIFICATIONS_ENABLE does not exist, replace by valid Kconfig. This is just a search and replace patch, it's quite likely this Kconfig option is bit-rot quality level since it was NEVER compiled in. Signed-off-by: Pierre-Louis Bossart <[email protected]>
A code analysis shows a tentative missing ipc_msg_fre() leading to a memory leak. This problem was found with a code analysis - the actual code with XRUN_NOTIFICATIONS_ENABLE was NOT tested. Signed-off-by: Pierre-Louis Bossart <[email protected]>
Any rough idea which code introduced this leak and when? Our failure rate jumped up recently and mysteriously... EDIT: is chained DMA tested at all in validation? No idea sorry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart, I wonder how this was not discovered earlier..
list_item_del(&icd->list); | ||
rfree(icd); | ||
break; | ||
} | ||
comp_free(dev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would ipc_comp_free()
work instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at it based on the input from @kv2019i, but that routine takes a comp_id parameter that I can't really understand, and it does a lot more things.
With this PR I opted for the bare minimum of releasing the memory that was allocated and cleaning up the linked list, but yeah it's not necessarily final. I have no idea if locking is necessary, if the multi-core stuff is required, etc.
That's really for firmware experts to comment on, all I did was root-cause an imbalance and a memory leak. I am more than happy if someone has a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question @ujfalusi . This is using the component infra in a creative way. I took another read through the code and I think this still correct. The chain-dma is not setting up a full pipeline, so the actions in ipc_comp_free() are not really applicable. Maybe @lyakh can take a quick look at this as well?
lack of test coverage: I don't think anyone looks at the max_allocated statistic report and checks if it grows between iterations of a test cycle.... It's pretty scary to be honest, we really ought to have an sof-test check on stuff like this. It's also my legendary bad luck, I break stuff that appears perfectly fine at the surface :-) More specifically my test involved TWO chain_dmas at the same time, each with a different but large FIFO size and with a intertwined set of init/trigger/release calls that is probably not seen when doing a single playback forever. Even when I tested the playback/capture concurrently with a script, it's likely that the order is preserved and the memory leak has less of an impact. |
@marc-hb @plbossart et al. Confirmed the leak does occur with HDMI test. One needs to keep the DSP alive and do a lot of HDMI start/stops on the side. Following script is a small tool to trigger:
But this will take a long time to end up in a crash (50000+ iterations). A test using heap tracking would be more effective way to catch these in CI. We do have CONFIG_SYS_HEAP_RUNTIME_STATS=y in the perf-overlay and that does catch this, but you do need a specific test that does a lot of iterations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested with a different HDMI based test and got the leak reproduced, and verified this PR fixes the leak, so I think this is good to go (given CI passes).
list_item_del(&icd->list); | ||
rfree(icd); | ||
break; | ||
} | ||
comp_free(dev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question @ujfalusi . This is using the component infra in a creative way. I took another read through the code and I think this still correct. The chain-dma is not setting up a full pipeline, so the actions in ipc_comp_free() are not really applicable. Maybe @lyakh can take a quick look at this as well?
list_item_del(&icd->list); | ||
rfree(icd); | ||
break; | ||
} | ||
comp_free(dev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it should be a one-liner
- comp_free(dev);
+ ipc_comp_free(dev);
but one would need to check carefully that that function cannot do any harm. @plbossart could you maybe give it a quick test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update: the API seems to be asymmetric. There's comp_free()
and there's ipc_comp_free()
but I don't think anything like comp_alloc()
, the closest I find is ipc4_add_comp_dev()
and that's what chain DMA is using. So in the short run, I guess - whatever works, in the long run someone might want to clean that up a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple issues:
a) int ipc_comp_free(struct ipc *ipc, uint32_t comp_id)
what is the comp_id?
b) if (!icd->cd->bsource_list.next || !icd->cd->bsink_list.next)
return -EINVAL;
we don't have any such things defined for chain-dma, do we? My guess is that we would hit this error case.
I can give it a try once you guys help me with a)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets take this as a followup improvement, this is fixing a leak and unblocking tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes an obvious bug. The added code could of course be written in a new function like ipc4_delete_comp_dev(), but since there is no other obvious user this is probably just fine.
list_for_item_safe(clist, _tmp, &ipc->comp_list) { | ||
icd = container_of(clist, struct ipc_comp_dev, list); | ||
if (icd->cd != dev) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this would make the resulting code more optimal, but this could be
list_for_item_safe(clist, _tmp, &ipc->comp_list) {
if (clist != &dev->list)
continue;
icd = ...
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could, but most uses of list_for_item_safe start with the container_of stuff.
I don't mind doing the change if that was the preference, I prefer code that looks familiar and consistent with other usages.
git grep -A3 list_for_item_safe
--
src/audio/module_adapter/module/generic.c: list_for_item_safe(mem_list, _mem_list, &mod->priv.memory.mem_list) {
src/audio/module_adapter/module/generic.c- mem = container_of(mem_list, struct module_memory, mem_list);
src/audio/module_adapter/module/generic.c- if (mem->ptr == ptr) {
src/audio/module_adapter/module/generic.c- rfree(mem->ptr);
--
src/audio/module_adapter/module/generic.c: list_for_item_safe(mem_list, _mem_list, &mod->priv.memory.mem_list) {
src/audio/module_adapter/module/generic.c- mem = container_of(mem_list, struct module_memory, mem_list);
src/audio/module_adapter/module/generic.c- rfree(mem->ptr);
src/audio/module_adapter/module/generic.c- list_item_del(&mem->mem_list);
--
src/audio/module_adapter/module_adapter.c: list_for_item_safe(dp_queue_list_item, tmp, &mod->dp_queue_dp_to_ll_list) {
src/audio/module_adapter/module_adapter.c- struct dp_queue *dp_queue =
src/audio/module_adapter/module_adapter.c- container_of(dp_queue_list_item, struct dp_queue, list);
src/audio/module_adapter/module_adapter.c-
--
src/audio/module_adapter/module_adapter.c: list_for_item_safe(dp_queue_list_item, tmp, &mod->dp_queue_ll_to_dp_list) {
src/audio/module_adapter/module_adapter.c- struct dp_queue *dp_queue =
src/audio/module_adapter/module_adapter.c- container_of(dp_queue_list_item, struct dp_queue, list);
src/audio/module_adapter/module_adapter.c-
--
src/audio/module_adapter/module_adapter.c: list_for_item_safe(blist, _blist, &mod->sink_buffer_list) {
src/audio/module_adapter/module_adapter.c- struct comp_buffer *buffer = container_of(blist, struct comp_buffer,
src/audio/module_adapter/module_adapter.c- sink_list);
src/audio/module_adapter/module_adapter.c- uint32_t flags;
--
src/audio/module_adapter/module_adapter.c: list_for_item_safe(dp_queue_list_item, tmp, &mod->dp_queue_dp_to_ll_list) {
src/audio/module_adapter/module_adapter.c- struct dp_queue *dp_queue =
src/audio/module_adapter/module_adapter.c- container_of(dp_queue_list_item, struct dp_queue, list);
src/audio/module_adapter/module_adapter.c-
--
src/audio/module_adapter/module_adapter.c: list_for_item_safe(dp_queue_list_item, tmp, &mod->dp_queue_ll_to_dp_list) {
src/audio/module_adapter/module_adapter.c- struct dp_queue *dp_queue =
src/audio/module_adapter/module_adapter.c- container_of(dp_queue_list_item, struct dp_queue, list);
src/audio/module_adapter/module_adapter.c-
--
src/audio/module_adapter/module_adapter.c: list_for_item_safe(blist, _blist, &mod->sink_buffer_list) {
src/audio/module_adapter/module_adapter.c- struct comp_buffer *buffer = container_of(blist, struct comp_buffer,
src/audio/module_adapter/module_adapter.c- sink_list);
src/audio/module_adapter/module_adapter.c- uint32_t flags;
--
src/ipc/ipc-helper.c: list_for_item_safe(clist, tmp, &icd->cd->bsource_list) {
src/ipc/ipc-helper.c- struct comp_buffer *buffer = container_of(clist, struct comp_buffer, sink_list);
src/ipc/ipc-helper.c-
src/ipc/ipc-helper.c- buffer->sink = NULL;
--
src/ipc/ipc-helper.c: list_for_item_safe(clist, tmp, &icd->cd->bsink_list) {
src/ipc/ipc-helper.c- struct comp_buffer *buffer = container_of(clist, struct comp_buffer, source_list);
src/ipc/ipc-helper.c-
src/ipc/ipc-helper.c- buffer->source = NULL;
--
src/ipc/ipc4/helper.c: list_for_item_safe(list, _list, &icd->cd->bsink_list) {
src/ipc/ipc4/helper.c- struct comp_dev *sink;
src/ipc/ipc4/helper.c-
src/ipc/ipc4/helper.c- buffer = container_of(list, struct comp_buffer, source_list);
--
src/ipc/ipc4/helper.c: list_for_item_safe(list, _list, &icd->cd->bsource_list) {
src/ipc/ipc4/helper.c- struct comp_dev *source;
src/ipc/ipc4/helper.c-
src/ipc/ipc4/helper.c- buffer = container_of(list, struct comp_buffer, sink_list);
--
src/lib/notifier.c: list_for_item_safe(wlist, tlist, ¬ify->list[type]) {
src/lib/notifier.c- handle = container_of(wlist, struct callback_handle, list);
src/lib/notifier.c- if ((!receiver || handle->receiver == receiver) &&
src/lib/notifier.c- (!caller || handle->caller == caller)) {
--
src/lib/notifier.c: list_for_item_safe(wlist, tlist, ¬ify->list[type]) {
src/lib/notifier.c- handle = container_of(wlist, struct callback_handle, list);
src/lib/notifier.c- if (!caller || !handle->caller || handle->caller == caller)
src/lib/notifier.c- handle->cb(handle->receiver, type, data);
--
src/library_manager/lib_notification.c: list_for_item_safe(list_elem, list_tmp, &lib_notif->list) {
src/library_manager/lib_notification.c- msg_pool_elem = container_of(list_elem, struct ipc_lib_msg, list);
src/library_manager/lib_notification.c- assert(msg_pool_elem->msg);
src/library_manager/lib_notification.c-
--
src/platform/library/schedule/ll_schedule.c: list_for_item_safe(tlist, tlist_, &sched_list) {
src/platform/library/schedule/ll_schedule.c- task = container_of(tlist, struct task, list);
src/platform/library/schedule/ll_schedule.c-
src/platform/library/schedule/ll_schedule.c- /* only run queued tasks */
--
src/schedule/zephyr_ll.c: list_for_item_safe(list, tmp, &task_head) {
src/schedule/zephyr_ll.c- list_item_del(list);
src/schedule/zephyr_ll.c- list_item_append(list, &sch->tasks);
src/schedule/zephyr_ll.c- }
--
test/cmocka/src/notifier_mocks.c: list_for_item_safe(wlist, tlist, ¬ify->list[type]) {
test/cmocka/src/notifier_mocks.c- handle = container_of(wlist, struct callback_handle, list);
test/cmocka/src/notifier_mocks.c- if (!caller || !handle->caller || handle->caller == caller)
test/cmocka/src/notifier_mocks.c- handle->cb(handle->receiver, type, data);
--
test/cmocka/src/notifier_mocks.c: list_for_item_safe(wlist, tlist, ¬ify->list[type]) {
test/cmocka/src/notifier_mocks.c- handle = container_of(wlist, struct callback_handle, list);
test/cmocka/src/notifier_mocks.c- if ((!receiver || handle->receiver == receiver) &&
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might add that speed isn't really a problem here, tearing down a chain DMA requires participation from the host and firmware, saving a couple of cycles to scan the component list isn't going to make any difference.
The SoundWire BPT/BRA loop tests exposes a DMA buffer allocation issue, which was root-caused after 4 days of my life that I am not getting back to a 20-byte memory leak. After a while this causes the DMA buffer to fail, likely because of alignment/fragmentation.
This PR also suggests fixes for the XRUN_NOTIFICATIONS_ENABLE option, but since this Kconfig is not used it's anyone's guess if the code actually works. Our firmware overlords might decide to completely remove this bad case of bitrot.