Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHAIN_DMA: fix memory leaks #8767

Merged
merged 3 commits into from
Jan 24, 2024

Conversation

plbossart
Copy link
Member

The SoundWire BPT/BRA loop tests exposes a DMA buffer allocation issue, which was root-caused after 4 days of my life that I am not getting back to a 20-byte memory leak. After a while this causes the DMA buffer to fail, likely because of alignment/fragmentation.

This PR also suggests fixes for the XRUN_NOTIFICATIONS_ENABLE option, but since this Kconfig is not used it's anyone's guess if the code actually works. Our firmware overlords might decide to completely remove this bad case of bitrot.

The CHAIN_DMA code relies on the component code unfortunately without
a symmetry between memory allocation and release.

ipc4_chain_manager_create() relies on the ipc4_add_comp_dev() helper,
which internally allocate 20 bytes for an icd item included in a list.

ipc4_chain_dma_state() directly releases the component, without
removing the icd item from the list.

This slow memory leak, combined with alignment requirements, gradually
prevents a DMA buffer from being allocated.

This patch adds a search in the list and frees the item. With this
modification the CHAIN_DMA stress tests can loop forever.

Closes: thesofproject#8751
Signed-off-by: Pierre-Louis Bossart <[email protected]>
CONFIG_IPC4_XRUN_NOTIFICATIONS_ENABLE does not exist, replace by valid
Kconfig.

This is just a search and replace patch, it's quite likely this
Kconfig option is bit-rot quality level since it was NEVER compiled
in.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
A code analysis shows a tentative missing ipc_msg_fre() leading to a
memory leak.

This problem was found with a code analysis - the actual code with
XRUN_NOTIFICATIONS_ENABLE was NOT tested.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
@marc-hb
Copy link
Collaborator

marc-hb commented Jan 22, 2024

Any rough idea which code introduced this leak and when? Our failure rate jumped up recently and mysteriously...

EDIT: is chained DMA tested at all in validation? No idea sorry.

Copy link
Contributor

@ujfalusi ujfalusi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart, I wonder how this was not discovered earlier..

list_item_del(&icd->list);
rfree(icd);
break;
}
comp_free(dev);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would ipc_comp_free() work instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at it based on the input from @kv2019i, but that routine takes a comp_id parameter that I can't really understand, and it does a lot more things.

With this PR I opted for the bare minimum of releasing the memory that was allocated and cleaning up the linked list, but yeah it's not necessarily final. I have no idea if locking is necessary, if the multi-core stuff is required, etc.

That's really for firmware experts to comment on, all I did was root-cause an imbalance and a memory leak. I am more than happy if someone has a better solution.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question @ujfalusi . This is using the component infra in a creative way. I took another read through the code and I think this still correct. The chain-dma is not setting up a full pipeline, so the actions in ipc_comp_free() are not really applicable. Maybe @lyakh can take a quick look at this as well?

@plbossart
Copy link
Member Author

@plbossart, I wonder how this was not discovered earlier..

lack of test coverage: I don't think anyone looks at the max_allocated statistic report and checks if it grows between iterations of a test cycle.... It's pretty scary to be honest, we really ought to have an sof-test check on stuff like this.

It's also my legendary bad luck, I break stuff that appears perfectly fine at the surface :-)

More specifically my test involved TWO chain_dmas at the same time, each with a different but large FIFO size and with a intertwined set of init/trigger/release calls that is probably not seen when doing a single playback forever. Even when I tested the playback/capture concurrently with a script, it's likely that the order is preserved and the memory leak has less of an impact.

@kv2019i
Copy link
Collaborator

kv2019i commented Jan 23, 2024

@marc-hb @plbossart et al. Confirmed the leak does occur with HDMI test. One needs to keep the DSP alive and do a lot of HDMI start/stops on the side. Following script is a small tool to trigger:

 #!/bin/bash
aplay -fdat -traw -Dhw:0,5 /dev/zero &
test="aplay -fdat -traw -d1 -Dhw:0,6 /dev/zero"

iteration=0
while :
do
       $test
       ((iteration++))
       echo "Iteration: ${iteration}"
       sleep 1
done

But this will take a long time to end up in a crash (50000+ iterations). A test using heap tracking would be more effective way to catch these in CI. We do have CONFIG_SYS_HEAP_RUNTIME_STATS=y in the perf-overlay and that does catch this, but you do need a specific test that does a lot of iterations.

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with a different HDMI based test and got the leak reproduced, and verified this PR fixes the leak, so I think this is good to go (given CI passes).

list_item_del(&icd->list);
rfree(icd);
break;
}
comp_free(dev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question @ujfalusi . This is using the component infra in a creative way. I took another read through the code and I think this still correct. The chain-dma is not setting up a full pipeline, so the actions in ipc_comp_free() are not really applicable. Maybe @lyakh can take a quick look at this as well?

list_item_del(&icd->list);
rfree(icd);
break;
}
comp_free(dev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it should be a one-liner

-	comp_free(dev);
+	ipc_comp_free(dev);

but one would need to check carefully that that function cannot do any harm. @plbossart could you maybe give it a quick test?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: the API seems to be asymmetric. There's comp_free() and there's ipc_comp_free() but I don't think anything like comp_alloc(), the closest I find is ipc4_add_comp_dev() and that's what chain DMA is using. So in the short run, I guess - whatever works, in the long run someone might want to clean that up a bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple issues:
a) int ipc_comp_free(struct ipc *ipc, uint32_t comp_id)
what is the comp_id?

b) if (!icd->cd->bsource_list.next || !icd->cd->bsink_list.next)
return -EINVAL;
we don't have any such things defined for chain-dma, do we? My guess is that we would hit this error case.

I can give it a try once you guys help me with a)...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets take this as a followup improvement, this is fixing a leak and unblocking tests.

Copy link
Contributor

@jsarha jsarha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes an obvious bug. The added code could of course be written in a new function like ipc4_delete_comp_dev(), but since there is no other obvious user this is probably just fine.

list_for_item_safe(clist, _tmp, &ipc->comp_list) {
icd = container_of(clist, struct ipc_comp_dev, list);
if (icd->cd != dev)
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this would make the resulting code more optimal, but this could be

	list_for_item_safe(clist, _tmp, &ipc->comp_list) {
		if (clist != &dev->list)
			continue;
		icd = ...
		...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could, but most uses of list_for_item_safe start with the container_of stuff.
I don't mind doing the change if that was the preference, I prefer code that looks familiar and consistent with other usages.

git grep -A3 list_for_item_safe
--
src/audio/module_adapter/module/generic.c:      list_for_item_safe(mem_list, _mem_list, &mod->priv.memory.mem_list) {
src/audio/module_adapter/module/generic.c-              mem = container_of(mem_list, struct module_memory, mem_list);
src/audio/module_adapter/module/generic.c-              if (mem->ptr == ptr) {
src/audio/module_adapter/module/generic.c-                      rfree(mem->ptr);
--
src/audio/module_adapter/module/generic.c:      list_for_item_safe(mem_list, _mem_list, &mod->priv.memory.mem_list) {
src/audio/module_adapter/module/generic.c-              mem = container_of(mem_list, struct module_memory, mem_list);
src/audio/module_adapter/module/generic.c-              rfree(mem->ptr);
src/audio/module_adapter/module/generic.c-              list_item_del(&mem->mem_list);
--
src/audio/module_adapter/module_adapter.c:      list_for_item_safe(dp_queue_list_item, tmp, &mod->dp_queue_dp_to_ll_list) {
src/audio/module_adapter/module_adapter.c-              struct dp_queue *dp_queue =
src/audio/module_adapter/module_adapter.c-                      container_of(dp_queue_list_item, struct dp_queue, list);
src/audio/module_adapter/module_adapter.c-
--
src/audio/module_adapter/module_adapter.c:      list_for_item_safe(dp_queue_list_item, tmp, &mod->dp_queue_ll_to_dp_list) {
src/audio/module_adapter/module_adapter.c-              struct dp_queue *dp_queue =
src/audio/module_adapter/module_adapter.c-                      container_of(dp_queue_list_item, struct dp_queue, list);
src/audio/module_adapter/module_adapter.c-
--
src/audio/module_adapter/module_adapter.c:      list_for_item_safe(blist, _blist, &mod->sink_buffer_list) {
src/audio/module_adapter/module_adapter.c-              struct comp_buffer *buffer = container_of(blist, struct comp_buffer,
src/audio/module_adapter/module_adapter.c-                                                        sink_list);
src/audio/module_adapter/module_adapter.c-              uint32_t flags;
--
src/audio/module_adapter/module_adapter.c:              list_for_item_safe(dp_queue_list_item, tmp, &mod->dp_queue_dp_to_ll_list) {
src/audio/module_adapter/module_adapter.c-                      struct dp_queue *dp_queue =
src/audio/module_adapter/module_adapter.c-                                      container_of(dp_queue_list_item, struct dp_queue, list);
src/audio/module_adapter/module_adapter.c-
--
src/audio/module_adapter/module_adapter.c:              list_for_item_safe(dp_queue_list_item, tmp, &mod->dp_queue_ll_to_dp_list) {
src/audio/module_adapter/module_adapter.c-                      struct dp_queue *dp_queue =
src/audio/module_adapter/module_adapter.c-                                      container_of(dp_queue_list_item, struct dp_queue, list);
src/audio/module_adapter/module_adapter.c-
--
src/audio/module_adapter/module_adapter.c:      list_for_item_safe(blist, _blist, &mod->sink_buffer_list) {
src/audio/module_adapter/module_adapter.c-              struct comp_buffer *buffer = container_of(blist, struct comp_buffer,
src/audio/module_adapter/module_adapter.c-                                                        sink_list);
src/audio/module_adapter/module_adapter.c-              uint32_t flags;
--
src/ipc/ipc-helper.c:   list_for_item_safe(clist, tmp, &icd->cd->bsource_list) {
src/ipc/ipc-helper.c-           struct comp_buffer *buffer = container_of(clist, struct comp_buffer, sink_list);
src/ipc/ipc-helper.c-
src/ipc/ipc-helper.c-           buffer->sink = NULL;
--
src/ipc/ipc-helper.c:   list_for_item_safe(clist, tmp, &icd->cd->bsink_list) {
src/ipc/ipc-helper.c-           struct comp_buffer *buffer = container_of(clist, struct comp_buffer, source_list);
src/ipc/ipc-helper.c-
src/ipc/ipc-helper.c-           buffer->source = NULL;
--
src/ipc/ipc4/helper.c:          list_for_item_safe(list, _list, &icd->cd->bsink_list) {
src/ipc/ipc4/helper.c-                  struct comp_dev *sink;
src/ipc/ipc4/helper.c-
src/ipc/ipc4/helper.c-                  buffer = container_of(list, struct comp_buffer, source_list);
--
src/ipc/ipc4/helper.c:          list_for_item_safe(list, _list, &icd->cd->bsource_list) {
src/ipc/ipc4/helper.c-                  struct comp_dev *source;
src/ipc/ipc4/helper.c-
src/ipc/ipc4/helper.c-                  buffer = container_of(list, struct comp_buffer, sink_list);
--
src/lib/notifier.c:     list_for_item_safe(wlist, tlist, &notify->list[type]) {
src/lib/notifier.c-             handle = container_of(wlist, struct callback_handle, list);
src/lib/notifier.c-             if ((!receiver || handle->receiver == receiver) &&
src/lib/notifier.c-                 (!caller || handle->caller == caller)) {
--
src/lib/notifier.c:     list_for_item_safe(wlist, tlist, &notify->list[type]) {
src/lib/notifier.c-             handle = container_of(wlist, struct callback_handle, list);
src/lib/notifier.c-             if (!caller || !handle->caller || handle->caller == caller)
src/lib/notifier.c-                     handle->cb(handle->receiver, type, data);
--
src/library_manager/lib_notification.c: list_for_item_safe(list_elem, list_tmp, &lib_notif->list) {
src/library_manager/lib_notification.c-         msg_pool_elem = container_of(list_elem, struct ipc_lib_msg, list);
src/library_manager/lib_notification.c-         assert(msg_pool_elem->msg);
src/library_manager/lib_notification.c-
--
src/platform/library/schedule/ll_schedule.c:    list_for_item_safe(tlist, tlist_, &sched_list) {
src/platform/library/schedule/ll_schedule.c-            task = container_of(tlist, struct task, list);
src/platform/library/schedule/ll_schedule.c-
src/platform/library/schedule/ll_schedule.c-            /* only run queued tasks */
--
src/schedule/zephyr_ll.c:       list_for_item_safe(list, tmp, &task_head) {
src/schedule/zephyr_ll.c-               list_item_del(list);
src/schedule/zephyr_ll.c-               list_item_append(list, &sch->tasks);
src/schedule/zephyr_ll.c-       }
--
test/cmocka/src/notifier_mocks.c:       list_for_item_safe(wlist, tlist, &notify->list[type]) {
test/cmocka/src/notifier_mocks.c-               handle = container_of(wlist, struct callback_handle, list);
test/cmocka/src/notifier_mocks.c-               if (!caller || !handle->caller || handle->caller == caller)
test/cmocka/src/notifier_mocks.c-                       handle->cb(handle->receiver, type, data);
--
test/cmocka/src/notifier_mocks.c:       list_for_item_safe(wlist, tlist, &notify->list[type]) {
test/cmocka/src/notifier_mocks.c-               handle = container_of(wlist, struct callback_handle, list);
test/cmocka/src/notifier_mocks.c-               if ((!receiver || handle->receiver == receiver) &&

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might add that speed isn't really a problem here, tearing down a chain DMA requires participation from the host and firmware, saving a couple of cycles to scan the component list isn't going to make any difference.

@lgirdwood lgirdwood merged commit 6c4dad6 into thesofproject:main Jan 24, 2024
39 of 44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants