[BUG] DP scheduler introducing 10ms+ latency to AEC capture streams #8640

andyross · 2023-12-16T18:06:19Z

The DP scheduler is introducing unacceptable latency to the stream. When used with Google AEC (see PR #8571), the capture pipeline is delaying delivery of output data by at least 10ms.

The architecture of the DP component is that it accumulates and runs analysis on blocks 10ms at a time, but it's being fed from a capture pipeline running at 1ms. There is a warning condition in the new code[1] where it detects the case where it needs to run the AEC analysis twice in one process() call (which it can't do, because it doesn't have output buffer space), because of too much data waiting for it in the input buffers (both its own and the upstream sources). See here:

sof/src/audio/google/google_rtc_audio_processing.c

Line 816 in 720c38a

comp_dbg(mod->dev, "AEC sink backed up!");

This is a "comp_dbg()" currently because if I enable it process() ends up spamming the log at 100 Hz with the warning: at steady state there are always 20ms+ worth of data to analyze in process.

Discussion in #8621 implies that this is the DP queueing code trying to rate limit the output, which I'm having a hard time understanding. The component (simply by virtue of running slower) provides lots of output buffering already.

Basically: we can't be doing this. The requirement has to be that as soon as there is output data available from the component, it gets flushed downstream to the host ALSA stream as fast as possible. Right now AEC is seeing much worse latency with DP than it does under a synchronous 10ms pipeline.

[1] You can also see it in the unadulterated mtl-007-drop-stable branch by e.g. logging the fill status of the input buffers in process().

plbossart · 2023-12-18T09:07:04Z

Can we have an ascii-art view of the pipeline and what the 'much worse" latency result is? Using the AEC in a LL setting would require a change of the tick to 10ms, so not sure if the comparison is really with 1ms tick in all cases?

I wonder if the issue is not really related to the DP component but rather that it's followed by an LL component that will work one chunk at a time. There's really no way for an LL component to do more that process one chunk with the least amount of MCPS, that's what they are designed for.

But there might be a way out of this, IIRC there is an option to make host copiers go faster and dump all the information. @mmaka1 can you chime in?

mmaka1 · 2023-12-18T11:25:45Z

I am not sure how "The DP scheduler is introducing unacceptable latency to the stream". As I understand, we are talking about the latency of the path from the AEC output to the host. And imo it is all about the configuration of any additional buffering inside that part, not about the DP scheduler. If there is no extra buffering, the first samples (1ms, 2ms, 10ms - depending on how the LL priod is configured) are delivered to the HD/A Host Input DMA buffer in the next LL cycle and there is no extra latency. If there is any extra buffering threshold, let's say N ms that drives the transfers from the DSP through the SW stack up the application, then yes, the LL period could be configured to deliver N ms of output in one tick, otherwise there would be extra buffering time to collect that N ms required to start the transfer. An alternative to increasing the LL period, as @plbossart mentioned, is to connect the Copier component directly between the output of the AEC and the Host Input DMA and configure it in the so called "fast mode". Then the entire output would be transferred to the DMA buffer in a one LL tick right after the AEC finishes the processing.

marcinszkudlinski · 2023-12-18T12:17:35Z

it needs to run the AEC analysis twice in one process() call (which it can't do, because it doesn't have output buffer space),

You can also see it in the unadulterated mtl-007-drop-stable branch by e.g. logging the fill status of the input buffers in process().

@andyross I need more details to support you with this

mtl 007 branch seems to be working fine using AEC_MOCK code. It also contains couple of workarounds, like num channel you pointed here #8639. To simplify debugging, can we focus on bringing AEC up using 007 as a base? Once AEC is stable there it would be easier to go forward with your changes.

andyross · 2023-12-18T13:56:41Z

Unfortunately I'm travelling this week without the device, so I won't be able to test until Friday. I was pretty sure I saw exactly the same behavior in the drop branch too. If you're sure it's working there, look to the fact that the new component code will consume bytes as they arrive in process (this is a feature, it front-loads the conversion overhead) and not all at once from the buffer.

Is there a rule that a 10ms DP component isn't allowed to take 1ms input, maybe?

marcinszkudlinski · 2023-12-20T15:06:43Z

@andyross
https://github.com/thesofproject/sof/blob/720c38a742799cf8926046c6d3f51e13b20676fa/src/audio/google/google_rtc_audio_processing.c#L796C1-L799C5

/* If fref > fmic (common at pipeline startup if
 * playback was already active), we should consume the early
 * samples so AEC compares the most recent values.
 */

That's totally not true

Keep in mind that DP scheduler works async to LL, in a thread

It is possible that something happens between is_ready and process - is_ready says "go", before DP got CPU time another LL processing add some data to any of inputs - but those belong to next cycle, and should be processed later.
Timeline like:

is_ready called: IN1 - 10ms (from 0 to 9), IN2 - 10ms(from 0 to 9)
...some time passes... LL is still processing
process called: IN1 - 12ms (from 0 to 11), IN2 - 11ms(from 0 to 10)

AEC should process 10ms from each input (0..9) and that's it, leaving newer data to next cycle

marcinszkudlinski · 2023-12-21T15:47:05Z

@andyross

after topology changes and FW fixes (all on the 007 branch) I was able to remove all WAs from AEC code. All changes are here:

https://github.com/thesofproject/sof/commits/mtl-007-drop-stable/
Please consider using this code as a base

Number of channels are now taken from the topology, all buffer sizes are set properly.
It is working perfectly for me - at least with AEC MOCK

I strongly advise to use code from 007 branch to enable AEC. It would be way easier for me to support you and less likely to get into any problems that may have been already solved on 007.

Once we do have a stable base for test and reference, you would be able to safely proceed with your changes (720c38a)

andyross · 2023-12-22T13:23:33Z

That's totally not true

Indeed. There's a comment from me to this effect in the PR too (#8571 (comment)). Nonetheless this is actually needed on synchronous pipelines because (at least on mt8195) the startup of the capture pipeline lags the prepare() call by 20-30ms, we end up with too much buffered and have a latency bug. The solution needs to be to flush the reference data on capture startup only.

andyross · 2023-12-22T13:33:35Z

I strongly advise to use code from 007 branch to enable AEC

That's a non-starter for enabling AEC on other platforms though, which I'm trying to do simultaneously. The whole reason we're in this mess is that AEC has traditionally been enabled only on device branches like this, where it bitrots. It's never worked on main, and I'm trying to fix that. I promise I'll continue to use this as a validation reference, but at the end of the day the only thing that can merge on main is a unified component that works everywhere.

andyross · 2023-12-22T15:42:01Z

FWIW, I can confirm that the most recent code in mtl-007-drop-stable does not show the backlog I was measuring earlier. When capture is running, the remaining bytes in the mic input buffer are never more than 40 samples (i.e. <1ms, which is the maximum precision available in the LL scheduler on the other core anyway) more than needed to complete one 10ms iteration.

I'll leave this bug open for comments until I can find the relevant fixes and get them merged into the main branch PR, but this looks good to me.

andyross · 2023-12-23T00:32:03Z

So, I reverse engineered the recent changes in mtl-007-drop-stable to isolate what it was that fixed the buffer backups. See notes in #8639. I can reproduce clean streaming in both my tree and the drop.

Essentially this is a duplicate bug, the root cause was pipeline state, not DP scheduling. I'm happy to close this, but will leave it open for comment until folks start to return to work after the holidays.

marc-hb · 2024-01-12T01:03:00Z

Close?

andyross · 2024-01-12T16:19:01Z

Good point, thanks for the ping

andyross added the bug Something isn't working as expected label Dec 16, 2023

andyross mentioned this issue Dec 18, 2023

dai_zephyr: Silence benign warnings #8621

Closed

mengdonglin added the MTL Applies to Meteor Lake platform label Dec 21, 2023

andyross mentioned this issue Dec 23, 2023

[BUG] IPC4 fails to set up stream metadata correctly #8639

Open

andyross mentioned this issue Dec 23, 2023

Google AEC rework: unbreak/resurrect, mt8195, DP integration #8571

Closed

andyross closed this as completed Jan 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] DP scheduler introducing 10ms+ latency to AEC capture streams #8640

[BUG] DP scheduler introducing 10ms+ latency to AEC capture streams #8640

andyross commented Dec 16, 2023

plbossart commented Dec 18, 2023

mmaka1 commented Dec 18, 2023

marcinszkudlinski commented Dec 18, 2023

andyross commented Dec 18, 2023

marcinszkudlinski commented Dec 20, 2023

marcinszkudlinski commented Dec 21, 2023

andyross commented Dec 22, 2023

andyross commented Dec 22, 2023

andyross commented Dec 22, 2023

andyross commented Dec 23, 2023

marc-hb commented Jan 12, 2024

andyross commented Jan 12, 2024

[BUG] DP scheduler introducing 10ms+ latency to AEC capture streams #8640

[BUG] DP scheduler introducing 10ms+ latency to AEC capture streams #8640

Comments

andyross commented Dec 16, 2023

plbossart commented Dec 18, 2023

mmaka1 commented Dec 18, 2023

marcinszkudlinski commented Dec 18, 2023

andyross commented Dec 18, 2023

marcinszkudlinski commented Dec 20, 2023

marcinszkudlinski commented Dec 21, 2023

andyross commented Dec 22, 2023

andyross commented Dec 22, 2023

andyross commented Dec 22, 2023

andyross commented Dec 23, 2023

marc-hb commented Jan 12, 2024

andyross commented Jan 12, 2024