[BUG] delay reporting starts goes out of sync and host position overtakes link position #9418

kv2019i · 2024-08-28T17:13:15Z

Describe the bug
Delay reporting stops working after some time of streaming.

To Reproduce
Play video to a HDA codec on Intel TGL platform with e.g. "mpv".

Reproduction Rate
Not 100% but high

Expected behavior
Delay reporting keeps working

Impact
Can cause lip sync to fail with certain video plays that rely on ALSA delay reporting interfaces

Environment

Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
- Kernel: sof-dev as of 28th Aug 2024
- SOF: SOF main as of 28th Aug 2024
Name of the topology file
- Topology: sof-hda-generic
Name of the platform(s) on which the bug is observed.
- Platform: TGL-H

Screenshots or console output

$ mpv --audio-device=alsa/hw:0,0 foobar.mov
 (+) Video --vid=1 (*) (hevc 1920x1080 23.976fps)
 (+) Audio --aid=1 (*) (pcm_s24le 2ch 48000Hz)
Using hardware decoding (vaapi).
AO: [alsa] 48000Hz stereo 2ch s32
VO: [gpu] 1920x1080 vaapi[p010]
AV: 00:00:38 / 00:02:00 (32%) A-V:  0.000

Audio/Video desynchronisation detected! Possible reasons include too slow
hardware, temporary CPU spikes, broken drivers, and broken files. Audio
position will not match to the video (see A-V status field).

AV: -13343998895:52:45 / 00:02:00 (0%) A-V:  0.000 Dropped: 1394

Notes

Related but not same as issue #8779

The text was updated successfully, but these errors were encountered:

kv2019i · 2024-08-28T17:16:24Z

FYI @ujfalusi with whom I've already debugged this. Suspicion is that start_offset calculation has a bug and/or there is FW buffering that is not accounted. Logs show that delay reporting is correct at first, but in some test runs, the reported decreases in 1ms steps multiple times, and finally the values kernel sees wrap around leading to invalid values reported to application.

kv2019i · 2024-08-28T17:16:47Z

Tested FW fast_mode, did not help.

ujfalusi · 2024-08-29T10:42:09Z

What I have observed is that the LDP (host pointer) and LLP (Link pointer) starts to converge and over time LLP overtakes LDP.
Afaik this is somehow happening in steps, confirmed by your plot graph.
What I have seen is that when the distance between LDP and LLP changes I see this print in firmware log:

[ 2317.902373] <inf> ll_schedule: zephyr_domain_thread_fn: ll core 0 timer avg 2350, max 2421, overruns 0
[ 2318.926373] <inf> ll_schedule: zephyr_domain_thread_fn: ll core 0 timer avg 2351, max 4103, overruns 0
[ 2319.761355] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192
[ 2319.950373] <inf> ll_schedule: zephyr_domain_thread_fn: ll core 0 timer avg 2350, max 3937, overruns 0
[ 2320.974373] <inf> ll_schedule: zephyr_domain_thread_fn: ll core 0 timer avg 2347, max 2430, overruns 0

It is always there when the distance changes between the two pointer - which translates to delay values.

Furthermore, I can observe a small audible tick/tock when this happens when playing a 1KHz sine wav file. I cannot hear it with real music, but that is expected.

So far I have been able to reproduce this on: TGL HDA, TGL SSP, LNL SSP.
But not on MTL SDW, MTL SSP.

This is not related to the link type and the DMA servicing it.

ujfalusi · 2024-08-29T10:42:38Z

Tested FW fast_mode, did not help.

The fast mode would not copy if available samples: 0, so it cannot help.

ujfalusi · 2024-08-30T09:03:56Z

I think I have found something interesting in logs. Added print for the host side before every dma_reload() call:

[ 2973.993401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.994401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.995401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.996401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.997398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 2973.998401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.999401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2974.000401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 3005.581403] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.582401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.583401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.584398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 3005.585401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.586403] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.587403] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.588401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.589401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.590401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 3078.152401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3078.153401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3078.154398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 3078.155401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3078.156401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 3122.208401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3122.209401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3122.210398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 3122.211401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3122.212421] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 3155.990401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3155.991401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3155.992398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 3155.993400] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3155.994403] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 4050.473290] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 4050.474291] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 4050.475286] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 4050.476290] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 4050.477290] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

Most of the time the dma_reload() is called with timestamp ending with 400+ but when there is a report of nothing to copy, the timestamp end is less than 400, it looks like that the timer comes just a bit early and the DMA has not yet recorded the transfer, or the transfer yet to complete?

In any case, with this patch:

diff --git a/src/audio/host-zephyr.c b/src/audio/host-zephyr.c
index ac1751059eb5..bc7e52dfb9bb 100644
--- a/src/audio/host-zephyr.c
+++ b/src/audio/host-zephyr.c
@@ -372,8 +372,10 @@ static uint32_t host_get_copy_bytes_normal(struct host_data *hd, struct comp_dev
 	uint32_t free_samples;
 	uint32_t dma_sample_bytes;
 	uint32_t dma_copy_bytes;
+	bool retry = false;
 	int ret;
 
+again:
 	/* get data sizes from DMA */
 	ret = dma_get_status(hd->chan->dma->z_dev, hd->chan->index, &dma_stat);
 	if (ret < 0) {
@@ -403,9 +405,17 @@ static uint32_t host_get_copy_bytes_normal(struct host_data *hd, struct comp_dev
 	if (!(hd->ipc_host.feature_mask & BIT(IPC4_COPIER_FAST_MODE)))
 		dma_copy_bytes = MIN(hd->period_bytes, dma_copy_bytes);
 
-	if (!dma_copy_bytes)
-		comp_info(dev, "no bytes to copy, available samples: %d, free_samples: %d",
-			  avail_samples, free_samples);
+	if (!dma_copy_bytes) {
+		if (!retry && dma_stat.pending_length == 0 && hd->partial_size == 0) {
+			comp_info(dev, "no bytes to copy, retry");
+			k_usleep(5);
+			retry = true;
+			goto again;
+		}
+
+		comp_info(dev, "no bytes to copy, available samples: %d, free_samples: %d, pending_length: %d, partial_size: %d",
+			  avail_samples, free_samples, dma_stat.pending_length, hd->partial_size);
+	}
 
 	/* dma_copy_bytes should be aligned to minimum possible chunk of
 	 * data to be copied by dma.

The issue goes away. I can see occasional prints about the retry, but on second try with a short delay all is good, we have data available....

kv2019i · 2024-08-30T11:26:09Z

Ack @ujfalusi , I think we have a problem with DMA status reporting. If I dump DMA status history of previous LL ticks when the avail_zero happens, this is seen:

D: 0 iter 51968 pending 768 rp 384 wp 1152 reload 384
D: 1 iter 51969 pending 1536 rp 768 wp 768 reload 384
D: 2 iter 51970 pending 1152 rp 1152 wp 768 reload 384
D: 3 iter 51971 pending 0 rp 0 wp 0 reload 384 (now)

... so the BufferEmpty bit is for some reason set even though clearly there is data in the buffer (based on only 384 bytes read out via reload since last 1ms tick). I will try the retry logic and see how it changes the reporting.

The ringbuffer availability check is subject to race with regards to update of BF (Buffer Full) and BNE (Buffer Not Empty) bits in DGCS register, and status of RP (Read Position) and WP (Write Position). Following sequence is observed without this patch when calling dma_get_status() on multiple Intel ADSP platforms: iter 154 pending 1536 RP 768 WP 768, BNE 1, BF 1 -> dma_reload for 384 iter 155 pending 1536 RP 1152 WP 1152, BNE 1, BF 1 -> dma_reload for 384 iter 156 pending 0 RP 0 WP 0, BNE 1, BF 0 Value of pending is not expected to go from 1536 to zero if only 384 bytes have been consumed via dma_reload() since last call to dma_get_status(). Change the logic to read DGCS register later, after the WP and RP have been already read, and only check the BNE bit if Read and Write Positions are equal. Link: thesofproject/sof#9418 Signed-off-by: Kai Vehmanen <[email protected]> Co-developed-by: Peter Ujfalusi <[email protected]> Signed-off-by: Peter Ujfalusi <[email protected]>

marc-hb · 2024-09-02T20:39:35Z

Could this be related to?

host_get_copy_bytes_normal: "no bytes to copy" log spam #8761

The ringbuffer availability check is subject to race with regards to update of BF (Buffer Full) and BNE (Buffer Not Empty) bits in DGCS register, and status of RP (Read Position) and WP (Write Position). Following sequence is observed without this patch when calling dma_get_status() on multiple Intel ADSP platforms: iter 154 pending 1536 RP 768 WP 768, BNE 1, BF 1 -> dma_reload for 384 iter 155 pending 1536 RP 1152 WP 1152, BNE 1, BF 1 -> dma_reload for 384 iter 156 pending 0 RP 0 WP 0, BNE 1, BF 0 Value of pending is not expected to go from 1536 to zero if only 384 bytes have been consumed via dma_reload() since last call to dma_get_status(). Change the logic to read DGCS register later, after the WP and RP have been already read, and only check the BNE bit if Read and Write Positions are equal. Link: thesofproject/sof#9418 Signed-off-by: Kai Vehmanen <[email protected]> Co-developed-by: Peter Ujfalusi <[email protected]> Signed-off-by: Peter Ujfalusi <[email protected]>

The ringbuffer availability check is subject to race with regards to update of BF (Buffer Full) and BNE (Buffer Not Empty) bits in DGCS register, and status of RP (Read Position) and WP (Write Position). Following sequence is observed without this patch when calling dma_get_status() on multiple Intel ADSP platforms: iter 154 pending 1536 RP 768 WP 768, BNE 1, BF 1 -> dma_reload for 384 iter 155 pending 1536 RP 1152 WP 1152, BNE 1, BF 1 -> dma_reload for 384 iter 156 pending 0 RP 0 WP 0, BNE 1, BF 0 Value of pending is not expected to go from 1536 to zero if only 384 bytes have been consumed via dma_reload() since last call to dma_get_status(). Change the logic to read DGCS register later, after the WP and RP have been already read, and only check the BNE bit if Read and Write Positions are equal. (cherry picked from commit 81977f2) Original-Link: thesofproject/sof#9418 Original-Signed-off-by: Kai Vehmanen <[email protected]> Original-Co-developed-by: Peter Ujfalusi <[email protected]> Original-Signed-off-by: Peter Ujfalusi <[email protected]> GitOrigin-RevId: 81977f2 Cr-Build-Id: 8737732657120454561 Cr-Build-Url: https://cr-buildbucket.appspot.com/build/8737732657120454561 Copybot-Job-Name: zephyr-main-copybot-downstream Change-Id: Ib4dbcd2fa7376c0f812c23160d3ceac3de487912 Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/zephyr/+/5836426 Reviewed-by: Yuval Peress <[email protected]> Tested-by: ChromeOS Prod (Robot) <[email protected]> Tested-by: Yuval Peress <[email protected]> Commit-Queue: Yuval Peress <[email protected]>

kv2019i · 2024-09-05T11:29:59Z

Fixed via #9437

kv2019i added bug Something isn't working as expected HDA Applies to HD-Audio bus for codec connection TGL Applies to Tiger Lake labels Aug 28, 2024

kv2019i added I2S Applies to I2S bus for codec connection LNL Applies to Lunar Lake platform labels Aug 29, 2024

kv2019i mentioned this issue Aug 30, 2024

drivers: dma: intel_adsp_hda: fix intel_adsp_hda_unused() check zephyrproject-rtos/zephyr#77805

Merged

kv2019i added P2 Critical bugs or normal features P1 Blocker bugs or important features and removed P2 Critical bugs or normal features labels Sep 3, 2024

kv2019i closed this as completed Sep 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] delay reporting starts goes out of sync and host position overtakes link position #9418

[BUG] delay reporting starts goes out of sync and host position overtakes link position #9418

kv2019i commented Aug 28, 2024 •

edited

Loading

kv2019i commented Aug 28, 2024

kv2019i commented Aug 28, 2024

ujfalusi commented Aug 29, 2024

ujfalusi commented Aug 29, 2024

ujfalusi commented Aug 30, 2024

kv2019i commented Aug 30, 2024

marc-hb commented Sep 2, 2024

kv2019i commented Sep 5, 2024

[BUG] delay reporting starts goes out of sync and host position overtakes link position #9418

[BUG] delay reporting starts goes out of sync and host position overtakes link position #9418

Comments

kv2019i commented Aug 28, 2024 • edited Loading

kv2019i commented Aug 28, 2024

kv2019i commented Aug 28, 2024

ujfalusi commented Aug 29, 2024

ujfalusi commented Aug 29, 2024

ujfalusi commented Aug 30, 2024

kv2019i commented Aug 30, 2024

marc-hb commented Sep 2, 2024

kv2019i commented Sep 5, 2024

kv2019i commented Aug 28, 2024 •

edited

Loading