Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] delay reporting starts goes out of sync and host position overtakes link position #9418

Closed
kv2019i opened this issue Aug 28, 2024 · 8 comments
Labels
bug Something isn't working as expected HDA Applies to HD-Audio bus for codec connection I2S Applies to I2S bus for codec connection LNL Applies to Lunar Lake platform P1 Blocker bugs or important features TGL Applies to Tiger Lake

Comments

@kv2019i
Copy link
Collaborator

kv2019i commented Aug 28, 2024

Describe the bug
Delay reporting stops working after some time of streaming.

To Reproduce
Play video to a HDA codec on Intel TGL platform with e.g. "mpv".

Reproduction Rate
Not 100% but high

Expected behavior
Delay reporting keeps working

Impact
Can cause lip sync to fail with certain video plays that rely on ALSA delay reporting interfaces

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
    • Kernel: sof-dev as of 28th Aug 2024
    • SOF: SOF main as of 28th Aug 2024
  2. Name of the topology file
    • Topology: sof-hda-generic
  3. Name of the platform(s) on which the bug is observed.
    • Platform: TGL-H

Screenshots or console output

$ mpv --audio-device=alsa/hw:0,0 foobar.mov
 (+) Video --vid=1 (*) (hevc 1920x1080 23.976fps)
 (+) Audio --aid=1 (*) (pcm_s24le 2ch 48000Hz)
Using hardware decoding (vaapi).
AO: [alsa] 48000Hz stereo 2ch s32
VO: [gpu] 1920x1080 vaapi[p010]
AV: 00:00:38 / 00:02:00 (32%) A-V:  0.000

Audio/Video desynchronisation detected! Possible reasons include too slow
hardware, temporary CPU spikes, broken drivers, and broken files. Audio
position will not match to the video (see A-V status field).

AV: -13343998895:52:45 / 00:02:00 (0%) A-V:  0.000 Dropped: 1394

Notes

Related but not same as issue #8779

@kv2019i kv2019i added bug Something isn't working as expected HDA Applies to HD-Audio bus for codec connection TGL Applies to Tiger Lake labels Aug 28, 2024
@kv2019i
Copy link
Collaborator Author

kv2019i commented Aug 28, 2024

FYI @ujfalusi with whom I've already debugged this. Suspicion is that start_offset calculation has a bug and/or there is FW buffering that is not accounted. Logs show that delay reporting is correct at first, but in some test runs, the reported decreases in 1ms steps multiple times, and finally the values kernel sees wrap around leading to invalid values reported to application.

@kv2019i
Copy link
Collaborator Author

kv2019i commented Aug 28, 2024

Tested FW fast_mode, did not help.

@ujfalusi
Copy link
Contributor

What I have observed is that the LDP (host pointer) and LLP (Link pointer) starts to converge and over time LLP overtakes LDP.
Afaik this is somehow happening in steps, confirmed by your plot graph.
What I have seen is that when the distance between LDP and LLP changes I see this print in firmware log:

[ 2317.902373] <inf> ll_schedule: zephyr_domain_thread_fn: ll core 0 timer avg 2350, max 2421, overruns 0
[ 2318.926373] <inf> ll_schedule: zephyr_domain_thread_fn: ll core 0 timer avg 2351, max 4103, overruns 0
[ 2319.761355] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192
[ 2319.950373] <inf> ll_schedule: zephyr_domain_thread_fn: ll core 0 timer avg 2350, max 3937, overruns 0
[ 2320.974373] <inf> ll_schedule: zephyr_domain_thread_fn: ll core 0 timer avg 2347, max 2430, overruns 0

It is always there when the distance changes between the two pointer - which translates to delay values.

Furthermore, I can observe a small audible tick/tock when this happens when playing a 1KHz sine wav file. I cannot hear it with real music, but that is expected.

So far I have been able to reproduce this on: TGL HDA, TGL SSP, LNL SSP.
But not on MTL SDW, MTL SSP.

This is not related to the link type and the DMA servicing it.

@ujfalusi
Copy link
Contributor

Tested FW fast_mode, did not help.

The fast mode would not copy if available samples: 0, so it cannot help.

@kv2019i kv2019i added I2S Applies to I2S bus for codec connection LNL Applies to Lunar Lake platform labels Aug 29, 2024
@ujfalusi
Copy link
Contributor

I think I have found something interesting in logs. Added print for the host side before every dma_reload() call:

[ 2973.993401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.994401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.995401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.996401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.997398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 2973.998401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2973.999401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 2974.000401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 3005.581403] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.582401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.583401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.584398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 3005.585401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.586403] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.587403] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.588401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.589401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3005.590401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 3078.152401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3078.153401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3078.154398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 3078.155401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3078.156401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 3122.208401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3122.209401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3122.210398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 3122.211401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3122.212421] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 3155.990401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3155.991401] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3155.992398] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 3155.993400] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 3155.994403] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

[ 4050.473290] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 4050.474291] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 4050.475286] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x4 no bytes to copy, available samples: 0, free_samples: 192, pending_length: 0, partial_size: 0
[ 4050.476290] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192
[ 4050.477290] <inf> host_comp: host_copy_normal: comp:0 0x4 host DMA reload 2: size: 192

Most of the time the dma_reload() is called with timestamp ending with 400+ but when there is a report of nothing to copy, the timestamp end is less than 400, it looks like that the timer comes just a bit early and the DMA has not yet recorded the transfer, or the transfer yet to complete?

In any case, with this patch:

diff --git a/src/audio/host-zephyr.c b/src/audio/host-zephyr.c
index ac1751059eb5..bc7e52dfb9bb 100644
--- a/src/audio/host-zephyr.c
+++ b/src/audio/host-zephyr.c
@@ -372,8 +372,10 @@ static uint32_t host_get_copy_bytes_normal(struct host_data *hd, struct comp_dev
 	uint32_t free_samples;
 	uint32_t dma_sample_bytes;
 	uint32_t dma_copy_bytes;
+	bool retry = false;
 	int ret;
 
+again:
 	/* get data sizes from DMA */
 	ret = dma_get_status(hd->chan->dma->z_dev, hd->chan->index, &dma_stat);
 	if (ret < 0) {
@@ -403,9 +405,17 @@ static uint32_t host_get_copy_bytes_normal(struct host_data *hd, struct comp_dev
 	if (!(hd->ipc_host.feature_mask & BIT(IPC4_COPIER_FAST_MODE)))
 		dma_copy_bytes = MIN(hd->period_bytes, dma_copy_bytes);
 
-	if (!dma_copy_bytes)
-		comp_info(dev, "no bytes to copy, available samples: %d, free_samples: %d",
-			  avail_samples, free_samples);
+	if (!dma_copy_bytes) {
+		if (!retry && dma_stat.pending_length == 0 && hd->partial_size == 0) {
+			comp_info(dev, "no bytes to copy, retry");
+			k_usleep(5);
+			retry = true;
+			goto again;
+		}
+
+		comp_info(dev, "no bytes to copy, available samples: %d, free_samples: %d, pending_length: %d, partial_size: %d",
+			  avail_samples, free_samples, dma_stat.pending_length, hd->partial_size);
+	}
 
 	/* dma_copy_bytes should be aligned to minimum possible chunk of
 	 * data to be copied by dma.

The issue goes away. I can see occasional prints about the retry, but on second try with a short delay all is good, we have data available....

@kv2019i
Copy link
Collaborator Author

kv2019i commented Aug 30, 2024

Ack @ujfalusi , I think we have a problem with DMA status reporting. If I dump DMA status history of previous LL ticks when the avail_zero happens, this is seen:

D: 0 iter 51968 pending 768 rp 384 wp 1152 reload 384
D: 1 iter 51969 pending 1536 rp 768 wp 768 reload 384
D: 2 iter 51970 pending 1152 rp 1152 wp 768 reload 384
D: 3 iter 51971 pending 0 rp 0 wp 0 reload 384 (now)

... so the BufferEmpty bit is for some reason set even though clearly there is data in the buffer (based on only 384 bytes read out via reload since last 1ms tick). I will try the retry logic and see how it changes the reporting.

kv2019i added a commit to kv2019i/zephyr that referenced this issue Sep 2, 2024
The ringbuffer availability check is subject to race with regards to
update of BF (Buffer Full) and BNE (Buffer Not Empty) bits in DGCS
register, and status of RP (Read Position) and WP (Write Position).

Following sequence is observed without this patch when
calling dma_get_status() on multiple Intel ADSP platforms:

iter 154 pending 1536 RP 768 WP 768, BNE 1, BF 1
-> dma_reload for 384
iter 155 pending 1536 RP 1152 WP 1152, BNE 1, BF 1
-> dma_reload for 384
iter 156 pending 0 RP 0 WP 0, BNE 1, BF 0

Value of pending is not expected to go from 1536 to zero if only 384
bytes have been consumed via dma_reload() since last call to
dma_get_status().

Change the logic to read DGCS register later, after the WP and RP have
been already read, and only check the BNE bit if Read and Write
Positions are equal.

Link: thesofproject/sof#9418
Signed-off-by: Kai Vehmanen <[email protected]>
Co-developed-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
@marc-hb
Copy link
Collaborator

marc-hb commented Sep 2, 2024

Could this be related to?

@kv2019i kv2019i added P2 Critical bugs or normal features P1 Blocker bugs or important features and removed P2 Critical bugs or normal features labels Sep 3, 2024
carlescufi pushed a commit to zephyrproject-rtos/zephyr that referenced this issue Sep 4, 2024
The ringbuffer availability check is subject to race with regards to
update of BF (Buffer Full) and BNE (Buffer Not Empty) bits in DGCS
register, and status of RP (Read Position) and WP (Write Position).

Following sequence is observed without this patch when
calling dma_get_status() on multiple Intel ADSP platforms:

iter 154 pending 1536 RP 768 WP 768, BNE 1, BF 1
-> dma_reload for 384
iter 155 pending 1536 RP 1152 WP 1152, BNE 1, BF 1
-> dma_reload for 384
iter 156 pending 0 RP 0 WP 0, BNE 1, BF 0

Value of pending is not expected to go from 1536 to zero if only 384
bytes have been consumed via dma_reload() since last call to
dma_get_status().

Change the logic to read DGCS register later, after the WP and RP have
been already read, and only check the BNE bit if Read and Write
Positions are equal.

Link: thesofproject/sof#9418
Signed-off-by: Kai Vehmanen <[email protected]>
Co-developed-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
coreboot-bot pushed a commit to coreboot/zephyr-cros that referenced this issue Sep 5, 2024
The ringbuffer availability check is subject to race with regards to
update of BF (Buffer Full) and BNE (Buffer Not Empty) bits in DGCS
register, and status of RP (Read Position) and WP (Write Position).

Following sequence is observed without this patch when
calling dma_get_status() on multiple Intel ADSP platforms:

iter 154 pending 1536 RP 768 WP 768, BNE 1, BF 1
-> dma_reload for 384
iter 155 pending 1536 RP 1152 WP 1152, BNE 1, BF 1
-> dma_reload for 384
iter 156 pending 0 RP 0 WP 0, BNE 1, BF 0

Value of pending is not expected to go from 1536 to zero if only 384
bytes have been consumed via dma_reload() since last call to
dma_get_status().

Change the logic to read DGCS register later, after the WP and RP have
been already read, and only check the BNE bit if Read and Write
Positions are equal.

(cherry picked from commit 81977f2)

Original-Link: thesofproject/sof#9418
Original-Signed-off-by: Kai Vehmanen <[email protected]>
Original-Co-developed-by: Peter Ujfalusi <[email protected]>
Original-Signed-off-by: Peter Ujfalusi <[email protected]>
GitOrigin-RevId: 81977f2
Cr-Build-Id: 8737732657120454561
Cr-Build-Url: https://cr-buildbucket.appspot.com/build/8737732657120454561
Copybot-Job-Name: zephyr-main-copybot-downstream
Change-Id: Ib4dbcd2fa7376c0f812c23160d3ceac3de487912
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/zephyr/+/5836426
Reviewed-by: Yuval Peress <[email protected]>
Tested-by: ChromeOS Prod (Robot) <[email protected]>
Tested-by: Yuval Peress <[email protected]>
Commit-Queue: Yuval Peress <[email protected]>
@kv2019i
Copy link
Collaborator Author

kv2019i commented Sep 5, 2024

Fixed via #9437

@kv2019i kv2019i closed this as completed Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected HDA Applies to HD-Audio bus for codec connection I2S Applies to I2S bus for codec connection LNL Applies to Lunar Lake platform P1 Blocker bugs or important features TGL Applies to Tiger Lake
Projects
None yet
Development

No branches or pull requests

3 participants