Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debugability: logger: add note on DMA trace limitations #381

Merged

Conversation

kv2019i
Copy link
Contributor

@kv2019i kv2019i commented Nov 10, 2021

If sof-logger is started (or restarted) while DSP is running,
the initial traces may come in incorrect order or are incomplete.
This is important to note when parsing the logger results.

BugLink: thesofproject/sof-test#297
Signed-off-by: Kai Vehmanen [email protected]

Copy link
Contributor

@ujfalusi ujfalusi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One typo in the text.

I'm not sure if it can be put into words, but over time the trace messages (as they flow in) will start to straightened up. Iow, the first batch of trace information might not be of an relevance of the past. Or something.

@@ -136,6 +136,11 @@ Examples
distribution kernels ``sof_debug=1`` module option for ``snd_sof`` might be
needed if the /sys/kernel/debug/sof/trace file is not present)

.. note::
The DMA trace sysfs interface is a streaming interface. If sof-logger is
started (or restarted) while DSP is active, it is possible the initial traces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: s/traces/trace

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the following work?

The DMA trace sysfs interface is a streaming interface. If sof-logger is started (or restarted) while DSP is active, the initial trace messages might appear in the wrong order or incomplete.

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sof-logger is started (or restarted) while DSP is active, ...

That does not really make sense because the logger can never run unless the driver and firmware have been loaded. When you try:

sudo sof-logger  -t -f 3 -l  /lib/firmware/intel/sof/sof-apl.ldc 
error: Unable to open version file /sys/kernel/debug/sof/fw_version: No such file or directory

The description also does not match the following behavior: after disabling D3 with echo on > sys/devices/pci0000:00/"${pci_id}"/power/control , the sof-logger can be stopped and restarted any number of times and it displays the exact same DMA traces every single time. That could not possibly happen if it were really implemented as a streaming interface.

Even worse: multiple instances of the sof-logger show the same DMA traces even when run concurrently.

So this description does not seem to match reality.

@ujfalusi
Copy link
Contributor

If sof-logger is started (or restarted) while DSP is active, ...

That does not really make sense because the logger can never run unless the driver and firmware have been loaded. When you try:

sudo sof-logger  -t -f 3 -l  /lib/firmware/intel/sof/sof-apl.ldc 
error: Unable to open version file /sys/kernel/debug/sof/fw_version: No such file or directory

@marc-hb, do we really need to document that if you want to collect dtrace from SOFirmware you need to have the drivers loaded?
If the driver probe fails then the drivers are not loaded and the debugfs is not populated, so there will be no dtrace, fw_version or even sof directory.
It is another question if this is the right thing to do... What I mean is that it might be helpful to have at least access to the mtrace logs to be able analyze the failure? Hrm, we might want to have a flag to sof_debug to instruct the driver to dump the mtrace in case of boot fail?

The description also does not match the following behavior: after disabling D3 with echo on > sys/devices/pci0000:00/"${pci_id}"/power/control , the sof-logger can be stopped and restarted any number of times and it displays the exact same DMA traces every single time. That could not possibly happen if it were really implemented as a streaming interface.

Even worse: multiple instances of the sof-logger show the same DMA traces even when run concurrently.

So this description does not seem to match reality.

Yes, I think that is the behavior atm or something similar.
If I look at the dtrace code and logic on the firmware side and on the kernel side I think (again, imho) we have some misaligned assumptions and bugs on both sides which allows it to work but in a non defined way (well,if you read the code you can see the 'order' in the chaos).
The kernel driver uses the position within the debugfs file stream as read offset on the dtrace buffer along with a position information coming from the firmware. They are unrelated. When you start sof-logger the position in the file is 0, then you follow the trace and it is fine, but if you stop it and restart it: the position in the trace file is going to start from 0 and it does not matter if we killed the sof-logger when it was in mid buffer (offset derived from the position).
If you start/stop/start/stop/start/stop sof-logger it will always going to read from 0 offset of the buffer.

This is only one - and obvious - bug, but if you look closer there are other cases where things get out of control.

I think we should fix the dtrace for good and then update the documentation or even better have a description what we want.
I know that if I fix the dtrace then there will be complaints that it did things differently in the past (no matter if it was wrong)...

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 12, 2021

@marc-hb, do we really need to document that if you want to collect dtrace from SOFirmware you need to have the drivers loaded?

No, I was merely describing one of the mismatches between this PR and reality. That's all.

I know that if I fix the dtrace then there will be complaints that it did things differently in the past (no matter if it was wrong)...

I doubt that. I think the sof-logger expectations are pretty low so I am optimistic that the only "complaints" will be from sof-test - I mean test failures. I will align sof-test to all "correct" and well defined behaviors that match (future) reality.

So if you know what to fix then please go for it.

@ujfalusi
Copy link
Contributor

ujfalusi commented Nov 12, 2021

I know that if I fix the dtrace then there will be complaints that it did things differently in the past (no matter if it was wrong)...

I doubt that. I think the sof-logger expectations are pretty low so I am optimistic that the only "complaints" will be from sof-test - I mean test failures. I will align sof-test to all "correct" and well defined behaviors that match (future) reality.

Right. One thing which bugs me is around the sof-logger start/stop/start/stop.
We either do it in a simple manner: sof-logger will only show the trace from the point it started.
or
We try to do a more sophisticated (and might be better) approach by fetching logs not have been taken by a previous run of sof-logger. We need to deal with ring buffer wrapping and stuff to get chronological backlog and it would be nice to inject (in kernel level) some marks as well.

So if you know what to fix then please go for it.

I kind of know what to fix, I'm debating the how ;)

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 12, 2021

We try to do a more sophisticated (and might be better) approach by fetching logs not have been taken by a previous run of sof-logger

Not losing logs: that's the least I would expect from a "streaming" interface like what this PR describes (ignoring ringbuffer overruns of course).

@ujfalusi
Copy link
Contributor

We try to do a more sophisticated (and might be better) approach by fetching logs not have been taken by a previous run of sof-logger

Not losing logs: that's the least I would expect from a "streaming" interface like what this PR describes (ignoring ringbuffer overruns of course).

It depends if it supports time shifting ;) "streaming" is real time, not on-demand, but I also prefer to have a bit of context from the past (from the point the previous sof-logger was stopped - if it run).

@kv2019i
Copy link
Contributor Author

kv2019i commented Nov 12, 2021

@marc-hb I'll remove "streaming", you are reading too much into it.

@kv2019i kv2019i force-pushed the topic/sof-logger-restart-note branch from c3f2078 to ff4ff8d Compare November 12, 2021 09:34
@kv2019i
Copy link
Contributor Author

kv2019i commented Nov 12, 2021

Updated. Thanks @anton-intel , @ujfalusi and @marc-hb for the comments.

@kv2019i
Copy link
Contributor Author

kv2019i commented Nov 12, 2021

@marc-hb wrote:

The description also does not match the following behavior: after disabling D3 with echo on
sys/devices/pci0000:00/"${pci_id}"/power/control , the sof-logger can be stopped and restarted any number of times and it
displays the exact same DMA traces every single time. That could not possibly happen if it were really implemented as a
streaming interface.

I removed "streaming" now. Now behavior matches description. ""If sof-logger is started (or restarted) while DSP is active, the initial trace messages might appear in the wrong order or be incomplete.""

Don't shoot the messenger here, that's just how it works. The fact that you get traces most of the time upon restart, just makes this all the more dangerous (and worthy of note) as IT IS NOT GUARANTEED!

@kv2019i
Copy link
Contributor Author

kv2019i commented Nov 12, 2021

(at)lgirdwood Let me remeasure the DSP load with the mixer+eq optimizations now merged. I'll then make a proposal wether the mixer should be dropped or not. I'll keep this as draft until that point. (UPDATE: wrong PR)

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 12, 2021

Don't shoot the messenger here, that's just how it works. The fact that you get traces most of the time upon restart, just makes this all the more dangerous (and worthy of note) as IT IS NOT GUARANTEED!

Sorry for shooting the messenger but when a NOT GUARANTEED behavior like this one works 99.9% of the time then something is very deeply flawed and the documentation should not look like everything is fine.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 12, 2021

but I also prefer to have a bit of context from the past (from the point the previous sof-logger was stopped - if it run).

Right, otherwise you can't debug... boot!

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 13, 2021

As stated in various other github places, I think this is too short. It should give at least clues why using keywords like "ringbuffer", "offset" and "DSP suspend".

Maybe also worth noting that we do this in sof-test for every test.

If sof-logger is started (or restarted) while DSP is running, the
initial traces may be incomplete. Document the limitation and give a
brief explanation of the current ringbuffer design and how it affects
the start-up behaviour.

BugLink: thesofproject/sof-test#297
BugLink: thesofproject/linux#3275
Signed-off-by: Kai Vehmanen <[email protected]>
@kv2019i kv2019i force-pushed the topic/sof-logger-restart-note branch from ff4ff8d to 9cf4ded Compare November 15, 2021 09:45
@kv2019i kv2019i requested a review from anton-intel November 15, 2021 09:46
@kv2019i
Copy link
Contributor Author

kv2019i commented Nov 15, 2021

@marc-hb wrote:

Sorry for shooting the messenger but when a NOT GUARANTEED behavior like this one works 99.9% of the time then something is very deeply flawed and the documentation should not look like everything is fine.

I don't really agree with the general statement. Many systems have timing uncertainty and a compromise needs to be made between 100%, 99.9% and 90% odds (with required work effort to reach a level of certainly varying accordingly).

I expanded the explanation of current behaviour and repushed the PR.

I also filed an enhancement ticket to the kernel interface: thesofproject/linux#3275
If given the current design and its limitations, we cannot reliably implement automated validation of FW traces, then the design is not fit for purpose and we need to change it.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kv2019i, this is useful information whilst we all align on a fix.

@lgirdwood lgirdwood merged commit 17ca3fe into thesofproject:master Nov 15, 2021
@marc-hb
Copy link
Collaborator

marc-hb commented Nov 16, 2021

thanks @kv2019i for the extended version, I find it extremely useful.

@marc-hb marc-hb requested a review from RanderWang November 16, 2021 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants