
@kv2019i
Collaborator

@kv2019i kv2019i commented May 28, 2024

Include the possible domain block time into measurement of the low-latency scheduler thread execution time. The ll timer metrics are typically used for initial system debug and it is very misleading when domain block events are not counted in reported average/max numbers.

Change the timing code to start measurement immediately after domain thread semaphore is taken.

@kv2019i
Collaborator Author

kv2019i commented May 28, 2024

Discovered when testing 2c039a1 (of #9174), using a shell command to inject scheduling/timing errors into SOF execution.

Collaborator

@lyakh lyakh left a comment


The commit message itself says "the low-latency scheduler thread execution time", and I'm not sure whether time spent waiting for somebody else counts as "execution" time. AFAICS, so far the LL scheduler can only be blocked there while waiting for an IPC to bind or unbind cross-core pipelines. Should that time really be added to the LL execution time?
EDIT: I checked the referenced #9174 only now. I guess it's a matter of definition: whether we want to include that time in the performance metrics. "Execution time" and "waiting time" sound like two different parameters to me.
UPDATE: just to give an example: when you run "time <my_program>", it reports both the execution time (user + sys) and the total (real) time, including any waiting.

@kv2019i
Collaborator Author

kv2019i commented May 30, 2024

@lyakh wrote:

The commit message itself says "the low-latency scheduler thread execution time" and I'm not sure the time, waiting for somebody else counts as such "execution" time? AFAICS so far the LL scheduler can only be blocked there waiting for an IPC to bind or to unbind cross-core pipelines. Should that time really be added to the LL execution time? EDIT: I checked the referenced #9174 only now. I guess it's a ...

That's true, and "execution" was not the best choice of word in the git commit. Wording aside, I think it is more useful to include the domain block in the measurement. These timing traces are really used to debug system behaviour. The LL thread by design has a fixed time (e.g. 1ms) to run, and any event that increases the wallclock time it takes to run a single LL iteration is something to take note of. This can be LL pipeline execution taking unusually long, interrupts taking DSP cycles (including interrupts that should be masked but occur due to a programming/configuration mistake), etc. Unlike "time foo", we cannot reliably track the cycles spent by the LL thread; we can only observe the DSP cycles that elapsed and whether the LL thread goes back to semaphore wait before the 1ms deadline.

Given this typical use for the trace, I'd say the domain block should be part of the calculated number, and it is a useful (low-overhead) debug tool for detecting problems in cross-core pipeline logic.

Sounds sensible? I seem to have a typo in the git summary, so maybe I should update the commit message, although given the rare "all green" CI result, it pains me to push again :)

@kv2019i kv2019i force-pushed the 202405-ll-sched-block-to-timing-stats branch from c2f558d to 88fe51f Compare May 30, 2024 11:03
@kv2019i kv2019i changed the title schedule: zephyr_domain: include domain block in ll time trackig schedule: zephyr_domain: include domain block in ll time tracking May 30, 2024
@kv2019i
Collaborator Author

kv2019i commented May 30, 2024

V2:

  • git commit reworded, no changes to code

@kv2019i kv2019i requested a review from lyakh May 30, 2024 17:44
@kv2019i kv2019i force-pushed the 202405-ll-sched-block-to-timing-stats branch from 88fe51f to 1d84e9a Compare June 3, 2024 09:52
@lgirdwood lgirdwood added this to the v2.11 milestone Jun 18, 2024
Include the possible domain block time when measuring the time it takes
to complete one iteration of the low-latency scheduler thread loop. The
ll timer metrics are typically used for initial system debug and it can
be misleading when domain block events are not counted in reported
average/max numbers.

Change the timing code to start measurement immediately after
domain thread semaphore is taken.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i kv2019i force-pushed the 202405-ll-sched-block-to-timing-stats branch from 1d84e9a to f48574b Compare July 4, 2024 10:27
@kv2019i
Collaborator Author

kv2019i commented Jul 4, 2024

Refresh the commit to rekick CI, no functional change.

@lgirdwood lgirdwood merged commit d01320a into thesofproject:main Jul 5, 2024

5 participants