Description
Describe the bug
ChromeOS device: Brya/Taniks (ADL-P)
ChromeOS image version: R111-15313.0.0
This OS version includes the backported v5.10 kernel commit chain.
The Waves-integrated SOF build is generated from the Google-internal rpl-001-drop-stable branch (checkout head: https://chrome-internal-review.googlesource.com/c/chromeos/third_party/sound-open-firmware-private/+/5325552/1).
While the issues have been fixed on other devices (e.g. Vell, Primus), a DSP panic is now observable on Taniks devices. It appears (not strictly verified) to reproduce on Taniks only, and only after the kernel backport commits landed: the latest SOF build passes on OS version 15308.0.0.
To Reproduce
Reproducible on Taniks (tplg: sof-adl-max98357a-rt5682-waves-2way). Verified that the issue still reproduces even with Waves removed (using sof-adl-max98357a-rt5682-2way.tplg instead).
- Flash OS image R111-15313.0.0
- A DSP panic may be observed after the device reboots (not 100% of the time)
- The DSP panic reproduces ~100% of the time after resume-from-suspend, triggered by the command
suspend_stress_test -c 1
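The reproduction steps above could be scripted roughly as follows. This is only a sketch: it assumes the panic appears in the kernel log as a line containing "DSP panic" (an assumed pattern, adjust it to what your kernel actually prints) and that `dmesg` is readable by the user running it. The helper names are hypothetical.

```python
import re
import subprocess

# Assumption: a DSP panic shows up in the kernel log as a line containing
# "DSP panic". Adjust the pattern to whatever your kernel actually prints.
PANIC_PATTERN = re.compile(r"DSP panic", re.IGNORECASE)


def find_panic_lines(log_text, pattern=PANIC_PATTERN):
    """Return the log lines that match the panic pattern."""
    return [line for line in log_text.splitlines() if pattern.search(line)]


def read_kernel_log():
    """Return the current kernel log as text (needs permission to run dmesg)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def panicked_after_one_suspend():
    """Run one suspend/resume cycle and report whether new panic lines appeared."""
    before = len(find_panic_lines(read_kernel_log()))
    # Same command as in the reproduction steps; its exit status is ignored here.
    subprocess.run(["suspend_stress_test", "-c", "1"], check=False)
    after = len(find_panic_lines(read_kernel_log()))
    return after > before


if __name__ == "__main__":
    print("DSP panic after resume:", panicked_after_one_suspend())
```

Comparing the match count before and after the cycle avoids re-counting a panic that was already logged at boot.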
Observation
To summarize the observation so far:
| tested configuration | attached conf/tplg | SPK core | DMIC48K core | DMIC16K/KWD core | DSP panic? |
|---|---|---|---|---|---|
| the present tplg | sof-adl-max98357a-rt5682-waves-2way | 1 | 1 | 0 | YES |
| tplg with Waves removed | sof-adl-max98357a-rt5682-2way | 1 | 1 | 0 | YES |
| tplg running on one core | sof-adl-max98357a-rt5682-waves-2way-core0 | 0 | 0 | 0 | NO |
| tplg with DMIC16K/KWD removed | sof-adl-max98357a-rt5682-waves-2way-nohotword | 1 | 1 | 0 | NO |
(tplg/conf files can be found in the attached zip-file)
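To make the pattern in the table explicit, the rows can be encoded as data and checked against a simple hypothesis: the panic appears only when some pipeline runs on core 1 and the DMIC16K/KWD pipelines are present. This is a sketch of that reasoning, not a root cause. The `has_kwd` field is an assumption derived from the row descriptions (the table itself lists core 0 for DMIC16K/KWD in every row), and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    tplg: str
    core_spk: int
    core_dmic48k: int
    has_kwd: bool  # assumption: False only for the "-nohotword" topology
    panic: bool


ROWS = [
    Row("sof-adl-max98357a-rt5682-waves-2way", 1, 1, True, True),
    Row("sof-adl-max98357a-rt5682-2way", 1, 1, True, True),
    Row("sof-adl-max98357a-rt5682-waves-2way-core0", 0, 0, True, False),
    Row("sof-adl-max98357a-rt5682-waves-2way-nohotword", 1, 1, False, False),
]


def uses_core1(row):
    """True if the speaker or DMIC48K pipeline is scheduled on core 1."""
    return 1 in (row.core_spk, row.core_dmic48k)


def predicted_panic(row):
    """Hypothesis: panic needs both multi-core use and the KWD pipelines."""
    return uses_core1(row) and row.has_kwd


if __name__ == "__main__":
    for row in ROWS:
        status = "matches" if predicted_panic(row) == row.panic else "differs"
        print(f"{row.tplg}: {status}")
```

With only four data points the hypothesis is consistent with the table but far from proven; Waves itself is ruled out because both the Waves and non-Waves topologies panic.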
Impact
Audio is broken on Taniks devices.
Log Analysis
The observed DSP panic output looks like the following:

Although the error is reported on DMIC0 (DMIC48K), comparing the sof-logger output from the present tplg (left) with that from the one-core tplg (right) suggests that the error originates from suspicious behavior while reloading the DMIC16K/KWD topology of the present tplg after resume-from-suspend. (Logs can be found in the attached zip-file.)
The following sof-logger excerpts were captured right after resume-from-suspend. Yellow arrows mark the points where loading of a new pipeline starts. In both logs the first pipeline loaded is pipe_id 12 (KWD pipeline). The log on the right (one-core tplg) then shows the creation of the selector (green arrow) and google-hotword-detect, both of which are located on the KWD pipeline, followed by the loading of pipe_id 11 (DMIC16K pipeline).
The log on the left (present tplg), however, behaves differently: it jumps to Core#1 for edf_scheduler_init (red arrow) and then starts loading pipe_id 10 (DMIC48K pipeline). Loading of the KWD/DMIC16K pipelines seems to be skipped, since I could not find it in the subsequent logs.

At the end of the logs from the present tplg there is an error message from ipc_comp_connect, which leads to the DSP issue, as in the example below (sink_id 59 is the selector located on the KWD pipeline, and source_id 60 is its source buffer).
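The suspected failure mode can be illustrated with a toy model: a connect request fails when it refers to a component that was never created. This is not SOF firmware code. The class and method names are hypothetical, and it assumes that both the selector (id 59) and its source buffer (id 60) are created only when the KWD pipeline is loaded.

```python
class ToyIpcState:
    """Toy stand-in for the firmware's table of created components."""

    def __init__(self):
        self.components = set()

    def create(self, comp_id):
        """Record that a component or buffer with this id exists."""
        self.components.add(comp_id)

    def connect(self, source_id, sink_id):
        """Connect source to sink; fail if either end was never created."""
        missing = [i for i in (source_id, sink_id) if i not in self.components]
        if missing:
            raise LookupError(
                f"connect {source_id} -> {sink_id}: unknown id(s) {missing}"
            )


def load_kwd_pipeline(state):
    """Assumed content of the KWD pipeline: source buffer 60 and selector 59."""
    state.create(60)
    state.create(59)


if __name__ == "__main__":
    one_core = ToyIpcState()
    load_kwd_pipeline(one_core)
    one_core.connect(60, 59)  # succeeds: both ends exist

    present = ToyIpcState()  # KWD pipeline load skipped after resume
    try:
        present.connect(60, 59)
    except LookupError as err:
        print("connect failed:", err)
```

If the real cause matches this model, the connect error would be a consequence of the skipped pipeline load rather than a separate defect.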

From my perspective, the missing logs for loading the KWD/DMIC16K pipelines might be a side effect of multi-core processing. However, the ipc_comp_connect error is not expected, and it suggests a defect in DSP recovery during suspend-resume. I also have no idea why it is only observed in the sof-adl-max98357a-rt5682-2way cases (with and without Waves). Could it be hitting a corner case for AMP_SSP=2 or for the codec in TDM mode?