component/module refactoring pass #9211

andyross · 2024-06-07T16:40:09Z

Some API refactoring, no behavioral changes.

The comp_buffer change is required to expose the pipeline ID to source/sinker users (see Google AEC rework: dynamic formats, legacy platform support #8919).
The processing_module backpointer fix is just a dangerous pattern I stumbled on during that work. Typesafety is good. Overloading untyped "private" data when you aren't the defined owner is bad. This would have certainly blown up on someone at some point even if it hasn't yet.

src/include/module/audio/audio_stream.h

missed one detail. Not a blocker - just nice to have

lgirdwood

@marcinszkudlinski good to merge ? it looks the the get by ID call can be added incrementally ?

lgirdwood · 2024-06-11T13:15:51Z

@lrudyX @wszypelt good to merge or real bug reported by internal CI ?

marcinszkudlinski · 2024-06-11T13:16:39Z

@lgirdwood it can be added later, also in #8919, so we can merge

lgirdwood · 2024-06-11T13:19:02Z

@andyross also a build failure reported by CI (testbench build) but maybe in others too.

[ 34%] Building C object CMakeFiles/sof.dir/src/audio/crossover/crossover_ipc3.c.o
/home/runner/work/sof/sof/src/audio/crossover/crossover_ipc3.c: In function ‘crossover_check_sink_assign’:
/home/runner/work/sof/sof/src/audio/crossover/crossover_ipc3.c:49:35: error: ‘struct comp_buffer’ has no member named ‘pipeline_id’
   49 |                 pipeline_id = sink->pipeline_id;
      |                                   ^~
gmake[5]: *** [CMakeFiles/sof.dir/build.make:762: CMakeFiles/sof.dir/src/audio/crossover/crossover_ipc3.c.o] Error 1

wszypelt · 2024-06-11T13:53:56Z

@lgirdwood
step: _Internal Intel CI System/merge/build — Tests failed
there are issues in kd and kpb

lgirdwood · 2024-06-11T13:55:57Z

@lgirdwood step: _Internal Intel CI System/merge/build — Tests failed there are issues in kd and kpb

Thanks @wszypelt can you share here with @andyross Thanks !

andyross · 2024-06-11T14:05:37Z

The build failure is real, just something that doesn't build in any of the targets I poked (FWIW: is there a simple way to "run CI" on a local build host?). Need to swap hardware this morning (juggling AEC and a big array of MTK boards for Zephyr work and the desk is running out of space) and will have a fix up, along with the API additions requested.

cujomalainey · 2024-06-11T17:04:32Z

(FWIW: is there a simple way to "run CI" on a local build host?)

https://github.com/nektos/act

Although i wouldnt use it on a corp machine

andyross · 2024-06-11T18:17:59Z

https://github.com/nektos/act

Oh sure, but that requires knowing the right github workflow to run. And I can read .github/workflows anyway. Both of those are ~= in complexity to just reading the build failure in CI anyway. What I meant was something in the SOF source tree like "scripts/run-ci.sh" or whatever that developers can use to easily check a PR before submission.

Like, in Zephyr, almost[1] everything[2] you want to test is exercised by just running "scripts/twister" with no arguments. Would be nice to have a similar smoke test in SOF, though the needed tooling setup would be a lot of complexity.

[1] Well, except for stuff that requires tooling not distributed in the SDK, so it doesn't do the doc build or renode runs or clang builds. There's an increasingly long list of "Oh, yeah, that" cases in CI failures, but really that's a bug in the SDK strategy and not the tooling.

[2] With the caveat that a bare twister run now takes upwards of an hour and a half on a 64 core build box. That too is a real bug, but not with the tooling. Enumerating the cartesian product of "Tests X Platforms" is running into severe scaling problems right now.

cujomalainey · 2024-06-11T18:41:16Z

https://github.com/nektos/act

Oh sure, but that requires knowing the right github workflow to run. And I can read .github/workflows anyway. Both of those are ~= in complexity to just reading the build failure in CI anyway. What I meant was something in the SOF source tree like "scripts/run-ci.sh" or whatever that developers can use to easily check a PR before submission.

Like, in Zephyr, almost[1] everything[2] you want to test is exercised by just running "scripts/twister" with no arguments. Would be nice to have a similar smoke test in SOF, though the needed tooling setup would be a lot of complexity.

[1] Well, except for stuff that requires tooling not distributed in the SDK, so it doesn't do the doc build or renode runs or clang builds. There's an increasingly long list of "Oh, yeah, that" cases in CI failures, but really that's a bug in the SDK strategy and not the tooling.

[2] With the caveat that a bare twister run now takes upwards of an hour and a half on a 64 core build box. That too is a real bug, but not with the tooling. Enumerating the cartesian product of "Tests X Platforms" is running into severe scaling problems right now.

@marc-hb is there a script for something like this? I know a lot of the CI just invokes scripts, might not be the worst to have a user ready one

andyross · 2024-06-11T19:26:47Z

Think this should be good now? The checkpatch failure is a false positive, it's complaining (legitimately, though the failure was already present) about multiply expanded arguments in a macro that merely got a term refactored. The sparse warnings seem to be entirely about upstream Zephyr code, I don't think I see anything specific to this PR and am pretty sure that's just how they look. The "zmain" failures are indeed screaming about the change to the fatal error handling that just merged; that does need to be fixed but obviously not in this PR.

wszypelt · 2024-06-12T09:20:02Z

@andyross in Internal Intel CI System/merge/builds, When we try to bind peakvol with kpb we get FW exception

Module: 5
Instance: 0
Core: 0
Pipeline: 0
Proc_domain: LL
Extended_init: False
2024-06-10 14:34:46,569 | Response InitInstance (E0000000, 00000022) received!
2024-06-10 14:34:46,569 | Sending InitInstance (C000000E, 00000011)
Payload:
00002328 00000100 00000040 00000000
00003E80 00000020 FFFF3210 00000000
00000000 00001804 00003E80 00000020
FFFF3210 00000000 00000000 00001804
00000010
Module: 14
Instance: 0
Core: 0
Pipeline: 0
Proc_domain: LL
Extended_init: False
2024-06-10 14:34:46,569 | Response InitInstance (E0000000, 00000011) received!
2024-06-10 14:34:46,569 | Sending BindModules (C5000004, 00000005)
2024-06-10 14:34:46,569 | Response BindModules (E5000000, 00000005) received!
2024-06-10 14:34:46,569 | Sending BindModules (C5000005, 0000000E)
2024-06-10 14:34:46,601 | Notification FwExceptionNotification (9B0A0000, 00000000)
Payload:
00000000 00000004 00000000 00000001
00000004 00000003 00000002 00000004
00001000 00000009 00000004 00020000
00000003 00000004 00000400 00000007
00000004 00000016 00000006 00000004
00000028 00000008 00000004 00000000
00000020 0000000A 00000004 00000003
0000000B 00000004 00000008 0000000D
00000004 00000100 0000000E 00000004
00000010 0000000F 00000004 00000008
00000010 00000004 00000010 0000000C
00000004 00000005 00000011 00000004
00000010 00000012 00000014 00000001
00000001 00000001 00000000 00000000
0000001D 00000004 00000001 00000002
00000004 00010000 00000015 00000004 received!

andyross · 2024-06-12T12:37:57Z

I don't know how to interpret that? Is there a stack trace or fault address or something you can match to the output binary? Is there a way I can reproduce the test, or at least see how it's built? Is it built entirely from code in the tree or is there more source code in your test I should be looking for?

Really this shouldn't be changing anything behaviorally. App-visible changes are limited to:

The size of the comp_dev structure increased by a word, so if you have test code making assumptions about it it will need porting.
The struct processing_module pointer in a dev is now in its own field (the new word) and no longer in the drvdata. Likewise if you have code using this it will need porting.

andyross · 2024-06-12T12:41:32Z

Yeah, I'm a little worried here. Looking at that dump, none of the notable strings (e.g. "FwExceptionNotification", "Proc_domain", "Extended_init") appear in the SOF tree at all. I'm worried this is not just a test, but a big blob of out-of-tree source code?

marcinszkudlinski · 2024-06-12T13:24:54Z

this is an exception CPU 0 EXCCAUSE 13 (load/store PIF data error)

FW log from this test:

[    0.000000] <inf> init: print_version_banner: FW ABI 0x301d000 DBG ABI 0x5003000 tags SOF:v2.5-stable-branch-2433-gc2194e7afd54 zephyr:v3.6.0-5059-g53ddff639562 src hash 0x3fba48f8 (ref hash 0x3fba48f8)
[    0.000000] <inf> clock: clock_set_freq: clock 0 set freq 20000000Hz freq_idx 0
[    0.000000] <inf> clock: clock_set_freq: clock 1 set freq 20000000Hz freq_idx 0
[    0.000000] <inf> clock: clock_set_freq: clock 2 set freq 20000000Hz freq_idx 0
*** Booting Zephyr OS build v3.6.0-5059-g53ddff639562 ***
[    0.000000] <inf> ipc: telemetry_init: Telemetry enabled. May affect performance
[    0.000000] <inf> main: sof_app_main: SOF on intel_adsp
[    0.000000] <inf> main: sof_app_main: SOF initialized
[    0.000008] <inf> ipc: ipc_cmd: rx	: 0x43000000|0x30701000
[    0.000010] <inf> ipc: ipc_cmd: rx	: 0x43000000|0x30801000
[    0.000010] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x3070000c
[    0.000010] <wrn> basefw_intel: basefw_set_fw_config: returning success for Set FW_CONFIG without handling it
[    0.000011] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x31400008
[    0.001531] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x30600064
[    0.006800] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x3070000c
[    0.006896] <wrn> basefw_intel: basefw_set_fw_config: returning success for Set FW_CONFIG without handling it
[    0.033406] <inf> ipc: ipc_cmd: rx	: 0x11000005|0x0
[    0.033445] <inf> pipe: pipeline_new: pipeline new pipe_id 0 priority 0
[    0.035891] <inf> ipc: ipc_cmd: rx	: 0x40000004|0x15
[    0.036230] <inf> dma: dma_get: dma_get() ID 0 sref = 1 busy channels 0
[    0.039350] <inf> ipc: ipc_cmd: rx	: 0x40000005|0x22
[    0.040905] <inf> ipc: ipc_cmd: rx	: 0x4000000e|0x11
[    0.041018] <inf> kpb: kpb_new: kpb_new()
[    0.042283] <inf> ipc: ipc_cmd: rx	: 0x45000004|0x5
[    0.042345] <inf> ipc: buffer_new: buffer new size 0x200 id 0.0 flags 0x0
[    0.042493] <inf> pipe: pipeline_connect: comp:0 0x4 connect buffer 0 as sink
[    0.042523] <inf> pipe: pipeline_connect: comp:0 0x5 connect buffer 0 as source
[    0.043981] <inf> ipc: ipc_cmd: rx	: 0x45000005|0xe
[    0.044055] <inf> ipc: buffer_new: buffer new size 0x200 id 0.0 flags 0x0
[    0.044175] <err> os: print_fatal_exception:  ** FATAL EXCEPTION
[    0.044225] <err> os: print_fatal_exception:  ** CPU 0 EXCCAUSE 13 (load/store PIF data error)
[    0.044250] <err> os: print_fatal_exception:  **  PC 0xa0063344 VADDR 0x94
[    0.044276] <err> os: print_fatal_exception:  **  PS 0x60420
[    0.044303] <err> os: print_fatal_exception:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:4 CALLINC:2)
[    0.044331] <err> os: xtensa_dump_stack:  **  A0 0xa006210c  SP 0xa0103fd0  A2 (nil)  A3 0xa01040c4
[    0.044360] <err> os: xtensa_dump_stack:  **  A4 0xa0103fe4  A5 0x40110540  A6 0x5  A7 0xa0103fd0
[    0.044385] <err> os: xtensa_dump_stack:  **  A8 0xa0063341  A9 0xa0103fb0 A10 0xa0111280 A11 0x100
[    0.044411] <err> os: xtensa_dump_stack:  ** A12 0x8 A13 0x400f7bcc A14 0x200 A15 0xa0103f60
[    0.044440] <err> os: xtensa_dump_stack:  ** LBEG 0xa007cbf9 LEND 0xa007cc05 LCOUNT (nil)
[    0.044465] <err> os: xtensa_dump_stack:  ** SAR 0x20
[    0.044491] <err> os: xtensa_dump_stack:  **  THREADPTR 0x200




Backtrace:0xa0063341:0xa0103fd0 0xa0062109:0xa01040c0 0xa003c0be:0xa01040f0 0xa006405a:0xa0104120 0xa0063afc:0xa0104140 0xa00676d2:0xa0104160 0xa0057e21:0xa0104180 0xa005b67b:0xa01041b0 



[    0.045170] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[    0.045216] <err> os: z_fatal_error: Current thread: 0x4010a7c0 (unknown)
[    0.055186] <err> coredump_error: coredump_mem_window_backend_start: #CD:BEGIN#
[    0.056478] <err> coredump_error: coredump_mem_window_backend_end: #CD:END#
[    0.056915] <err> zephyr: k_sys_fatal_error_handler: Halting system

zephyr.map.gz

map file attached for backtrace decoding

andyross · 2024-06-12T14:28:37Z

That PC is in ipc_comp_connect, but the map file has no disassembly. Seems reasonable that it's instantiating a kpb component given the logs. Is that not exercised by any tests I can run locally?

And can you repeat that build with CONFIG_OUTPUT_DISASSEMBLY=y and provide the resulting zephyr.lst file? (I hate that Zephyr turned this off by default, it takes time in CI but is irreplaceable for stuff like this). You can also get it with "objdump -D" on zephyr.elf (the objdump that matches the toolchain in use, obviously).

andyross · 2024-06-12T14:47:30Z

Also when you guys have bandwidth, it would be helpful to try to try the test with each of the final two patches in this PR, which both stand alone, to see which is causing the failure (the first is just dead code removal and surely uninvolved -- it would cause a build failure, not a runtime failure, if something was using it).

marcinszkudlinski · 2024-06-13T12:52:37Z

@andyross
maybe not the rootcause but "a hint" I found today:

after your change

dstmod is NULL, and is being dereferenced in the very next line

EDIT:
as I see you set mod field in module_adapter_new. Problem is that not all modules use module adapter, some are not converted yet - like KPB module.

This would have certainly blown up on someone at some point even if it hasn't yet.

well... it just has. It was working only because KPB use dev->priv_data as it was originally intended to be used - as a container for a private data set and the pointer wasn't null. And was mapped to struct processing_module :(

marcinszkudlinski · 2024-06-13T14:56:17Z

@andyross I'll provide a fix for this.

andyross · 2024-06-13T16:13:33Z

Ouch. So if I'm understanding you:

The test is instantiating a kpb component via some path other than a processing_module creation.
It's connecting it to another module via an IPC4 MOD_BIND command.
The module backpointer is (now) NULL, so this crashes.
But it used to "work", because the drvdata field being used as the module backpointer (which was actually a kpb-specific data pointer and NOT a processing_module) was non-NULL.

Ouch, again. So it's actually a nice catch for that patch, I guess, the only issue being that the test that detected the failure was out-of-tree. This doesn't seem like something that should be in a vendor test? Any chance you can commit that somewhere so it can be exercised by other folks? I'd totally have fixed this myself if I'd seen it, I promise. :)

Regardless: any possibility of merging this ahead of the kpb (or test? seems like the real bug is wherever that kpb is getting created, not in the device code itself) fix and filtering/whitelisting the failure on your end, given that it's been root caused and isn't an in-tree test?

marcinszkudlinski · 2024-06-14T09:15:02Z

@andyross
084c718
I can merge it in a separate PR, but since it makes conflict with your code it would be faster and more convenient to add my commit to your PR.

This is dead code: just a wrapper around module_adapter_new(), which is a public API already in use. Presumably a forgotten relic. Remove. Signed-off-by: Andy Ross <andyross@google.com>

The current SOF architecture splits a "module" across two structs, the legacy comp_dev and a newer struct processing_module. The latter contains a pointer to the former, but in many places existing code has needed a backpointer to recover the module from a component. So far this has abused the drvdata mechanism to store this as a void*, but that's fragile as many components are already (!) using drvdata for other purposes and will clobber the setting. The fact that it worked is seeming just by luck. That pointer is for the exclusive use of the comp_driver code associated with a component to store its own data. We can't be touching it from the global module code. Just give the pointer a properly-typed field of its own and make sure the two are initialized in tandem. Longer term, SOF really needs to fix this bifurcation and unify the two structs. Signed-off-by: Andy Ross <andyross@google.com>

The pipeline_id has historically been part of the comp_buffer struct, but that is being deprecated as a public API. So move it down into the contained sof_audio_stream_params struct where it can be found by new style sink/source code. Note that the actual value of the pipeline ID is a little ambiguous: on IPC3, the buffer is defined by the user in the .tplg file as part of a specific pipeline with a known ID. With IPC4 topology, the buffers are implicitly created and will be assigned the ID of their source (!) component. It is legal to define a connection across two pipelines, and there's no ability here to recover both pipeline IDs. Signed-off-by: Andy Ross <andyross@google.com>

Expose the new pipeline_id field in sof_audio_stream_params to a cleaner sink/source API for use by module code. Longer term this may want to be indirected by newer backends. Signed-off-by: Andy Ross <andyross@google.com>

andyross · 2024-06-14T14:30:53Z

Easy enough, seems clean to me in cyclic review. Everyone else should check the new patch too.

Note that I left it in place at the end of the series rather than re-refactor it twice, which is technically a bisection issue because the crash is introduced by an earlier patch. I argue that's OK since the crash isn't visible using in-tree tests, and because the root cause bug was pre-existing, it just wasn't a crash.

kv2019i

Looks great! Some typos, but nothing to warrant an update of the patch series, so if nobody objects, let's merge this today.

src/audio/pipeline/pipeline-schedule.c

Legacy API that is not using module_adapter is now depreciated, but there are still some modules that use it. So all common code must work properly with both types of modules. This commit is fixing crash in bind operation when binding a legacy module. There also some comments added in potentially similar places, but where legacy modules cannot be used. Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>

andyross · 2024-06-17T19:55:55Z

Took the liberty of fixing up @marcinszkudlinski 's comment per review above; hope that's OK. Also rebased.

andyross requested review from a team, RanderWang, abonislawski, bardliao, dbaluta, fkwasowi, jxstelter, kv2019i, lbetlej, lgirdwood, marcinszkudlinski, mmaka1, pblaszko, plbossart, ranj063 and singalsu as code owners June 7, 2024 16:40

andyross mentioned this pull request Jun 7, 2024

Google AEC rework: dynamic formats, legacy platform support #8919

Merged

cujomalainey approved these changes Jun 7, 2024

View reviewed changes

marcinszkudlinski previously approved these changes Jun 11, 2024

View reviewed changes

marcinszkudlinski reviewed Jun 11, 2024

View reviewed changes

src/include/module/audio/audio_stream.h Show resolved Hide resolved

lgirdwood approved these changes Jun 11, 2024

View reviewed changes

andyross force-pushed the comp-mod-refactor branch 3 times, most recently from 5350c7a to 35d25b1 Compare June 11, 2024 18:58

marcinszkudlinski approved these changes Jun 12, 2024

View reviewed changes

andyross added 4 commits June 14, 2024 07:28

module_adapter: Remove modules_shim_new()

de5bfbf

This is dead code: just a wrapper around module_adapter_new(), which is a public API already in use. Presumably a forgotten relic. Remove. Signed-off-by: Andy Ross <andyross@google.com>

sink/source: Add pipeline_id accessor API

713dfdd

Expose the new pipeline_id field in sof_audio_stream_params to a cleaner sink/source API for use by module code. Longer term this may want to be indirected by newer backends. Signed-off-by: Andy Ross <andyross@google.com>

andyross force-pushed the comp-mod-refactor branch from 35d25b1 to 029758c Compare June 14, 2024 14:28

kv2019i approved these changes Jun 17, 2024

View reviewed changes

src/audio/pipeline/pipeline-schedule.c Outdated Show resolved Hide resolved

andyross force-pushed the comp-mod-refactor branch from 029758c to d38f9cd Compare June 17, 2024 19:55

marcinszkudlinski approved these changes Jun 18, 2024

View reviewed changes

kv2019i merged commit 5f5fdb6 into thesofproject:main Jun 18, 2024

component/module refactoring pass #9211

component/module refactoring pass #9211

Uh oh!

Conversation

andyross commented Jun 7, 2024

Uh oh!

Uh oh!

lgirdwood left a comment

Choose a reason for hiding this comment

Uh oh!

lgirdwood commented Jun 11, 2024

Uh oh!

marcinszkudlinski commented Jun 11, 2024

Uh oh!

lgirdwood commented Jun 11, 2024

Uh oh!

wszypelt commented Jun 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lgirdwood commented Jun 11, 2024

Uh oh!

andyross commented Jun 11, 2024

Uh oh!

cujomalainey commented Jun 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andyross commented Jun 11, 2024

Uh oh!

cujomalainey commented Jun 11, 2024

Uh oh!

andyross commented Jun 11, 2024

Uh oh!

wszypelt commented Jun 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andyross commented Jun 12, 2024

Uh oh!

andyross commented Jun 12, 2024

Uh oh!

marcinszkudlinski commented Jun 12, 2024

Uh oh!

andyross commented Jun 12, 2024

Uh oh!

andyross commented Jun 12, 2024

Uh oh!

marcinszkudlinski commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marcinszkudlinski commented Jun 13, 2024

Uh oh!

andyross commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marcinszkudlinski commented Jun 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andyross commented Jun 14, 2024

Uh oh!

kv2019i left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

andyross commented Jun 17, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

wszypelt commented Jun 11, 2024 •

edited

Loading

cujomalainey commented Jun 11, 2024 •

edited

Loading

wszypelt commented Jun 12, 2024 •

edited

Loading

marcinszkudlinski commented Jun 13, 2024 •

edited

Loading

andyross commented Jun 13, 2024 •

edited

Loading

marcinszkudlinski commented Jun 14, 2024 •

edited

Loading