Skip to content

Conversation

@marcinszkudlinski
Copy link
Contributor

this is fix for #10024

Rootcause:

  1. In case of cross core buffers, there's no guarantee that buffer is freed on the same core that it was created by. It depends which end of cross-core buffer (which component) is freed as second - the buffer will be freed by the core of second component.

  2. buffer memory is managed by a feature called "virtual heaps" that enforces freeing of memory on the same core that it was allocated ====> EXCEPTION

  3. even if virtual heaps were cross-core, there's still a problem with invalidation of cache of memory being freed. It MUST be performed on a core that was producing data - and there's no way to override it. If the cache is not invalidated, it will be written back by the core itself leading either to data corruption or to exception in case the virtual memory has been unmapped.

SOLLUTION:

  • make virtual memory cross-core capable
  • add "unbind" hook to sink API, invalidate cache in this hook

@marcinszkudlinski marcinszkudlinski changed the title Virtual heaps core Cross core Virtual heaps, invalidation of cross core buffers in unbind call Jun 6, 2025
@marcinszkudlinski marcinszkudlinski marked this pull request as ready for review June 6, 2025 06:17
Copilot AI review requested due to automatic review settings June 6, 2025 06:17
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a fix for cross‐core virtual heaps and buffer invalidation during unbind calls by refactoring the virtual heap management and converting all module binding/unbinding functions to use strongly typed structures (bind_info and unbind_info). In addition, a new cache invalidation hook is added for ring buffers to ensure proper data consistency on cross‐core operations.

  • Refactored virtual heap allocation to use a single pointer rather than a per-core array.
  • Updated API signatures across the codebase (including IDC, component, module adapter, tester, and copier layers) to use bind_info/unbind_info structures.
  • Introduced on_bind/on_unbind hooks for source/sink APIs and added cache invalidation in the ring buffer unbind function.

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated no comments.

Show a summary per file
File Description
zephyr/lib/alloc.c Refactored virtual heap pointer from an array to a single pointer for cross-core memory management.
zephyr/include/sof/lib/regions_mm.h Updated VMH heap API signatures and removed the core_id field from the vmh_heap structure.
src/ipc/ipc4/helper.c Changed binding functions to use bind_info/unbind_info structures in place of void pointers.
src/include/sof/audio/module_adapter/module/generic.h Updated module bind/unbind function signatures to use structured bind_info/unbind_info parameters.
src/include/sof/audio/component_ext.h Modified component extension API functions to use the new binding/unbinding parameter types.
src/include/sof/audio/component.h Changed component operation function pointers to accept bind_info/unbind_info types.
src/include/module/module/interface.h Updated module interface function signatures for bind/unbind to support the new structures.
src/include/module/audio/source_api.h Added on_bind/on_unbind hooks and a bound_module field for tracking source API bindings.
src/include/module/audio/sink_api.h Added on_bind/on_unbind hooks and a bound_module field for sink API bindings.
src/idc/idc.c Updated IDC binding functions to cast payloads into bind_info/unbind_info types.
src/debug/tester/tester.h Modified tester interface bind/unbind function signatures to utilize bind_info/unbind_info.
src/debug/tester/tester.c Updated tester binding functions according to the new API signature.
src/audio/module_adapter/module_adapter_ipc4.c Adapted module adapter bind/unbind functions to use the new structured parameters.
src/audio/module_adapter/module/generic.c Revised module bind/unbind implementations to include source/sink hook calls with bind_info/unbind_info.
src/audio/mixin_mixout/mixin_mixout.c Updated mixout binding functions to operate with the new bind_info/unbind_info types.
src/audio/kpb.c Modified KPB binding functions to use the new structured parameters (note an incorrect debug log).
src/audio/dai-zephyr.c Updated DAI unbind signature to accept unbind_info instead of a generic void pointer.
src/audio/copier/dai_copier.h Revised DAI copier unbind signature for consistency with new unbind_info parameter type.
src/audio/copier/copier.c Updated copier binding/unbinding functions to use bind_info/unbind_info structures.
src/audio/buffers/ring_buffer.c Introduced an on_unbind hook that invalidates cache for ring buffers upon disconnection.
Comments suppressed due to low confidence (1)

src/audio/kpb.c:392

  • The debug message in the kpb_unbind() function incorrectly states 'kpb_bind()'. It should be updated to 'kpb_unbind()' to accurately reflect the operation.
comp_dbg(dev, "kpb_bind()");


if (bind_data->as_source) {
ret = source_bind(bind_data->as_source, mod);
if (ret)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need to unbind sink here? And in the potential error case below? Same in module_unbind()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, sink/src API is specific to a single component only. "Bind" for sink/source means a connection to a single component and nothing more.
Bind/unbind will be called separately for both components in question, each call on proper core in case of cross core connection. So will be bind/unbind for sink/src APIs
As to error patch: it will NEVER be situation where as_source != NULL && as_sink!=NULL

maybe it will be better to do this in more stright way:

struct bind_info {
  struct ipc4_module_bind_unbind *ipc4_data;
  bool is_source; // will think about better name
  union  {
     struct sof_source *source;
     struct sof_sink *sink;
  }
}

what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcinszkudlinski aha, so as_source and as_sink are mutually exclusive, they cannot both be non-NULL at the same time? Then yes, would be good to make that explicit, at least with an else. And yes, using a union would be another possibility too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api changed

* core that have been using sink API
*/
ring_buffer_invalidate_shared(ring_buffer, ring_buffer->_data_buffer,
ring_buffer->data_buffer_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to also free the buffer here? might need to unbind it from the other side too if not done yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. Now free is performed when a module is deleted.

OOOh! just got an idea! It has always been a memory leak!
Bind (create bufer) -> unbind (do nothng - don't delete) --> bind (create another)....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh @kv2019i just a question - is unbind called for IPC3 also?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcinszkudlinski No, I don't think it's called at all. In IPC3, there's just a call to pipeline_disconnect() (for both buffers connected to a component) and this internally calls buffer_detach().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I checked, buffer free is called in unbind.
It wasn't till this PR because "unbind" call was ignored in most cases

	ret = comp_unbind(src, &unbind_data);

	unbind_data.bind_type = COMP_BIND_TYPE_SOURCE;
	unbind_data.source = audio_buffer_get_source(&buffer->audio_buffer);
	ret1 = comp_unbind(sink, &unbind_data);

	ll_unblock(cross_core_unbind, flags);

	buffer_free(buffer);

tr_err(&zephyr_tr, "Unable to init virtual heap for core %d!", core);

virtual_buffers_heap[core] = heap;
virtual_buffers_heap = vmh_init_heap(&static_hp_buffers, MEM_REG_ATTR_CORE_HEAP, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, wasn't the actual meaning of CORE_HEAP that it was per core? Maybe it isn't needed any more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obsolete name - used to be per heap, now is global, Will change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to MEM_REG_ATTR_SHARED_HEAP

MEM_REG_ATTR_* are defined in Zephyr. Zephyr creates one virtual memory space per each core, one as "shared", one for other purposes (MEM_REG_ATTR_OPPORTUNISTIC_MEMORY ??).
Anyway,

  • all spaces are pure virtual and don't use any physical memory banks till memory is allocated, so they can stay in Zephyr.
  • In Zephyr all regions are treated in the very same way - in fact no difference at all which region is used
  • SOF used to use 3 "per core" areas, now is using one "shared"
  • "shared" size is 0x100000 == 1MB, and it is hard-coded in zephyr (should be enough)

maybe it is worth to introduce some kconfig options enabling regions and adjusting sizes, but it requires a PR to Zephyr.


struct vmh_heap *heap = vmh_init_heap(NULL, virtual_memory_region[i].attr,
i, true);
struct vmh_heap *heap = vmh_init_heap(NULL, virtual_memory_region[i].attr, true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but here in line 239/238 num_regions still counts all cores, which then becomes the iteration count for this loop?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcinszkudlinski this seems to still be open - this shouldn't be called in a loop any more, should it?

Copy link
Contributor Author

@marcinszkudlinski marcinszkudlinski Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not? All of virtual memory tables are still in Zephyr, This test uses all of them and this is fine.
There's no kconfig option in zephyr that turns off any of the virtual memory

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcinszkudlinski how will it use all of them now, that you've modified vmh_init_heap() to only use heap 0 for all core heaps?

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits inline. I think the major issues related to last commit and whether all logic related to per-core heaps are removed (see Guennadi's comment to vmh.c).

This is quite a bit change, but at least I welcome making virtual heap usable across cores.

}

comp_err(dev, "No sink buffer found for src_queue = %u", src_queue_id);
return -ENODEV;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, Something odd with all the commit messages. They are line-wrapped to under 50 columns...? Look a bit odd in git history compared to other commits in SOF.

* core that have been using sink API
*/
ring_buffer_invalidate_shared(ring_buffer, ring_buffer->_data_buffer,
ring_buffer->data_buffer_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcinszkudlinski No, I don't think it's called at all. In IPC3, there's just a call to pipeline_disconnect() (for both buffers connected to a component) and this internally calls buffer_detach().

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good improvements - almost ready, a few opens still.

@marcinszkudlinski
Copy link
Contributor Author

Step 1 - change bind/unbind API to be more clear

@marcinszkudlinski
Copy link
Contributor Author

step2 - rebase & conflict resolve

/* in case of disconnection, invalidate all cache. This method is guaranteed be called on
* core that have been using sink API
*/
ring_buffer_invalidate_shared(ring_buffer, ring_buffer->_data_buffer,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of this function is actually somewhat confusing: usually in SOF "shared" means "uncached"

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to use the enum for sink/source type but test results from @jsarha are passing so fix is validated !

@marcinszkudlinski marcinszkudlinski force-pushed the virtual_heaps_core branch 4 times, most recently from e5a1b79 to e208721 Compare June 16, 2025 14:02
Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concerning full testing of the Zephyr memory blocks API - I think we need a twister test for that. Maybe test/vmh.c could be moved to Zephyr, that would improve testing, I think

i < CONFIG_MP_MAX_NUM_CPUS + VIRTUAL_REGION_COUNT; i++) {
/* Search for matching attribute so we place heap on shared virtual region */
for (i = CONFIG_MP_MAX_NUM_CPUS;
i < CONFIG_MP_MAX_NUM_CPUS + VIRTUAL_REGION_COUNT; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this would break with zephyrproject-rtos/zephyr#91666 . I think we need a Zephyr function to find a region by the attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true! CONFIG_MP_MAX_NUM_CPUS + VIRTUAL_REGION_COUNT as a table size - instead of a single macro - should have never been merged :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can proceed with Zephyr changes

  • I'll add a function finding a region by attribute and core num to zephyr.
  • I'll create a PR to SOF using the zephyr function
  • PR must be merged together with moving to new zephyr head

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys_mm_drv_query_memory_regions is a standard zephyr function. The API assumes that at the end the array of regions should be freed via sys_mm_drv_query_memory_regions_free. The end of the array is marked with an empty element, so the loop doesn't have to rely on knowledge of its size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcinszkudlinski do you want to do the Zephyr changes incrementally, i.e. we merge this and give it a a few days under CI testing ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR should go first, it works with current zephyr

than, zephyr change should be merged
as the last step - move SOF to new zephyr + changes in SOF (in one PR)

Currently bind api call passes IPC4 header
only. This commit extends API by adding
pointers to sink/source of the entity that is
being bound.
Note that in pipeline 2.0 there's a possibility
to bind a module to a module, therefore pointers
to sink/src need to be provided instead of
a pointer to a buffer

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
Currently unbind api call passes IPC4 header
only. This commit extends API by adding
pointers to sink/source of the entity that is
being bound.
Note that in pipeline 2.0 there's a possibility
to bind a module to a module, therefore pointers
to sink/src need to be provided instead of
a pointer to a buffer

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
This commit adds bind/unbind hook to sink/source API
It is guaranteed that the hooks will be called on
core that the API will be / was used on, what makes
them a perfect place for cache operations needed at
start/end of usage.
Note that free method is not a good place for this, as
in case of shared buffers it may happen on either of
the cores the buffer connects

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
Unbind used to be ignored for components in the same pipeline
But there are some code that depends on the event and
need to have it in all situations, i.e. dai_zephyr_unbind
or cross-core DP connections in the same pipeline

Propagate the event in all cases

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
unbind hook in buffer is guaranteed to be called on the
core that was using the API. Free, however, may be
called on other core

Add cache invalidation, that must be called on core that
was producing data (user of sink API) in unbind hook.

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
struct vmh_heap is a private structure and should not
be used outside of regions_mm.c file

structure moved

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
In case of cross core buffers a buffer may be allocated
and freed by different core, depending on which
component is deleted second.
As all control structures of virtual heaps are
stored in uncached aliases, there's no technical
problems with allowing virtual heaps to work cross
core.
The only consideration is that in case of cross core
allocate/free the cache invalidation MUST be performed
on the core that was storing data. It is up to the
memory user to ensure this

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
virtual heap may be called from multiple threads,
all operations should be protected my mutex

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
struct sof_audio_buffer is a "base class" in
a meaning of C++ of all buffers in SOF (currently
comp_buffer and ring_buffer). Freeing of memory was
based on indirect assumption that struct sof_audio_buffer
is a very first field of the buffer structure

this commit moves freeing of buffer to a method
specific for each buffer type, enforcing that proper
pointer will be used regardless of buffer structure

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
@marcinszkudlinski
Copy link
Contributor Author

faiures because: docker: failed to register layer: write /usr/lib/x86_64-linux-gnu/libicuuc.a: no space left on device

@lgirdwood
Copy link
Member

faiures because: docker: failed to register layer: write /usr/lib/x86_64-linux-gnu/libicuuc.a: no space left on device

ack - think we've root caused to Zephyr SDK growing to 11G when GH action limit is 14G

@lgirdwood
Copy link
Member

faiures because: docker: failed to register layer: write /usr/lib/x86_64-linux-gnu/libicuuc.a: no space left on device

ack - think we've root caused to Zephyr SDK growing to 11G when GH action limit is 14G

Jenkins passing - the fix for Zephyr SDK will take longer, we should merge things in the mean time.

Zephyr provides a special macro, SYS_MM_DRV_MEMORY_REGION_FOREACH
for iteration through all memory regions.
This commit use this macro instead of a for loop that requires
information about number of virtual memory regions provided
by Zephyr.

Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
@marcinszkudlinski
Copy link
Contributor Author

added one commit - use SYS_MM_DRV_MEMORY_REGION_FOREACH macro for virtual memory regions iterating. Now the Zephyr changes are 100% independent

@lgirdwood lgirdwood merged commit 7fefb0d into thesofproject:main Jun 25, 2025
24 of 49 checks passed
@marcinszkudlinski marcinszkudlinski deleted the virtual_heaps_core branch June 25, 2025 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants