-
Notifications
You must be signed in to change notification settings - Fork 349
Heap refinement Part3 -- make full use of HP SRAM banks #4521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
One of the error messages in the checktree failure at https://github.com/thesofproject/sof/pull/4521/checks?check_run_id=3123030117 is : Same as https://sof-ci.01.org/sofpr/PR4521/build9710/build/jsl_gcc_error.txt checktree runs a highly concurrent build (it's part of that test) so it's always better to fix errors in other, slower checks first because its logs are very hard to read. BTW you can compile-test all released platforms in parallel with a single |
Thanks for information, let me fix it today. |
src/platform/intel/cavs/lib/memory.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me change to calculate this size locally.
src/platform/intel/cavs/lib/memory.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"times" seems a bit unclear - count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
um, we already have HEAP_COUNTxxxx, so.., "times" means multiplying here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I see it was already used before this patch...
src/platform/intel/cavs/lib/memory.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t
src/platform/intel/cavs/lib/memory.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned int count; or n or something simple
ba00319 to
eb7edc7
Compare
|
@marc-hb do you have insight about why qemu boot fail on APL here? |
No, looks like the firmware just failed to boot. It could be for many reasons. It happened to me some time ago and it was because I was reading un-initialized memory. This was working (sometimes?) on hardware but not on QEMU. This is unfortunately difficult to debug because gdb does not work, on the other hand it's very easy to reproduce, just run the same scripts in |
9efc51e to
323060a
Compare
When I just went through the qemu/hw/adsp/dsp/cavs.c, the ADSP_CAVS_1_x_DSP_HP_SRAM_SIZE looks not quite correct there, not sure if it will lead this this boot fail issue.
Haven't touched the QEMU part for years, @marc-hb do you have a guide about how to build and run it on a non-docker environment? |
The bad thing about our qemu status is that gdb (where a lot of the qemu value is) does not work right now. The good qemu thing that we still have is that we can still change one line or even one character and retest instantly. Very efficient.
It had been a long time for me too but I just ran the same scripts than CI and it fortunately worked inside and outside docker the same. No guide was required. Pay attention to the current directory though; the only "comfort" change I made was thesofproject/qemu#46 Appreciate if you can test that change (it's been a while) I also tried the sof-5.2 branch in the hope gdb would work but I would not recommend it yet: gdb still does not work, it's more difficult to compile and it's not tested by CI |
|
@marc-hb I realize the APL qemu boot fail happens with gcc built FW only, I have build failure with gcc for APL, looks updating cross-ng doesn't fix the building issue for me... |
|
I don't see any connection between this PR breaking the boot on QEMU and your build failure, is there one? I have been building APL with the crosstool-ng toolchain a million times over the last year and I never had any issue, neither with gcc 8 nor with the more recent gcc 10. Same for github actions and Jenkins. https://thesofproject.github.io/latest/getting_started/build-guide/build-from-scratch.html#build-sof has recently been tested and updated by both @ujfalusi and myself, so you can start that from scratch. Or use docker? |
|
Thanks @marc-hb I managed to run this locally without docker, but looks it boots for me without errors: |
|
Looks like you're not running the test. |
e0ebf34 to
1dd7eb3
Compare
Just did an debugging and read out the Zephyr memory size for different platforms, pushed the updates. |
1dd7eb3 to
2d76519
Compare
|
@lgirdwood the Kconfig item added also. |
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Zephyr heap implementation in wrapper.c is very basic, so I think we can proceed with a fairly simple approach there and focus this review on the XTOS aspects.
zephyr/wrapper.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about for IMX (the above #if-else statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it intact for the IMX case here?
zephyr/wrapper.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this is a shortterm workaround (e.g. Zephyr's libc already has similar code to figure out the heap size:
zephyr/lib/libc/newlib/libc-hooks.c ), so I'd prefer a very simple definition and add a TODO comment to clearly mark this is a stopgap.
zephyr/wrapper.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove these comments and add one "/* TODO: heap size should be set based on linker output */" instead at top.
zephyr/wrapper.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add "CONFIG_HP_MEMORY_BANKS >= 16"
zephyr/wrapper.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"CONFIG_HP_MEMORY_BANKS < 16"
zephyr/wrapper.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
">= 30"
zephyr/wrapper.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this is problematic. We can get size of the heap from linker (and hardcode it in shortterm), but the ratio of the heap that should be allocated to coherent-versus-noncoherent usage is SOF specific information and that we'd have to get from SOF. We should assume "half is cacheable" here, but rather this should come from some common definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i we actually share the same zone for cached/uncached requests after this PR on the XTOS version, this makes us more flexible and the portion can come from runtime not the built time. For the Zephyr version, if we don't plan to use 'ENABLE_CACHED_HEAP' anymore, we actually don't need to care the portion here.
All the change here I made just an example for unblocking the Zephyr build, I rely on you and @lyakh to provide a specific commit or incremental change on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack @keyonjie . So yeah, then I think we should follow suite with Zephyr and use a single heap as well and serve both cached and non-cached allocations from the same heap area. No need to block this PR to wait for the Zephyr work to complete.
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More comments in the Zephyr part needed so everyone knows this this is temp, but should be good to merge tomorrow.
@kv2019i any reason why the Zephyr heaps are partitioned in cache and uncached ? Can we have one type and do the type conversion in a local wrapper (since its all the same PHY memory) ?
|
@lgirdwood wrote:
This was done only to ensure cache coherence for the heap metadata as the sys_heap.h implementation stores metadata in-between the allocated chunks, and multiple cores may access the metadata at any given time. OTOH, our current cached heap doesn't really work yet as we still observe corruption of metadata (albeit only in long stress tests, but still something is wrong). Once we get a standalone cached heap to work, we could take another look at whether same coherence protocol (for the metadata) could be implemented within context of a single heap. I do agree not doing a static split between the two, would be a much better design. |
Zephyr does not even build right now: zephyrproject-rtos/zephyr#38349 |
2d76519 to
1857c3e
Compare
|
@lgirdwood the interesting thing is that if we are hitting kinds of CI failure with keeping uses cached buffers, please see the latest CI results. |
The allocator will return failure if the request size is too big, remove the unsuitable check here. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
246ea64 config: skl: change image_size to the real SRAM size 9c9e07c config: kbl: change image_size to the real SRAM size 44b37d1 config: sue: change image_size to the real SRAM size 580e4d6 config: tgl-cavs: change image_size to the real SRAM size Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
1857c3e to
b2d058a
Compare
|
@keyonjie some red on internal CI, can you split the PR so we can bisect the problematic patch. Some of the updates are good to go now. |
We have change the size of runtime_shared zone to be decided at link stage, here hardcode HEAP_RUNTIME_SHARED_SIZE to unblock the Zephyr building. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
|
@lgirdwood planing to split it to 3, the 1st one here: Let's merge the 1st and 2nd first if the CI result good there. |
This is a refinement of the heap memory management: 1. refine and simplify the allocator, call rzalloc() and rbolloc() with flag SOF_MEM_FLAG_COHERENT for uncached/coherent address usage. 2. remove the SYS_SHARED and RUNTIME_SHARED zones, merge the usage of them to the corresponding SYS_RUNTIME and BUFFER zones. 3. make full use of the SRAM, with making the size of the buffer zone configured at linking stage. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Align to the new heap memory map and allocator, smaller SYSTEM and RUNTIME zones are used on apollolake now. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
The mux cd could be bigger than 1KB, change to use rballoc() for allocating the buffer, to try to ensure the allocation is success. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
After zones are refined, need an alignment to the latest cavs version. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
To rule out buffer coherency issue, it is useful to add a debug option with which selected only coherent/uncached buffer will be used. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
The lli structs are already allocated them from the coherent heap buffer so no need invalidation and write back anymore. Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
b2d058a to
bb7c772
Compare
| #define HEAP_SIZE 0x80000 | ||
| #elif (CONFIG_HP_MEMORY_BANKS < 45) | ||
| /* e.g. TGL-H */ | ||
| #define HEAP_SIZE 0x100000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you clarify how this is handled in rimage, your changes only deal with 'tgl' and there's no specific memory size for TGL-H.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you clarify how this is handled in rimage, your changes only deal with 'tgl' and there's no specific memory size for TGL-H.
The memory size is configured in the Kconfig, see it here:
| CONFIG_HP_MEMORY_BANKS=30 |
For rimage, we need to specify image_size according to the memory size, see it here:
https://github.com/thesofproject/rimage/blob/9a26e4558094f19f3f7becd89eb9e8a9a9dd82b9/config/tgl-h.toml#L6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but you didn't modify the tgl-h.toml file in this PR, did you? see 1b0757e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but you didn't modify the tgl-h.toml file in this PR, did you? see 1b0757e
That's because the tgl-h.toml is already updated before my PR, the rimage is already updated to 9a26e45 in the sof upstream repo:
9a26e45 config: apl: change image_size to the real SRAM size
8a2ea00 config: cnl: change image_size to the real SRAM size
1de9ccc config: jsl: change image_size to the real SRAM size
f52a078 config: icl: change image_size to the real SRAM size
8073ea3 config: tgl-h: change image_size to the real SRAM size
9e50a02 config: tgl: change image_size to the real SRAM size
src/drivers/dw/dma.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alignment?
| #define HEAP_SYS_RUNTIME_0_BASE 0xBE120000 | ||
| #define SOF_CORE_S_START 0xBE140000 | ||
| #define HEAP_RUNTIME_BASE 0xBE180000 | ||
| #define HEAP_BUFFER_BASE 0xBE1C0000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how anyone could review such changes without any comments on how this is inferred?
|
|
||
| cd = rzalloc(SOF_MEM_ZONE_RUNTIME, 0, SOF_MEM_CAPS_RAM, | ||
| /* allocate quite a big buffer, use rballoc to make sure it will succeed */ | ||
| cd = rballoc(0, SOF_MEM_CAPS_RAM, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we special-case a component? something's fishy here. why not the mixer as well then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we special-case a component? something's fishy here. why not the mixer as well then?
The size of 'cd' here is > 1KB, we have very limited runtime blocks on APL, so any ask for allocation size >= 1KB are suggested to use the rballoc() and allocation will happen on the BUFFER zone.
The mixer 'cd' size is very small, using rzalloc() to allocate e.g. a 64Bytes buffer could help us to save memory usage.
And yes, maybe we should make the allocator more smart in advance, and then the callers can use an unified helper and no need to care of details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our allocator is simple and showing its limitations. The Zephyr allocator is far better.
|
@lgirdwood I am splitting the biggest commit 'alloc: heap memory refinement ' to make life easier, will submit some low hanging fruits first. |
|
let me close this one and go with smaller PRs. |
This series implements the calculation of the runtime heap at run time, to make full use of HP SRAM banks on Intel cAVS platforms.