Conversation

@keyonjie
Contributor

This series implements calculation of the runtime heap size at run time, to make full use of the HP SRAM banks on Intel cAVS platforms.

The runtime heap size is calculated at run time, leveraging the value
of the _runtime_heap_size symbol provided by the platform linker script
at link time.

This helps make full use of the HP SRAM banks on each cAVS platform:
everything left available in the .bss section is used for runtime heap
allocation.

@keyonjie keyonjie requested a review from ranj063 July 21, 2021 10:38
@keyonjie keyonjie marked this pull request as draft July 21, 2021 10:39
@marc-hb
Collaborator

marc-hb commented Jul 21, 2021

One of the error messages in the checktree failure at https://github.com/thesofproject/sof/pull/4521/checks?check_run_id=3123030117 is:

/home/sof/work/xtensa-cnl-elf/lib/gcc/xtensa-cnl-elf/10.2.0/../../../../xtensa-cnl-elf/bin/ld:/home/sof/work/sof.git/installer-builds/
build_jsl_gcc/icelake.x:433 cannot move location counter backwards (from 00000000be110000 to 00000000be100000)

Same as https://sof-ci.01.org/sofpr/PR4521/build9710/build/jsl_gcc_error.txt

checktree runs a highly concurrent build (it's part of that test), so it's always better to fix errors in other, slower checks first, because checktree's logs are very hard to read.

BTW you can compile-test all released platforms in parallel with a single make -C installer/. Or all platforms by editing installer/sample-config.mk

@keyonjie
Contributor Author

make -C installer/.

Thanks for the information, let me fix it today.

Collaborator

size_t?

Contributor Author

Let me change it to calculate this size locally.

Collaborator

"times" seems a bit unclear - count?

Contributor Author

Um, we already have HEAP_COUNTxxxx, so "times" means multiplying here.

Collaborator

yeah, I see it was already used before this patch...

Collaborator

size_t

Collaborator

unsigned int count; or n or something simple

@keyonjie keyonjie force-pushed the cavs_common branch 4 times, most recently from ba00319 to eb7edc7 Compare July 23, 2021 06:15
@keyonjie
Contributor Author

@marc-hb do you have any insight into why the QEMU boot fails on APL here?

@marc-hb
Collaborator

marc-hb commented Jul 23, 2021

@marc-hb do you have any insight into why the QEMU boot fails on APL here?

No, looks like the firmware just failed to boot. It could be for many reasons. It happened to me some time ago and it was because I was reading un-initialized memory. This was working (sometimes?) on hardware but not on QEMU.

This is unfortunately difficult to debug because gdb does not work. On the other hand it's very easy to reproduce: just run the same scripts in .github/workflows that GitHub runs. Then it's very quick to try many variations.

@keyonjie keyonjie force-pushed the cavs_common branch 2 times, most recently from 9efc51e to 323060a Compare July 23, 2021 09:23
@keyonjie
Contributor Author

@marc-hb do you have any insight into why the QEMU boot fails on APL here?

No, looks like the firmware just failed to boot. It could be for many reasons. It happened to me some time ago and it was because I was reading un-initialized memory. This was working (sometimes?) on hardware but not on QEMU.

When I just went through qemu/hw/adsp/dsp/cavs.c, ADSP_CAVS_1_x_DSP_HP_SRAM_SIZE did not look quite correct there; not sure if it leads to this boot failure issue.

This is unfortunately difficult to debug because gdb does not work. On the other hand it's very easy to reproduce: just run the same scripts in .github/workflows that GitHub runs. Then it's very quick to try many variations.

I haven't touched the QEMU part for years; @marc-hb do you have a guide on how to build and run it in a non-Docker environment?

@keyonjie keyonjie marked this pull request as ready for review July 23, 2021 12:02
@marc-hb
Collaborator

marc-hb commented Jul 24, 2021

When I just went through qemu/hw/adsp/dsp/cavs.c, ADSP_CAVS_1_x_DSP_HP_SRAM_SIZE did not look quite correct there; not sure if it leads to this boot failure issue.

The bad thing about our qemu status is that gdb (where a lot of the qemu value is) does not work right now. The good qemu thing that we still have is that we can still change one line or even one character and retest instantly. Very efficient.

I haven't touched the QEMU part for years; @marc-hb do you have a guide on how to build and run it in a non-Docker environment?

It had been a long time for me too, but I just ran the same scripts as CI and it fortunately worked the same inside and outside Docker. No guide was required.

Pay attention to the current directory though; the only "comfort" change I made was thesofproject/qemu#46. I'd appreciate it if you could test that change (it's been a while).

I also tried the sof-5.2 branch in the hope that gdb would work, but I would not recommend it yet: gdb still does not work, it's more difficult to compile, and it's not tested by CI.

@keyonjie
Contributor Author

@marc-hb I realize the APL QEMU boot failure happens with gcc-built FW only. I have a build failure with gcc for APL; it looks like updating crosstool-ng doesn't fix the build issue for me...

/home/keyon/github/keyonjie/sof/src/arch/xtensa/xtos/core-restore.S: Assembler messages:
/home/keyon/github/keyonjie/sof/src/arch/xtensa/xtos/core-restore.S:241: Error: unknown opcode or format name 'wur.ae_cw_sd_no'
/home/keyon/github/keyonjie/sof/src/arch/xtensa/xtos/core-restore.S:241: Error: unknown opcode or format name 'wur.ae_cend0'
/home/keyon/github/keyonjie/sof/src/arch/xtensa/xtos/core-restore.S:241: Error: unknown opcode or format name 'ae_l64.i'
/home/keyon/github/keyonjie/sof/src/arch/xtensa/xtos/core-restore.S:241: Error: unknown opcode or format name 'ae_l64.i'
/home/keyon/github/keyonjie/sof/src/arch/xtensa/xtos/core-restore.S:241: Error: unknown opcode or format name 'ae_l64.i'
/home/keyon/github/keyonjie/sof/src/arch/xtensa/xtos/core-restore.S:241: Error: unknown opcode or format name 'ae_l64.i'
/home/keyon/github/keyonjie/sof/src/arch/xtensa/xtos/core-restore.S:241: Error: unknown opcode or format name 'ae_l64.i'

@marc-hb
Collaborator

marc-hb commented Jul 26, 2021

I don't see any connection between this PR breaking the boot on QEMU and your build failure, is there one?

I have been building APL with the crosstool-ng toolchain a million times over the last year and I never had any issue, neither with gcc 8 nor with the more recent gcc 10. Same for github actions and Jenkins.

https://thesofproject.github.io/latest/getting_started/build-guide/build-from-scratch.html#build-sof has recently been tested and updated by both @ujfalusi and myself, so you can start that from scratch. Or use docker?

@keyonjie
Contributor Author

Thanks @marc-hb, I managed to run this locally without Docker, but it looks like it boots for me without errors:

$ ./xtensa-softmmu/qemu-system-xtensa -cpu broxton -M adsp_bxt -nographic -kernel /home/keyon/hdd/github/keyonjie/sof/build_apl_gcc/src/arch/xtensa/sof-apl.ri -rom /home/keyon/hdd/github/keyonjie/sof/build_apl_gcc/src/arch/xtensa/rom-apl.bin
bridge-io: qemu-bridge-l2-sram-mem fd 9 region 1 at 0x7fb1189d5000 allocated 2097152 bytes
bridge-io: qemu-bridge-hp-sram-mem fd 10 region 2 at 0x7fb118712000 allocated 2097152 bytes
bridge-io: qemu-bridge-lp-sram-mem fd 11 region 3 at 0x7fb120234000 allocated 131072 bytes
bridge-io: qemu-bridge-imr-mem fd 12 region 4 at 0x7fb118692000 allocated 524288 bytes
bridge-io: qemu-bridge-rom-mem fd 13 region 5 at 0x7fb120232000 allocated 8192 bytes
bridge-io: qemu-bridge-cmd-io fd 14 region 6 at 0x7fb120280000 allocated 16 bytes
bridge-io: qemu-bridge-res-io fd 15 region 7 at 0x7fb120231000 allocated 16 bytes
bridge-io: qemu-bridge-ipc-dsp-io fd 16 region 8 at 0x7fb120230000 allocated 32 bytes
bridge-io: qemu-bridge-idc0-io fd 17 region 9 at 0x7fb12022f000 allocated 128 bytes
bridge-io: qemu-bridge-idc1-io fd 18 region 10 at 0x7fb12022e000 allocated 128 bytes
bridge-io: qemu-bridge-hostwin0-io fd 19 region 11 at 0x7fb12022d000 allocated 8 bytes
bridge-io: qemu-bridge-hostwin1-io fd 20 region 12 at 0x7fb118691000 allocated 8 bytes
bridge-io: qemu-bridge-hostwin2-io fd 21 region 13 at 0x7fb118690000 allocated 8 bytes
bridge-io: qemu-bridge-hostwin3-io fd 22 region 14 at 0x7fb11868f000 allocated 8 bytes
bridge-io: qemu-bridge-irq-io fd 23 region 15 at 0x7fb11868e000 allocated 512 bytes
bridge-io: qemu-bridge-timer-io fd 24 region 16 at 0x7fb11868d000 allocated 512 bytes
bridge-io: qemu-bridge-mn-io fd 25 region 17 at 0x7fb11868c000 allocated 512 bytes
bridge-io: qemu-bridge-l2-io fd 26 region 18 at 0x7fb11868b000 allocated 64 bytes
bridge-io: qemu-bridge-ssp0-io fd 27 region 19 at 0x7fb118689000 allocated 8192 bytes
bridge-io: qemu-bridge-ssp1-io fd 28 region 20 at 0x7fb118687000 allocated 8192 bytes
bridge-io: qemu-bridge-ssp2-io fd 29 region 21 at 0x7fb118685000 allocated 8192 bytes
bridge-io: qemu-bridge-ssp3-io fd 30 region 22 at 0x7fb118683000 allocated 8192 bytes
bridge-io: qemu-bridge-ssp4-io fd 31 region 23 at 0x7fb118681000 allocated 8192 bytes
bridge-io: qemu-bridge-ssp5-io fd 32 region 24 at 0x7fb11867f000 allocated 8192 bytes
bridge-io: qemu-bridge-dmac0-lp-io fd 33 region 25 at 0x7fb11867e000 allocated 128 bytes
bridge-io: qemu-bridge-dmac1-hp-io fd 34 region 26 at 0x7fb11867d000 allocated 2048 bytes
bridge-io: qemu-bridge-shim-io fd 35 region 27 at 0x7fb11867c000 allocated 256 bytes
bridge-io: qemu-bridge-gtw-lout-io fd 36 region 28 at 0x7fb11867b000 allocated 448 bytes
bridge-io: qemu-bridge-gtw-lin-io fd 37 region 29 at 0x7fb11867a000 allocated 448 bytes
bridge-io: qemu-bridge-gtw-hout-io fd 38 region 30 at 0x7fb118679000 allocated 896 bytes
bridge-io: qemu-bridge-gtw-hin-io fd 39 region 31 at 0x7fb118678000 allocated 896 bytes
bridge-io-mq: added /qemu-io-parent-bxt
bridge-io-mq: added /qemu-io-child-bxt
now loading:
 kernel /home/keyon/hdd/github/keyonjie/sof/build_apl_gcc/src/arch/xtensa/sof-apl.ri
 ROM /home/keyon/hdd/github/keyonjie/sof/build_apl_gcc/src/arch/xtensa/rom-apl.bin
bridge-io: 0 messages are currently on child queue.
Header $AM1 found at offset 0x2300 bytes
ROM loader: copy 54 kernel pages to IMR
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) 

@keyonjie keyonjie marked this pull request as draft July 26, 2021 06:41
@marc-hb
Collaborator

marc-hb commented Jul 26, 2021

Looks like you're not running the test.

@keyonjie
Contributor Author

keyonjie commented Sep 7, 2021

Most of this PR is good to go in now, except the Zephyr memory size patch and the uncached buffer patch. It may be worthwhile to split the PR into smaller chunks which can be merged whilst we work through the opens.

Just did some debugging and read out the Zephyr memory sizes for the different platforms; pushed the updates.

@keyonjie
Contributor Author

keyonjie commented Sep 7, 2021

@lgirdwood the Kconfig item has been added as well.

Collaborator

@kv2019i kv2019i left a comment

The Zephyr heap implementation in wrapper.c is very basic, so I think we can proceed with a fairly simple approach there and focus this review on the XTOS aspects.

zephyr/wrapper.c Outdated
Collaborator

How about IMX (the above #if-else statement)?

Contributor Author

Isn't the IMX case left intact here?

zephyr/wrapper.c Outdated
Collaborator

Hmm, this is a short-term workaround (e.g. Zephyr's libc already has similar code to figure out the heap size:
zephyr/lib/libc/newlib/libc-hooks.c), so I'd prefer a very simple definition plus a TODO comment to clearly mark that this is a stopgap.

zephyr/wrapper.c Outdated
Collaborator

I'd remove these comments and add one "/* TODO: heap size should be set based on linker output */" instead at top.

zephyr/wrapper.c Outdated
Collaborator

Add "CONFIG_HP_MEMORY_BANKS >= 16"

zephyr/wrapper.c Outdated
Collaborator

"CONFIG_HP_MEMORY_BANKS < 16"

zephyr/wrapper.c Outdated
Collaborator

">= 30"

zephyr/wrapper.c Outdated
Collaborator

Hmm, this is problematic. We can get the size of the heap from the linker (and hardcode it in the short term), but the ratio of the heap allocated to coherent-versus-noncoherent usage is SOF-specific information that we'd have to get from SOF. We shouldn't just assume "half is cacheable" here; rather, this should come from some common definition.

Contributor Author

@kv2019i we actually share the same zone for cached/uncached requests after this PR on the XTOS version; this makes us more flexible, and the proportion can be decided at run time rather than build time. For the Zephyr version, if we don't plan to use 'ENABLE_CACHED_HEAP' anymore, we don't need to care about the proportion here.

All the changes I made here are just an example to unblock the Zephyr build; I rely on you and @lyakh to provide a specific commit or incremental change for that.

Collaborator

Ack @keyonjie. So yeah, then I think we should follow suit with Zephyr and use a single heap as well, serving both cached and non-cached allocations from the same heap area. No need to block this PR to wait for the Zephyr work to complete.

Member

@lgirdwood lgirdwood left a comment

More comments are needed in the Zephyr part so everyone knows this is temporary, but it should be good to merge tomorrow.
@kv2019i any reason why the Zephyr heaps are partitioned into cached and uncached? Can we have one type and do the type conversion in a local wrapper (since it's all the same PHY memory)?

@kv2019i
Collaborator

kv2019i commented Sep 7, 2021

@lgirdwood wrote:

@kv2019i any reason why the Zephyr heaps are partitioned into cached and uncached? Can we have one type and do the type conversion in a local wrapper (since it's all the same PHY memory)?

This was done only to ensure cache coherence for the heap metadata as the sys_heap.h implementation stores metadata in-between the allocated chunks, and multiple cores may access the metadata at any given time.

OTOH, our current cached heap doesn't really work yet, as we still observe corruption of metadata (albeit only in long stress tests, but still something is wrong). Once we get a standalone cached heap to work, we could take another look at whether the same coherence protocol (for the metadata) could be implemented within the context of a single heap. I do agree that not doing a static split between the two would be a much better design.

@marc-hb
Collaborator

marc-hb commented Sep 7, 2021

More comments are needed in the Zephyr part so everyone knows this is temporary, but it should be good to merge tomorrow.

Zephyr does not even build right now: zephyrproject-rtos/zephyr#38349

@keyonjie
Contributor Author

keyonjie commented Sep 8, 2021

@lgirdwood the interesting thing is that we are hitting various kinds of CI failures while keeping the use of cached buffers; please see the latest CI results.

The allocator will return failure if the requested size is too big; remove
the unsuitable check here.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
246ea64 config: skl: change image_size to the real SRAM size
9c9e07c config: kbl: change image_size to the real SRAM size
44b37d1 config: sue: change image_size to the real SRAM size
580e4d6 config: tgl-cavs: change image_size to the real SRAM size

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
@lgirdwood
Member

@keyonjie there is some red on internal CI; can you split the PR so we can bisect the problematic patch? Some of the updates are good to go now.

We have changed the size of the runtime_shared zone to be decided at link
stage; hardcode HEAP_RUNTIME_SHARED_SIZE here to unblock the Zephyr
build.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
@keyonjie
Contributor Author

keyonjie commented Sep 8, 2021

@lgirdwood planning to split it into 3; the 1st one is here:
#4730
the 2nd one is here: #4664

Let's merge the 1st and 2nd first if the CI results are good there.

@keyonjie keyonjie changed the title cavs: make full use of HP SRAM banks Heap refinement Part23 -- make full use of HP SRAM banks Sep 8, 2021
@keyonjie keyonjie changed the title Heap refinement Part23 -- make full use of HP SRAM banks Heap refinement Part3 -- make full use of HP SRAM banks Sep 8, 2021
This is a refinement of the heap memory management:

1. refine and simplify the allocator: call rzalloc() and rballoc() with
flag SOF_MEM_FLAG_COHERENT for uncached/coherent address usage.

2. remove the SYS_SHARED and RUNTIME_SHARED zones, merging their usage
into the corresponding SYS_RUNTIME and BUFFER zones.

3. make full use of the SRAM, with the size of the buffer zone
configured at link stage.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
Align to the new heap memory map and allocator; smaller SYSTEM and
RUNTIME zones are used on apollolake now.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
The mux cd could be bigger than 1 KB; change to use rballoc() for
allocating the buffer, to try to ensure the allocation succeeds.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
After the zones are refined, an alignment to the latest cavs version is needed.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
To rule out buffer coherency issues, it is useful to add a debug option
with which only coherent/uncached buffers will be used.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
The lli structs are already allocated from the coherent heap buffer,
so cache invalidation and write-back are no longer needed.

Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
#define HEAP_SIZE 0x80000
#elif (CONFIG_HP_MEMORY_BANKS < 45)
/* e.g. TGL-H */
#define HEAP_SIZE 0x100000
Member

can you clarify how this is handled in rimage? Your changes only deal with 'tgl' and there's no specific memory size for TGL-H.

Contributor Author

can you clarify how this is handled in rimage? Your changes only deal with 'tgl' and there's no specific memory size for TGL-H.

The memory size is configured in the Kconfig, see it here:

CONFIG_HP_MEMORY_BANKS=30

For rimage, we need to specify image_size according to the memory size, see it here:
https://github.com/thesofproject/rimage/blob/9a26e4558094f19f3f7becd89eb9e8a9a9dd82b9/config/tgl-h.toml#L6

Member

but you didn't modify the tgl-h.toml file in this PR, did you? see 1b0757e

Contributor Author

but you didn't modify the tgl-h.toml file in this PR, did you? see 1b0757e

That's because tgl-h.toml was already updated before my PR; rimage is already updated to 9a26e45 in the sof upstream repo:

9a26e45 config: apl: change image_size to the real SRAM size
8a2ea00 config: cnl: change image_size to the real SRAM size
1de9ccc config: jsl: change image_size to the real SRAM size
f52a078 config: icl: change image_size to the real SRAM size
8073ea3 config: tgl-h: change image_size to the real SRAM size
9e50a02 config: tgl: change image_size to the real SRAM size

Member

alignment?

#define HEAP_SYS_RUNTIME_0_BASE 0xBE120000
#define SOF_CORE_S_START 0xBE140000
#define HEAP_RUNTIME_BASE 0xBE180000
#define HEAP_BUFFER_BASE 0xBE1C0000
Member

Not sure how anyone could review such changes without any comments on how these values are inferred.


cd = rzalloc(SOF_MEM_ZONE_RUNTIME, 0, SOF_MEM_CAPS_RAM,
/* allocate quite a big buffer, use rballoc to make sure it will succeed */
cd = rballoc(0, SOF_MEM_CAPS_RAM,
Member

I don't see why we special-case a component; something's fishy here. Why not the mixer as well then?

Contributor Author

@keyonjie keyonjie Sep 9, 2021

I don't see why we special-case a component; something's fishy here. Why not the mixer as well then?

The size of 'cd' here is > 1 KB, and we have very limited runtime blocks on APL, so any allocation request of >= 1 KB should use rballoc(), so that the allocation happens in the BUFFER zone.

The mixer 'cd' size is very small; using rzalloc() to allocate e.g. a 64-byte buffer helps us save memory.

And yes, maybe we should make the allocator smarter going forward, so that callers can use a unified helper and don't need to care about the details.

Member

Our allocator is simple and showing its limitations. The Zephyr allocator is far better.

@keyonjie
Contributor Author

keyonjie commented Sep 9, 2021

@lgirdwood I am splitting the biggest commit 'alloc: heap memory refinement' to make life easier; I will submit some low-hanging fruit first.

@keyonjie
Contributor Author

keyonjie commented Sep 9, 2021

let me close this one and go with smaller PRs.

@keyonjie keyonjie closed this Sep 9, 2021