Skip to content

Conversation

@cujomalainey
Copy link
Contributor

For some reason the memory banks can still be inaccessible despite registers saying they are ready. If they are accessed prematurely this will result in a crash, typically when memory is being initialized. Adding this delay allows a device to pass over 1k reboots without a crash that previously could pass 5 cycles.

Signed-off-by: Curtis Malainey cujomalainey@chromium.org

For some reason the memory banks can still be inaccessible despite
registers saying they are ready. If they are accessed prematurely this
will result in a crash, typically when memory is being initialized.
Adding this delay allows a device to pass over 1k reboots without a
crash that previously could pass 5 cycles.

Signed-off-by: Curtis Malainey <cujomalainey@chromium.org>
@cujomalainey cujomalainey added P1 Blocker bugs or important features ADL Applies to Alder Lake platform labels Feb 7, 2023
@cujomalainey
Copy link
Contributor Author

Note this was tested on rpl-001 but the diff between RPL-001 and main appears to be unrelated to the code flow for this issue.

@cujomalainey
Copy link
Contributor Author

Note this regression is likely caused by the fact that #4466 (also on adl-004-drop-stable) managed to completely skip landing on main.

We can pick that up instead if preferred, but I figured an exit delay is more deterministic since we know the registers signal availability before the bank is actually ready.

@andyross
Copy link
Contributor

andyross commented Feb 7, 2023

Drive by zephyrization comment: AFAICT the equivalent driver[1] in a zephyr build does seem to have (what I think is!) the equivalent delay: https://github.com/zephyrproject-rtos/zephyr/blob/main/soc/xtensa/intel_adsp/cavs/sram.c#L44

Might be a good idea to audit to be sure. Are there known rules we could enforce more simply? Maybe something like "Must wait 3us after a write to the LDOCTL register"? Might be easier than sprinkling delay calls everywhere, and PM isn't likely to be a microsecond-scale performance path.

[1] For the rather different Zephyr PM framework, but the low level code looks more or less directly translated.

@cujomalainey
Copy link
Contributor Author

Drive by zephyrization comment: AFAICT the equivalent driver[1] in a zephyr build does seem to have (what I think is!) the equivalent delay: https://github.com/zephyrproject-rtos/zephyr/blob/main/soc/xtensa/intel_adsp/cavs/sram.c#L44

Might be a good idea to audit to be sure. Are there known rules we could enforce more simply? Maybe something like "Must wait 3us after a write to the LDOCTL register"? Might be easier than sprinkling delay calls everywhere, and PM isn't likely to be a microsecond-scale performance path.

[1] For the rather different Zephyr PM framework, but the low level code looks more or less directly translated.

Actually I think its missing unfortunately. This new delay is the last step so it would be after this line

@mwasko
Copy link
Contributor

mwasko commented Feb 8, 2023

Note this regression is likely caused by the fact that #4466 (also on adl-004-drop-stable) managed to completely skip landing on main.

It looks like the PR was left hanging and eventually closed when @slawblauciak left the SOF team.

We can pick that up instead if preferred, but I figured an exit delay is more deterministic since we know the registers signal availability before the bank is actually ready.

@pblaszko did some research and it looks like this behavior might be related to a known issue. If we confirm that then the recommended fix will be to read EBB status twice and if different values is received then read again.

@thesofproject thesofproject deleted a comment from pblaszko Feb 8, 2023
@cujomalainey
Copy link
Contributor Author

We can pick that up instead if preferred, but I figured an exit delay is more deterministic since we know the registers signal availability before the bank is actually ready.

@pblaszko did some research and it looks like this behavior might be related to a known issue. If we confirm that then the recommended fix will be to read EBB status twice and if different values is received then read again.

Please make this a priority, we currently have devices in the hands of dogfooders that are unstable and are running out of time on our release calendar

@mbukowsk
Copy link
Contributor

mbukowsk commented Feb 9, 2023

I have already created PR with recommended change
#7065

@cujomalainey cujomalainey deleted the memory branch February 28, 2023 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ADL Applies to Alder Lake platform P1 Blocker bugs or important features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants