-
Notifications
You must be signed in to change notification settings - Fork 349
Fix chain dma task handling issue and re-enable it. #7073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsarha, 'appears to fix' implies that we don't know the real issue?
Or we must stop before freeing unconditionally as this is the correct way?
@lyakh, @kv2019i, can you clarify on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had several locks and prints in places, when debugging this on Friday. Based on those prints it looked like the crash is caused by the task being executed after its been freed. I am pretty sure this a correct fix. Before the fix the multiple-paise-resume crashed in a matter of seconds, after it was started. To pass the test it needs run something like 30mins without a glitch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi yes, I think it's needed
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would something like this can be added instead for a generic fix?
probably the ret check is not needed? I'm not sure what is the return if the task is not scheduled, my guess would be 0, but if not, then the ret check is not good.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That, or add clear documentation to schedule_task_free() that one should not call free to a task that may still be scheduled. E.g. keep the API symmetric schedule_task_init() and schedule_task_free(), vs. schedule_task() and schedule_task_cancel().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i do you mean waiting on a semaphore in
zephyr_ll_task_free()? Well, that's just one particular case. I think that so far we have what @jsarha described as a symmetric API. It could in principle be changed , I guess, but we'd need to verify all the cases - XTOS, Zephyr, LL, EDF,...Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I try to think how the task can be scheduled after being freed, I can not put my finger on it. But since I have seen it, I am pretty sure there can be some unfortunate serializations zephyr_ll_run() and zephyr_ll_task_free() where this could happen. But indeed it starts to look like there is a bug in zephyr_ll.c.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the zephyr_ll_lock() = zephyr arch_irq_lock() and zephyr_ll_unlock() = zephyr arch_irq_unlock(), do anything about the assignments actually being flushed to the memory? Should they? It looks to me that their implementation on xtensa reduce to a single instruction:
Even if the threads run on the same core, the functions trust the values are immediately readable from another thread, before the function exits. How do we know the values are actually written to memory instead of them being still held in the registers, if there is no explicit memory barriers or even volatile qualifiers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why I came to the conclusion, that
schedule_task_cancel()has to be called beforeschedule_task_free()with Zephyr LL scheduler. Looking atzephyr_ll_task_free()now it should handle freeing tasks in any state, including running. Note, that as described there, it must only be called on the same core as where the task is running and only from a thread context. Is this all the case? Have we decoded the crash? @jsarhaUh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The crash always happens at rsil instruction when taking the lock (= forbidding interrupts?) in zephyr_ll_run() after do_task_run() call. At which point - according to my debug prints - the task has already been freed. After thinking different serializations of the functions for some time, I can not think any other reason for that to happen than the "locking" not working as it should (e.g. not acting as explicit memory barrier). Maybe a more correct way to fix this particular problem would be not to init() and free() the task all the time, but just to schedule() and cancel() it. But it still appears that there is something fishy how the zephyr_ll works.
So the crash always happens here:
sof/src/schedule/zephyr_ll.c
Line 208 in 7fdc623