Hi @claudiubalogh, thanks for reaching out! Page-fault bugs that show up only after long runs can certainly be difficult to track down. Unfortunately, the log you provided didn't reveal anything specific. We suggest trying the following "tricks" to gather more information, which will hopefully reveal some more hints.

  • Set HSAKMT_DEBUG_LEVEL to a value from 3 to 5: this enables a trace in HSA, which is the ROCm runtime interfacing OpenCL and the lower-level drivers. A higher level reveals more information but also generates more log output. Given that your workload runs for a long time, it is probably best to start with a lower value. Once you have an idea of what was happening around the time the page fault occurs, you can …
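
For a long-running job it can help to set the variable from a small launcher and capture the trace in a file. The following is only a minimal sketch, not an official ROCm tool: the helper names `build_debug_env` and `run_with_trace`, the log file name, and the workload path are all hypothetical, and the sketch assumes the trace is written to the process's stderr.

```python
import os
import subprocess
import sys


def build_debug_env(level=3, base_env=None):
    """Return a copy of the environment with HSAKMT_DEBUG_LEVEL set.

    The 3-5 range check mirrors the suggestion in this thread; it is a
    choice made for this sketch, not a limit enforced by the runtime.
    """
    if not 3 <= level <= 5:
        raise ValueError("this sketch only accepts levels 3 to 5")
    env = dict(os.environ if base_env is None else base_env)
    env["HSAKMT_DEBUG_LEVEL"] = str(level)
    return env


def run_with_trace(cmd, log_path, level=3):
    """Run cmd with the debug level set, appending its stderr to log_path.

    Assumption: the trace goes to stderr. If it does not on your system,
    redirect whichever stream carries it instead.
    """
    with open(log_path, "ab") as log:
        result = subprocess.run(cmd, env=build_debug_env(level), stderr=log)
    return result.returncode


# Example call (hypothetical workload path and log name):
# run_with_trace(["./my_opencl_app"], "hsakmt_trace.log", level=3)
```

Starting at level 3 keeps the log manageable for a long run; the same call can be repeated with a higher level once the rough time of the fault is known.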

Answer selected by tcgu-amd