
Conversation

@willyborn (Contributor) commented Oct 13, 2021

Disk caching continued loading expired JIT code, resulting in exceptions. Clearing the disk cache hid the problem.
For interface changes, an exception was generated.
For all other changes, the old (initially cached) code was loaded and executed.

Description

JIT code uses a hash of the function name (derived from the JIT parameters) for fast retrieval of the corresponding kernel. The same hash was also used to save the code on disk. When JIT code was updated, the same function name still applied, resulting in a mismatch with a previously saved/cached version.
Static code uses a hash based on the full code, including options.

Now, for JIT code, the hash used for disk caching is based on the full code including options (same as for static code). Once the kernel is loaded into memory, everything remains as before:
  • JIT code keeps using the hash based on the function name (no code generation needed)
  • static code keeps using the hash based on the full code + options (this hash is also static)
As a result, JIT now behaves exactly like static code, and the correct kernel is always loaded.

  • This is a bug fix.
  • No new functions.
  • This PR can be back-ported to previous versions.

Changes to Users

  • No changes for the end user. Clearing the disk cache is no longer needed to eliminate the exception.

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass

@9prady9 (Member) commented Oct 13, 2021

👍🏾 It is awesome that you finally found the reason for the JIT kernel cache issue. I am just concerned that it increases the startup cost slightly in the case of loading from disk. Did you happen to time by how much the loading time increased with this change?

@umar456 Is there a way we can actually fix the kernel name generation of JIT? I was under the impression that JIT kernel names encoded some form of kernel-operation info.

@willyborn (Contributor, Author) commented Oct 13, 2021 via email

@willyborn (Contributor, Author):
Regarding the JIT kernel names:
They do contain a description of the dynamic calling parameters, including types, etc.
They do not include the real code or a description of the always-present parameters.

As a result, fixes in the code without dynamic interface changes result in the same JIT kernel name.
In my opinion, I would keep this behavior, because generating the code of those kernels each time is really time consuming, and they are static within one run (but not between runs/disk caches, since the code could have changed).

@umar456 (Member) commented Oct 14, 2021

@willyborn I couldn't see the times in the log you posted. I assume it's pretty small based on your comments. I can't imagine this hashing would increase the times significantly. I am curious whether we couldn't achieve the same thing by including the AF_REVISION when storing the names. That would allow us to refresh the cache for each commit and avoid issues like this. Can you think of a reason why this would fail?

@willyborn (Contributor, Author) commented Oct 14, 2021 via email

@umar456 (Member) left a review comment
Thanks for the explanation. This should alleviate a lot of confusion when developing kernels. I was able to test it on my local system and everything was running correctly and working as expected. Thank you for your contribution.

    vector<string> sources_str;
    for (auto s : sources) { sources_str.push_back({s.ptr, s.length}); }
    currModule = compileModule(to_string(moduleKey), sources_str,

Suggested change: `for (const auto& s : sources) {` (avoids copying each source descriptor on every iteration).
Good catch

@umar456 umar456 merged commit 1d03d07 into arrayfire:master Oct 15, 2021
@willyborn willyborn deleted the JIT-DiskCaching branch September 29, 2022 13:02

3 participants