-
Notifications
You must be signed in to change notification settings - Fork 697
UNSAFE Temporary workaround for LFS checkout performance regression #1883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UNSAFE Temporary workaround for LFS checkout performance regression #1883
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexaXs This is not a good move. While it makes sense that we want to fix the regression - this cannot be a permanent solution as it throws lock safety out the window. Basically this allows nodegit to enter a state where the callback calling thread may perform work with a "locked" object, but the lock itself has been released by the callback runner. This is incredibly unsafe and was one of the aims that I intended to fix when updating the library in this way.
This may be a temporary workaround however, we have no idea what the consequences of this change really are other than "it seemed to work before". There definitely existed segfaults around LFS before, and it's very possible that this fix improved stability.
The appropriate change for fixing LFS is not to remove this patch, but to integrate nodegit-lfs into nodegit directly and with a managed solution for working with locks. Frankly, this fixes a performance regression to reintroduce a safety regression... We should aim instead to fix 2 performance regressions without introducing any safety regressions. Nodegit-lfs is poorly designed right now, because it does not use a streaming interface, it loads the entire blob into memory and then sends it to lfs for processing. If we integrated nodegit-lfs directly into nodegit, we could theoretically switch the implementation to:
- streaming interface
- no need for callbacks to JS simply for streaming data
- only using callbacks for authentication
I strongly suggest taking this approach for fixing LFS and looking at a longer term approach that will improve the value of the library. That this is an issue at all should provide enough friction with the team and customers to provide the support you need to build this out at Axosoft.
That all said, if the timeline for fixing lfs properly isn't manageable, we can discuss merging this; however, this PR and associated commits need to be updated to comment out the code and properly document that this was removed as a temporary workaround and not as a "permanent fix". We should label this as an unsafe workaround before it is merged, and we should not remove the code that works as intended.
|
@implausible, first of all than you so much for reviewing this PR, and for the tremendous work you did to rewrite the threadpool. Also thanks a lot for your comments on the approach to fix LFS properly. Obviously I hadn't understood all the flow involved in the threadpool with thread-safety, so apologies for the unsafe changes in the PR. I've checked previous PRs related with the threadpool and thread-safety to understand better the present code:
I'd appreciate if you could confirm what I understand about the present code:
If my understanding on those points are correct, it'd be great if you could review this change to the PR consisting on commenting out the temporaryUnlock in the first point above. Let me know if my thoughts are correct concerning this possible update to the PR:
Unless there's something I'm not seeing, by commenting out the temporaryUnlock, the objects initially locked by the async Worker will remain locked until the COMPLETED event is received (when the Worker finishes with the async execution), keeping NodeGit thread-safe. As I see it, with this change, although we open the possibility to suffer from deadlocks in the future, with the present code NodeGit remains thread-safe and there is no possibility of deadlocks. If you think these thoughts are correct and commenting out the temporaryUnlock is a valid temporal workaround, it'd be great if you could merge this PR until LFS is fixed properly. In that case I'd update this PR this way:
Let me know please if everything makes sense. And happy 2022 by the way! 😄 |
8af36d5 to
b4ccac5
Compare
|
@implausible, I've modified the description and the commits of this PR to see if they could work as an
We'll be working on the integration of LFS into nodegit, that will revert this |
|
I think that generally covers my major concerns. I would be open to this workaround since it is targeted and addresses real customer impact. I am excited to see LFS become nodegit native in the future! |
|
Would you kindly add a test in https://github.com/nodegit/nodegit/blob/master/test/tests/worker.js to demonstrate that this code is safe during shutdowns? |
- ThreadPoolImpl doesn't need to keep a pointer of context. - Methods RunJSThreadCallbacksFromOrchestrator not used.
We want to test two scenarios: - When libgit2 spawns threads to do the work (when doing a checkout). - When libigt2 leverages a single thread to do the work (for example when working with submodules). In each scenario, we'll run synchronous work inside the callbacks, where no locking is applied, so they should succeed. We'll also run asynchronous work inside the callbacks that lock the same objects already locked. These tests should be able to run by temporary unlocking those objects until the callback ends.
This is a temporary workaround in order to avoid the lost of performance with LFS checkout. The change is limited to the processing of callbacks from Workers that leverage threaded libgit2 functions. Basically what it does is allowing the callbacks from executorEventsQueue to be queued in jsThreadCallbackQueue without waiting for the current one to end. It is unsafe because with threaded libgit2 functions there is a potential risk of deadlock if the callbacks need to lock an object. This commit will be reverted when nodegit-lfs is integrated into nodegit.
b4ccac5 to
7b248d9
Compare
Checkout leverages libgit2 threads and when applying filters it runs JS callbacks. These tests check that when running checkout on a worker thread and this is terminated, it exists gracefully without memory leaks.
Thank you so much @implausible, and yes, I am excited too about the LFS integration! Sorry for the delay in pushing the tests. I`ve added a couple of tests, the first one terminates a worker thread that is running checkouts in a loop where a git filter is used, to demonstrate that it is terminated gracefully. The second one checks that the objects involved in multiple checkouts are tracked and will be freed when the worker thread is terminated, preventing memory leaks. |
Keep tests
After a Threadpool rewrite for context-awareness, LFS checkouts started suffering long delays. We found the reason is that libgit2 callbacks that were being run in parallel started being run in sequence. But this is actually the safest way to deal with the callbacks, preventing deadlocks while being executed, and keeping the non-callback work thread-safe.
The changes here will just be an
UNSAFEtemporary workaround until LFS is properly integrated into NodeGit. The changes will be limited to the processing of callbacks from Workers that leverage threaded libgit2 functions (at the moment, only checkout functionality), and they basically allow the callbacks from the current Worker to be queued to the main JS thread without waiting for the current callback to end.It is
UNSAFEbecause with threaded libgit2 functions there will be a potential risk of deadlock if the callbacks need to lock an object that is already locked by the current Worker.The
Unsafecommit in this PR will be reverted once LFS is integrated into nodegit.