Skip to content

Conversation

@srajko
Copy link
Collaborator

@srajko srajko commented May 2, 2016

No description provided.

@srajko srajko force-pushed the thread-safety-threadpool branch from 1890f6c to 46938f2 Compare May 2, 2016 19:27
@joshaber
Copy link
Collaborator

joshaber commented May 2, 2016

Innnnnnnnnteresting.

@srajko srajko force-pushed the thread-safety-threadpool branch from 46938f2 to 9b61269 Compare May 2, 2016 19:58
@srajko
Copy link
Collaborator Author

srajko commented May 2, 2016

@joshaber yep, the time has come to finalize this :-)

With the recent libgit2 upgrade, libgit2 no longer locks the index internally. That makes it more important to turn on nodegit's thread safety, but the locking increases the burden on node's threadpool and can lead to long delays and deadlocks like we've discussed.

Hopefully, this PR will avoid those issues and make thread-safe nodegit run better.

@srajko srajko force-pushed the thread-safety-threadpool branch from 9b61269 to 09b9f81 Compare May 2, 2016 20:34
@saper
Copy link
Collaborator

saper commented May 2, 2016

wow, that's very interesting.

@srajko srajko force-pushed the thread-safety-threadpool branch from 09b9f81 to f9e0459 Compare May 2, 2016 23:53
@johnhaley81
Copy link
Collaborator

Why this is needed:

So with running async libgit2 threads with thread-safety we're enqueuing them with the internal libuv thread pool which in node is hardcoded to 4 (IIRC). This hits us hard because we also share a thread with node so that leaves us with 3 threads to pull from.

That sucks because say you're running a libgit2 function that has multiple callbacks. So we have 1 thread for node, 1 thread for the original call for libgit2, a 1 thread for each callback. Let's assume that those callbacks are also going to go run libgit2 code. 💥 thread lock.

So as a first step this is 👍 but we talked offline about this and think that we need a better system to use a vector and grow the thread pool if needed and then shrink down when not.

@srajko srajko force-pushed the thread-safety-threadpool branch 2 times, most recently from d10dcee to ed61ea0 Compare May 3, 2016 14:45
@srajko
Copy link
Collaborator Author

srajko commented May 3, 2016

@nodegit/owners OK folks, I made most of the updates to the code that I wanted to - if anyone has a little time to spare I would love to get feedback.

The main parts are:

  • ThreadPool - a custom thread pool implementation on top of libuv
  • LoopQueuer - a utility that lets you schedule a callback on a libuv event loop (uv_default_loop in this case)
  • AsyncLibgit2QueueWorker - a replacement for Nan::AsyncQueueWorker, which uses ThreadPool to execute the async part, and LoopQueuer to execute the completion part back in the default libuv thread.

As mentioned eariler, the reason for this PR is to get our async calls off libuv's threadpool, and avoid slowdowns and deadlocks resulting from needing more threads than available. This issue was anticipated by @saper in #827 (comment), we have seen it in GitKraken when we tried turning on NodeGit's thread-safety locking, and IIRC @joshaber reported seeing it in Atom as well. The PR currently uses 10 dedicated threads for async libgit2 calls - it is not a well thought-out number but it's >4 and dedicated :-)

I've considered exposing some statistics to javascript, for example the number of async handles opened by LoopQueuer to make sure that number isn't getting high, and the current/max number of threads used by ThreadPool to see if 10 is adequate. If anyone else thinks that would be useful I'll add it in.

(the CI is failing, but it looks like it's problems with AppVeyor and Travis - it was green yesterday before the final code cleanup)

@johnhaley81
Copy link
Collaborator

CI died due to network issues. I'm re-running the tests.

@tbranyen
Copy link
Member

tbranyen commented May 3, 2016

Which sucks. Every day I wake up and say to myself, "today I'll make the unit tests work offline..."

@johnhaley81
Copy link
Collaborator

Me too @tbranyen. Me too :(

@johnhaley81
Copy link
Collaborator

Testing this in GK right now.

@srajko
Copy link
Collaborator Author

srajko commented May 4, 2016

We did run into some problems during testing in GitKraken. It worked well for the most part, but on windows we were seeing frequent app restarts. Commenting out calls to uv_ref and uv_unref resolved the problem - so these calls (they tell node when the process should stay alive, and when it can terminate) may have been somehow causing electron to stop the process and reload.

@joshaber - @johnhaley81 suggested I ping you about this - any ideas?

@joshaber
Copy link
Collaborator

joshaber commented May 4, 2016

Unfortunately I don't know much about Electron's internals. 😞 You could try opening an issue on the repo.

@srajko srajko force-pushed the thread-safety-threadpool branch from ed61ea0 to 41e39b4 Compare May 5, 2016 15:59
@srajko srajko force-pushed the thread-safety-threadpool branch from 41e39b4 to 21f65d2 Compare May 5, 2016 20:56
@srajko
Copy link
Collaborator Author

srajko commented May 9, 2016

The problem ended up being on my end - uv_ref and uv_unref are not thread-safe, not only in the sense that you shouldn't concurrently ref/unref the same handle, they are also unsafe for different handles that are tied to the same event loop (which was the case here).

I took LoopQueuer out of the PR, and now the ThreadPool schedules both the work on the threadpool, and the completion on the event loop. uv_ref and uv_unref now happen all on the same thread. This seems like a cleaner solution anyway.

@srajko srajko changed the title [WIP] Run async libgit2 calls on custom threadpool Run async libgit2 calls on custom threadpool May 9, 2016
@johnhaley81
Copy link
Collaborator

We are running this build in production over in GitKraken land and things look 💯 so barring any feedback I'm going to go ahead an merge this in.

Awesome job @srajko 👍

@tbranyen
Copy link
Member

tbranyen commented May 9, 2016

I'm not really qualified to review this PR, so I'll just 🎉

@johnhaley81
Copy link
Collaborator

It's alright @tbranyen, we love you anyways. @joshaber @saper @implausible?

@joshaber
Copy link
Collaborator

Looks reasonable to me 👍 Thanks @srajko!

@johnhaley81 johnhaley81 merged commit 7c2f7ef into nodegit:master May 10, 2016
@johnhaley81 johnhaley81 deleted the thread-safety-threadpool branch May 10, 2016 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants