-
Notifications
You must be signed in to change notification settings - Fork 1.3k
fix: race condition in tr_torrentStat()
#7948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ee9620e to
641458c
Compare
|
Just to make sure I'm understanding your analysis right -- transmission-qt never calls tr_torrentStat() explicitly. When you say it just gets the lock once and then performs everything in a batch before releasing, are you referring to the lock call in |
| auto const lock = tr_sessionLock(self.sessionHandle); | ||
| for (Torrent* torrent in self.fTorrents) | ||
| { | ||
| [torrent update]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't inspected the code in depth, so this is more of a question than a change request -- how much risk is there in calling [torrent update] inside of the lock? it looks like update does enqueue a notification. Is that dispatched immediately, or later when the event loop is idle? (ie after we've released the lock)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid we'll need one of the macOS contributors to help us confirm this. I'm not joking when I say I've got obj-c++ literacy of a kneecap. 🥶
Tried my best to read the docs for NSNotificationQueue. Sounds like there is an event loop similar to the core's libevent loop, and the notification will be dispatched after this updateUI invocation?
Whereas a notification center distributes notifications when posted, notifications placed into the queue can be delayed until the end of the current pass through the run loop or until the run loop is idle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The notification enqueuing here is very safe: it's NSNotificationCoalescingOnName which means that if you send 1 thousand of them at the same time, it will only produce a small number of notifications. I don't know exactly how many, but I suspect around 30/sec or something adjusted for the refresh rate of the display.
In other words, you quickly enqueue 1000 times the same notification within 0.1 sec, it will only be posted 1-3 times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's helpful, thanks!
Do those notifications that get posted execute immediately (ie while we still have the mutex lock) or do they get invoked later when the event loop is idle?
To avoid possible deadlocks, we should minimize the amount of code that's invoked inside the lock. That's my concern and why I'm asking about the notifications, since that's the only place I see that might be risky if the notifications are executed inside the lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's posted async, potentially during the lifetime of the lock.
Exactly. |
|
The code looks reasonable to me. I'd like one of the @transmission/contributors to test this & see how it affects the jank |
macosx/Controller.mm
Outdated
| static CGFloat const kBottomBarHeight = 24.0; | ||
|
|
||
| static NSTimeInterval const kUpdateUISeconds = 1.0; | ||
| static NSTimeInterval const kUpdateUISeconds = 3.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to keep it every 1 second.
The sensitivity to this visual rhythm is important for many people. Slower would look frozen/broken.
Think of the progress bars for anything: if it stops moving for more than a second, you get anxious, and lose time figuring out if some problem happened or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but does 1 second cause UI stuttering? And more importantly, has the crashes stopped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped the commit for 1s => 3s, see how it goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether we're at 1s or 3s, the big question is @Coeur when you test this, does this stop the crashes w/o adding jank?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will try more thoroughly during this weekend.
It takes a few hours to reproduce a crash on main branch, so in order to assert that this fix doesn't crash, I need many hours.
I'm now on a very powerful computer with only a dozen torrents, so I won't be able to assert jankiness. I don't know if @nevack @DevilDimon @sweetppro or @livings124 have some older hardware or a tons of torrents to test with harsher conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also while we're at it, pinging @ResearchLabAssistant to let them test if it fixes their crash.
641458c to
a1b5700
Compare
Fixes #5853
The gist of this PR:
tr_torrent::stat(). This fixes the race condition.tr_torrentStat()for all torrents to avoid waiting for the lock multiple times.The Problem
Last time we tried implementing [1] (#4571), it caused jank/beachballing in the macOS app and the fix was reverted (#4936).
OTOH, the Qt app takes the session lock for every API call it makes towards the core, but there were never any complaint about UI stuttering. So I went and took a look at how they are doing things differently.
The starkest difference I noticed is, in contrast to the macOS app, which takes and releases the lock in every
tr_torrentStat()call, the Qt app's session update timer loop takes the session lock once and does not release it until it has done alltr_torrentStat()calls it needed to make. My hunch is this saves a lot of time waiting on the lock. This is possible for the Qt app because the it uses the RPC API instead of the C API for most API calls towards the core.Another minor difference is that the RPC API supports the notion of "recently active" torrents, while the C API does not.
The Solution
For now, what I did is expose a function in the C API that returns the RAII object that owns the session lock, then use it in the macOS app's session update loop.
I realise this is ugly and exposes internal implementation details, but I don't feel comfortable taking it a step further because I'm basically editing the macOS code on Notepad with no compiler.
Next steps
As an intermediate step, I think we can expose a new
tr_torrentStat()variant that takes the lock once and returns the stat object pointers for multiple torrents and refactor the macOS app around it. I can do that libtransmission part, but someone else will have to do the work in the macOS app.The end goal is of course to convert the macOS (and the GTK app) into a standalone RPC client.