[Obsolete draft] Fix dropped jobs issue #197
Conversation
I've closed #193, which had an alternate approach and considerable discussion.

I will have to look into the decrease in code coverage. I need to revisit whether changing the "should not cause an error if immediately removed" test is necessary.

Unfortunately the old version of "should not cause an error if immediately removed" fails when run with the new id-setting logic. It is a very hairy test that wraps … Me wonders if somehow this test relies upon the old race condition... My speculation at this point is that the test used to work because the job was created in Redis early in the … Way too much fragile asynchrony in play...

By a gross, but deterministic, hack to …

Evidence: with the changes of this PR, after running overnight, I have 0 lost jobs out of over 44 million jobs run (numSucceededLost: 0).
skeggse left a comment:
I'm pretty concerned about the performance impact of this, particularly on high-latency connections. I also think it'll significantly complicate the implementation of #198.

I wonder whether this could be functionality that gets enabled only when some combination of options (e.g. storeJobs) is enabled; the storeJobs option and the other events-related options can result in memory leaks and poor scaling characteristics when enabled, and we don't use them internally.
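A rough sketch of what that opt-in gating might look like. storeJobs is a real bee-queue option; confirmJobIds is an invented name standing in for the behavior this PR adds:

```js
const Queue = require('bee-queue');

// Sketch only: gate the extra id-confirmation work behind an opt-in flag,
// alongside the existing job-tracking options.
const queue = new Queue('addition', {
  storeJobs: true,     // existing option: keep Job instances for event delivery
  confirmJobIds: true, // hypothetical flag for the behavior discussed here
});
```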
```js
// The only reason for this callback is to support deterministic testing of
// removeJob between the time brpoplpush resolves and Job.fromId() runs
```
Instead of modifying the source to support the test, what do you think about modifying the test to either stub the Job.fromId static method or modify the brpoplpush function to provide this hook?
Yes, I had thought about wrapping Job.fromId() -- I'll revisit that approach.
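A minimal sketch of that test-side stubbing, assuming sinon and that Job.fromId(queue, jobId) returns a promise (the require path is an assumption about the repo layout):

```js
const sinon = require('sinon');
const Job = require('bee-queue/lib/job');

// Keep a reference to the real method, then interleave the removal
// deterministically between brpoplpush resolving and the job data fetch.
const realFromId = Job.fromId;
sinon.stub(Job, 'fromId').callsFake((queue, jobId) =>
  queue.removeJob(jobId).then(() => realFromId.call(Job, queue, jobId))
);
```

Job.fromId.restore() (or a sinon sandbox) would undo the stub after the test, leaving the source untouched.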
```js
.then((results) => {
  const numRaised = results[0], nextOpportunity = results[1];
  if (numRaised) {
    // not documented
```
heh yeah
This seems unrelated to the overarching PR, and maybe we could document it instead of leaving it more explicitly undocumented.
```js
    waitDone = null;
  }
  return waitJob.apply(this, args);
  return waitJob.apply(this);
```
apply like this just always feels weird to me :\
Suggested change:

```js
- return waitJob.apply(this);
+ return waitJob.call(this);
```
You're right.
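For reference, the two forms behave identically here; call just reads more naturally when there is no argument array involved:

```js
function greet(punct) { return 'hi' + (punct || ''); }

greet.apply(null, ['!']); // 'hi!': apply takes an array of arguments
greet.call(null, '!');    // 'hi!': call takes arguments directly
greet.call(null);         // 'hi':  the no-argument case, as in this diff
```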
Another solution I had working was for …

Regarding #198 -- I imagine the present implementation of …

Do you think I should pursue the uuid approach with an eye towards how …
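Purely as illustration of the uuid idea (not code from this PR): the id is assigned client-side with Node's crypto.randomUUID, so it is known before any Redis round-trip and the INCR race never arises.

```js
const { randomUUID } = require('crypto'); // available in Node 14.17+

// Sketch only: the job id is chosen before the job is saved, so there is
// no post-save id discovery and no window in which the id can be lost.
function makeJob(queue, data) {
  return { queue, data, id: randomUUID() };
}
```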
That's possible, though my (albeit fairly ancient) understanding of how pipelined commands work is that Redis still iterates through clients and accepts commands one at a time from each client where there are buffered commands. What that means (I think) is that the …

I like the approach in #193 due to its simplicity, its superior scaling properties, and its avoidance of additional latency in the save-job flow. I think the key requirements I'm looking for in a fix for #78 / #198 are (specifically when …):

So based on those requirements, here are my thoughts (in order of preference, I think): …

I suspect the solution will be one of the first two, with the thinking that the event delivery system will become a plugin at some point. Ultimately, the solution will need to be predicated on your needs, as we don't use bee-queue's event delivery due to its somewhat suboptimal scaling properties (and lack of need for its functionality to begin with).
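One way to reconcile the latency concern above with reliable ids is to move the id allocation server-side, so it shares a round-trip with the job write. A sketch with invented key names, not code from this PR or from #193:

```js
// Sketch only: a Lua script runs atomically in Redis, so the INCR and the
// job write either both happen or neither does, and the client learns the
// id in a single round-trip.
const addJobScript = `
  local jobId = redis.call('INCR', KEYS[1])
  redis.call('HSET', KEYS[2], jobId, ARGV[1])
  return jobId
`;

function saveJob(client, jobData, done) {
  // node_redis-style eval: script, numKeys, ...keys, ...args, callback.
  // 'bq:addition:id' and 'bq:addition:jobs' are illustrative key names.
  client.eval(addJobScript, 2, 'bq:addition:id', 'bq:addition:jobs',
    JSON.stringify(jobData), (err, jobId) => {
      if (err) return done(err);
      done(null, String(jobId)); // id known with no extra round-trip
    });
}
```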
There's a lot to think about in what you've outlined. I appreciate your putting it all down. We've touched on version 2 in other PR and issue threads. How much of the API is on the table for version 2? Where is that discussion happening? I have some radical simplification ideas, prompted by your mentions of a more plugin-centric architecture.
[This PR is made obsolete by #237.]

Changes to fix the problem reported in #189 and #78.

This fix adopts an approach mentioned in #78 of making a separate Redis call to update the id counter. Some simplifications emerge, but the extra overhead should be reviewed.
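As a sketch of the shape of that separate call (queue.toKey and the callback flow are assumptions about bee-queue internals, not a quote from this diff):

```js
function assignId(queue, job, cb) {
  // Sketch only: learn the id via its own INCR round-trip before the job
  // data is written, so no job is ever stored under an id the client
  // doesn't yet know.
  queue.client.incr(queue.toKey('id'), (err, jobId) => {
    if (err) return cb(err);
    job.id = String(jobId);
    cb(null, job); // the job can now be saved and enqueued under job.id
  });
}
```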
The example code in #196 shows no lost jobs when run against this branch.