Description
With 5da8173, I removed the progress reporting from the DefaultOpEnvironment's op discovery, because it was failing the OpRegressionTest on one of my systems. Investigation shone light onto three distinct issues, the first of which is that the Progress data structure gets confused about subtasks when a task gets orphaned:
java.lang.ExceptionInInitializerError: Exception java.lang.IllegalStateException:
Task OpEnvironment: Discovering Ops has subtasks that did not complete!
This condition arises in the Task#complete() method when tasksDefined && progress() != 1.0 && current.longValue() == max.longValue(). One theory is that this happens when previous attempts at DefaultOpEnvironment construction fail, leaving orphaned Task objects in the ThreadLocal<ArrayDeque<Task>> progressibleStack data structure, such that later tasks on the same thread erroneously conclude that they must be subtasks of those orphaned tasks. But the evidence does not quite match that theory, because: 1) I checked the number of pending subtasks in my failing scenario and it is 3 rather than 1, which seems suspicious; and 2) the first failing environment construction must have some cause other than a previously orphaned task. That is, the first task must get orphaned some other way than this sort of cascading task failure. Which brings us to #229...
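To make the orphaning theory concrete, here is a minimal, self-contained sketch of how a task left on a per-thread stack could be mistaken for the parent of a later, unrelated task. Everything in it is hypothetical: the class OrphanedTaskSketch, its nested Task type, and the start/complete/failingConstruction helpers are stand-ins invented for illustration, not the real Progress or Task API. Only the ThreadLocal<ArrayDeque<...>> shape is taken from the description above.

```java
import java.util.ArrayDeque;

public class OrphanedTaskSketch {

    /** Hypothetical stand-in for a progress task; NOT the real Task class. */
    public static final class Task {
        public final String name;
        public final Task parent;        // task on top of the stack when this one started
        public int pendingSubtasks = 0;  // subtasks started but not yet completed

        Task(String name, Task parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Per-thread stack of tasks that are believed to be in progress.
    public static final ThreadLocal<ArrayDeque<Task>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Starts a task; whatever is on top of the stack is assumed to be its parent. */
    public static Task start(String name) {
        ArrayDeque<Task> stack = STACK.get();
        Task parent = stack.peek(); // null when the stack is empty
        Task task = new Task(name, parent);
        if (parent != null) {
            parent.pendingSubtasks++;
        }
        stack.push(task);
        return task;
    }

    /** Completes a task, popping it and notifying its assumed parent. */
    public static void complete(Task task) {
        ArrayDeque<Task> stack = STACK.get();
        if (stack.peek() != task) {
            throw new IllegalStateException("Task " + task.name + " is not on top of the stack");
        }
        stack.pop();
        if (task.parent != null) {
            task.parent.pendingSubtasks--;
        }
    }

    /** Simulates a construction attempt that fails before completing its task. */
    public static Task failingConstruction() {
        Task task = start("first attempt");
        try {
            throw new RuntimeException("simulated construction failure");
        } catch (RuntimeException e) {
            // Deliberately no complete(task): the task stays on the stack, orphaned.
        }
        return task;
    }

    public static void main(String[] args) {
        Task orphan = failingConstruction();
        Task later = start("second attempt");
        // The later task is unrelated, yet it is recorded as a subtask of the orphan.
        System.out.println(later.name + " parent = " + later.parent.name);
        System.out.println(orphan.name + " pending subtasks = " + orphan.pendingSubtasks);
    }
}
```

In this sketch the orphan ends up with exactly one pending subtask per later task started directly beneath it, which is why a count of 3 in the real failure is hard to square with a single cascading orphan. A try/finally around the task's lifetime would keep the stack balanced here, but whether that applies to the actual code is an open question.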