Fixes and improvements to reference cache. #666
Conversation
- Fix issue with InterpreterConcurrencyTest failing periodically
- Expose lazy loaded exceptions obscured by dereferenceValue's catch block
- Fixed future task asynchronous process via static final cached thread pool
- Removed debug output from ReferenceQueueMonitor's empty catch block
- Add message to requireNonNull as specified for values
- Fix key tests assigning null values and added tests for NPE checks
- Fixed deprecated assertThat and ExpectedException on ReferenceCacheTest
- Removed redundant throws Exception from test cases
- Added tests to prove removal of finalized references
- Added tests to prove auto init of expired references
- Reached 99% code coverage on ReferenceCache and appears to be the limit
Codecov Report
```diff
@@             Coverage Diff              @@
##             master     #666      +/-   ##
============================================
+ Coverage     70.59%   70.89%    +0.30%
- Complexity     2719     2756       +37
============================================
  Files           106      106
  Lines          9029     9057       +28
  Branches       1772     1775        +3
============================================
+ Hits           6374     6421       +47
+ Misses         2232     2220       -12
+ Partials        423      416        -7
```
```diff
 });
 cache.put(refKey, task);
-task.run();
+async.submit(task);
```
FutureReturnValueIgnored: Return value of methods returning Future must be checked. Ignoring returned Futures suppresses exceptions thrown from the code that completes the Future.
It is being checked when the return value is actually used, hence asynchronous, and the exceptions are rolled up:
beanshell/src/main/java/bsh/util/ReferenceCache.java (lines 154 to 160 in ff28429)
There is no option to do otherwise, as handling the method's declared throws is required; so why raise this concern about calling the run method (which does not throw any exceptions and can be called without concern)? Also, if the return value is never used, why must it be checked?
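The rollup being described can be illustrated with a small self-contained sketch (the task body and the exception message are hypothetical, not the actual ReferenceCache code): any exception thrown inside a `FutureTask` is captured and re-thrown as an `ExecutionException` when `get()` is eventually called, so ignoring the `Future` returned by `submit` does not lose the exception as long as the task itself is retained.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

public class FutureExceptionDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService async = Executors.newCachedThreadPool();

        // FutureTask captures any exception thrown by its Callable
        FutureTask<String> task = new FutureTask<>(() -> {
            throw new IllegalStateException("lazy init failed");
        });

        async.submit(task); // returned Future deliberately ignored; the task itself is the handle

        try {
            task.get(); // the exception is rolled up here, wrapped in ExecutionException
        } catch (ExecutionException e) {
            System.out.println(e.getCause().getMessage()); // prints: lazy init failed
        }
        async.shutdown();
    }
}
```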
```java
Parts parts = PARTSCACHE.get(value);
if (null != parts)
    return parts;
PARTSCACHE.remove(value);
```
It is tested: this change actually fixes an existing failing test; check Codecov again.
Added ignore warning for Lift.
Codecov is losing the report; nothing is shown for this pull request.
After a re-run Codecov now displays; will attempt to hit the missing items from the report.
Force-pushed 5ec1a8b to 3d5ea7f
Force-pushed 3d5ea7f to 1e08b15
Found a typo in the
All the PR checks are happy now as well, so @opeongo if you are happy with the
Committing this now because builds are failing due to the InterpreterConcurrencyTest which this fixes.
Hi @nickl- I have been away for a few days due to surgery and a resulting limited ability to do keyboarding, so sorry for taking a while to respond. I am only looking at the `ReferenceCache` changes that I am familiar with.

At first glance this PR does not close the race in But maybe this fixes #655, so I will pull, build, test, and report back.
I tried running my problematic workload with the 60493f1 version from the master branch and there is a severe performance regression. After 12 hours of running, the task had not proceeded very far, so I was not able to get an indication of whether these changes have fixed #655. I ran the same workload again using bsh-2.0b5 and my task accomplished as much as the current master in only 2 minutes of clock time. 12 hours vs 2 minutes for the same forward progress: something is wrong.

I ran again with my master branch from a few weeks ago and the workload ran substantially faster, although subjectively slower than bsh-2.0b5. This indicates that a recent change is responsible for the regression.

Unless you have a better idea, I think the next step is to bisect the recent changes to see what caused the performance regression. Also, I should implement a more objective performance metric than "it seemed faster". What is best here? Elapsed clock time, CPU time? Is there a regression test framework that we should look at?
After bisecting it seems that the performance problem shows up in 1e08b15. The only suspicious change is 1e08b15#r92725995. Maybe this has to do with the ConcurrentLinkedQueue implementation. The API docs say that it needs to do full traversals for some methods like
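For background (a generic illustration, not BeanShell code): `ConcurrentLinkedQueue` keeps no element count, so both `size()` and `remove(Object)` must traverse the linked nodes, making each call O(n). A minimal sketch showing the traversal cost (timings are indicative only):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class ClqCostDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<Integer> q = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 100_000; i++) q.add(i);

        // size() walks every node: O(n), unlike most collections
        long t0 = System.nanoTime();
        int n = q.size();
        long sizeNanos = System.nanoTime() - t0;

        // remove(Object) also scans from the head: O(n) per call,
        // which hurts badly when removals happen for every cache operation
        t0 = System.nanoTime();
        q.remove(99_999);
        long removeNanos = System.nanoTime() - t0;

        System.out.printf("size()=%d took %dus, remove() took %dus%n",
                n, sizeNanos / 1_000, removeNanos / 1_000);
    }
}
```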
See 1e08b15#r92749798 for a fix to the performance regression. This brings the performance back again, but I wonder if it is just covering up the problem. If the ConcurrentLinkedQueue container was running slowly, then perhaps too many references are building up and not getting cleared. Something seems fishy.
I added a print statement to ClassManagerImpl to have a look at the number of listeners that are registered. Using my test workload the number pretty quickly shoots up to hundreds of thousands of entries in the listeners container. The value is mostly increasing but in some cases gets a bit smaller, so I guess some entries are getting reaped but many are not.
Thought you were going to say revert lol =)
We need something to give us proper bench stats with load testing et al. Is JMeter the go-to? Using the test execution time proves very little.
Something is seriously dodgy with that code and it needs a revamp, as also mentioned in 1e08b15#r92812487. Thanks for the review, hope the hand is bearable.
It was no big deal to chase this one down because it was obvious it crept in during a very recent commit and there were very few lines to look at. However, since you poked me about it, I will give you my reaction. I think reverts are appropriate when:
Reverts are especially appropriate when the change has to do with a non-essential feature or optimization attempt that is not core to getting to a stable release candidate. If it is something that can be triaged to fix later, then do so and quickly get it out of the way so that others can keep on working.

The greatest thing about using git for the software development workflow is that it is so easy to make a revert and put the change into a feature branch, where the proponent of the change can spend time reworking and testing. The code is not thrown away, but passed back to the person who made the change (or another interested party) so that they can have another look at it and get it right. I don't understand why you won't accept working this way when it is what just about every other development shop on the planet is doing.

Blocking bugs on the trunk is a big no-no, and you do it all the time. The development style that you are imposing on this project is like hostage taking. You are essentially doing development on the trunk. And you are doing it without sufficient testing on real workloads (I say this because I run real-world workloads and bugs crop up quickly). When problems show up, instead of reverting and taking the change into a feature branch, you insist that it should stay in the trunk and volunteers should step up and fix it.

When bugs like this are introduced it impacts everyone. No one else can work on the stuff that is important to them, because someone else introduced a bug into the trunk that messes up work in other areas, since standard core features that used to work are now broken (like basic math, I'm looking at you #590, or scripted objects #659). Now we all have to switch gears and redirect effort to become experts on parts of the code base that previously we had nothing to do with. Ok, so we are a team and we can help each other out this way, for sure. But it happens so often it seems like a bad habit that you have got.
Was just kidding hoping to break the ice, guess it is still a sensitive issue.
I might agree with you on new changes, but we are not talking about new changes here: these are commits that are the result of merging in forks. They should be seen as historic changes already part of the trunk; they were all enhancements that were thought necessary, and we no longer have the original committers present to resubmit their changes. It is just a fact of the matter which we have to live with, and we can find the issues resolved with a little elbow grease.
We can only ensure changes do not break the available unit tests, so to improve this we need more unit tests. All changes go back to master, and releases are made from master, so there is not much that can be done about that.
Fixes and improvements to reference cache.
Was getting annoyed with the intermittent failure of the `InterpreterConcurrencyTest` so fixed it. Failing tests, intermittent or otherwise, should be considered show stoppers and take precedence over anything else, I am sure you will agree. The problem was that the value references from the `WeakHashMap` were finalized but the key entries never get cleaned up, which was one motivation for developing the reference cache, see below. Not sure if this resolves #655, lacking a root cause analysis or unit test, since that issue specifically pertains to the dot notation in references. We are also using `WeakHashMap` in `Reflection` and `Capabilities`, so the issue may be related.

Also took this time to address the issues raised on the asynchronicity of `ReferenceCache`. @opeongo please review this PR and confirm that you agree with the changes.

I went with option 3 from the discussion: since we may or may not actually use the reference, it pays not to block for initialization (use async instead), yet still populate the reference cache as expected (instead of opting for lazy loading). To mitigate the concern of spawning hundreds of threads I have opted for a cached thread pool, which will reuse previous threads if available but also clear them when idle so as not to consume unnecessary resources, which seems the best of both worlds.
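A minimal sketch of the cached-pool approach described above (the class, task, and value here are illustrative, not the actual ReferenceCache fields): the `FutureTask` is stored as the cache value and submitted asynchronously, so callers only block if they dereference before initialization completes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

public class AsyncInitSketch {
    // Cached pool: reuses idle worker threads and reaps them
    // after 60 seconds of inactivity, keeping the footprint small
    private static final ExecutorService ASYNC = Executors.newCachedThreadPool();

    public static void main(String[] args) throws Exception {
        FutureTask<String> task = new FutureTask<>(() -> "expensively computed value");
        ASYNC.submit(task);             // non-blocking populate of the cache entry
        System.out.println(task.get()); // blocks here only when the value is needed
        ASYNC.shutdown();
    }
}
```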
It is worth mentioning that the concern raised in option 2, about the values not being loaded as weak or soft references, is invalid. The reference cache uses a self-monitoring reference queue process registered on the cache keys, which will remove cache entries when the keys are invalidated, something not implemented for the standard `WeakHashMap`. That makes lazy loading probably the best option, but it changes the original intentions of the reference cache design, so I opted for option 3 as it has the least impact (talk about lazy). We should probably replace all the `WeakHashMap`s with `ReferenceCache`, but that is a bigger challenge in itself, because it is not a drop-in replacement and takes a completely different approach to populating the collection, which enables the automatic re-initializing of expired values removed from the cache when requested.

Using the maven build process and test completion time as a benchmark, and I know this is not a scientific bench but still a ballpark, the cached thread pool is a slight improvement over the current non-asynchronous implementation but significantly faster than simply spawning new threads for each request, which quickly takes a toll. I was unable to see a speed difference compared with a fixed pool, but if you have a long-running test (more than a minute), the memory footprint is much smaller with the timed-out threads.
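The self-monitoring reference queue mechanism can be sketched generically (the map, key, and timeout here are hypothetical, not the actual ReferenceCache internals): key references are registered on a `ReferenceQueue`, the GC enqueues them once the key becomes unreachable, and a monitor (inlined below) removes the matching cache entry eagerly.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class QueueMonitorSketch {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Map<Reference<?>, String> cache = new ConcurrentHashMap<>();

        Object key = new Object();
        Reference<Object> ref = new WeakReference<>(key, queue);
        cache.put(ref, "cached value");

        key = null;   // drop the last strong reference to the key
        System.gc();  // a hint only; collection timing is not guaranteed

        // Monitor step: a cleared key reference is enqueued by the GC,
        // and the corresponding cache entry is removed eagerly
        Reference<?> cleared = queue.remove(5_000); // wait up to 5s
        if (cleared != null) cache.remove(cleared);

        System.out.println("entries left: " + cache.size());
    }
}
```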
There was another annoying bug I picked up, with exceptions being obfuscated by the reference cache and not rolling up to the client. This also allowed null values, which the cache specifically disallows because it needs the null state to test whether a reference has expired or not. This has been rectified, and an appropriate null pointer exception message was added to distinguish the problem from other NPEs.
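The null guard can be sketched as follows (the method name and message are illustrative, not the actual ReferenceCache code): since the cache uses null internally to mean "expired", caller-supplied null values must be rejected up front with a message that distinguishes this NPE from others.

```java
import java.util.Objects;

public class NullCheckSketch {
    // Hypothetical value guard: reject nulls with a descriptive message
    static <V> V checkValue(V value) {
        return Objects.requireNonNull(value,
            "cache values may not be null: null marks an expired reference");
    }

    public static void main(String[] args) {
        try {
            checkValue(null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```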
Added several more tests for the reference queue, also proving cache expiry and auto reinitialize of expired entries, reaching 99% test coverage, which seems like the best we are going to get unless someone else can do better.
Changes in this commit include:

- Fix issue with `InterpreterConcurrencyTest` failing periodically
- Expose lazy loaded exceptions obscured by `dereferenceValue`'s catch block
- Fixed future task asynchronous process via static final cached thread pool
- Removed debug output from `ReferenceQueueMonitor`'s empty catch block
- Add message to `requireNonNull` as specified for values
- Fix key tests assigning null values and added tests for NPE checks
- Fixed deprecated `assertThat` and `ExpectedException` on `ReferenceCacheTest`
- Removed redundant `throws Exception` from test cases
- Added tests to prove removal of finalized references
- Added tests to prove auto init of expired references
- Reached 99% code coverage on `ReferenceCache` and appears to be the limit