-
Notifications
You must be signed in to change notification settings - Fork 397
Optimization of subunit clusterer for quaternary sym detection of PDB deposited structures #857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization of subunit clusterer for quaternary sym detection of PDB deposited structures #857
Conversation
… uses entity id infor. Some tests are failing
lafita
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I did not test the code with large viral assemblies... Thanks Jose!
| clusters.remove(c2); | ||
| } else if (clusters.get(c1).mergeIdentical(clusters.get(c2))) { | ||
| // This always makes sense as an optimization: it's far cheaper to compare the sequence | ||
| // string than doing a full S-W alignment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are correct here but there are some problems like missing residues at the ends or modified amino acids, so using S-W alignment was an easy solution at the time, but clearly not optimal. Do you have an idea why the test is failing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this optimization enabled, the test that fails is TestQuatSymmetryDetectorExamples.testLocal(), but I have no idea why. I then decided to remove this optimization as anyway for my use-case the entity id comparison takes care of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok makes sense! Thanks
|
|
||
| private List<Subunit> subunits = new ArrayList<Subunit>(); | ||
| private List<List<Integer>> subunitEQR = new ArrayList<List<Integer>>(); | ||
| private List<Subunit> subunits = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove the Class of elements in the List when it is defined? Has something changed in Java8+?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Java 7 it is redundant to use the types in the variable initialization. They <> is called "diamond". See "The diamond" section here: https://docs.oracle.com/javase/tutorial/java/generics/types.html
I think it is good style to remove them, since they are redundant. It helps readability.
|
FYI for all: after this is merged I'd like to cut a 5.4.0 release (bump in minor, since this one introduces a tiny new feature). |
This pull request introduces a new switch for the subunit clusterer used for symmetry detection. If the switch
useEntityIdForSeqIdentityDeterminationis enabled, it uses the entity id of the subunits to establish identity of sequences, saving the full Smith-Waterman alignment calculation.This optimization is important in cases like large viral capsids, where there are many thousands of chains and all-to-all sequence alignments become a bottleneck. E.g. for 6Q1F the runtime goes from ~ 6 hours to 7 minutes.