Optimization of subunit clusterer for quaternary sym detection of PDB deposited structures #857

josemduarte · 2020-01-07T19:41:16Z

This pull request introduces a new switch for the subunit clusterer used for symmetry detection. When the switch useEntityIdForSeqIdentityDetermination is enabled, the clusterer uses the entity ids of the subunits to establish sequence identity, skipping the full Smith-Waterman alignment calculation.

This optimization matters for cases such as large viral capsids, which contain many thousands of chains and where all-to-all sequence alignment becomes a bottleneck. For example, for 6Q1F the runtime drops from ~6 hours to 7 minutes.
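To make the cost difference concrete, here is a minimal Java sketch, not BioJava's actual code: the class `EntityIdClustering`, its nested `SimpleSubunit` type and both methods are hypothetical. Grouping subunits by entity id is a single linear pass, whereas an all-to-all comparison needs n(n-1)/2 pairwise alignments, which grows quadratically with the number of chains.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: cluster subunits by their entity id instead of
 * aligning every pair of sequences. Not BioJava's implementation.
 */
public class EntityIdClustering {

    /** Minimal stand-in for a subunit: a chain name plus the entity id it belongs to. */
    static final class SimpleSubunit {
        final String chainId;
        final int entityId;

        SimpleSubunit(String chainId, int entityId) {
            this.chainId = chainId;
            this.entityId = entityId;
        }
    }

    /** Groups subunits in one linear pass: chains with the same entity id share a sequence. */
    static Map<Integer, List<String>> clusterByEntityId(List<SimpleSubunit> subunits) {
        Map<Integer, List<String>> clusters = new LinkedHashMap<>();
        for (SimpleSubunit s : subunits) {
            clusters.computeIfAbsent(s.entityId, k -> new ArrayList<>()).add(s.chainId);
        }
        return clusters;
    }

    /** Number of pairwise alignments an all-to-all comparison of n subunits would need. */
    static long allToAllPairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        List<SimpleSubunit> subunits = new ArrayList<>();
        // Toy assembly: 60 copies each of two distinct entities (ids 1 and 2).
        for (int i = 0; i < 120; i++) {
            subunits.add(new SimpleSubunit("C" + i, i % 2 == 0 ? 1 : 2));
        }
        Map<Integer, List<String>> clusters = clusterByEntityId(subunits);
        System.out.println("clusters: " + clusters.size());                     // prints "clusters: 2"
        System.out.println("pairs avoided: " + allToAllPairs(subunits.size())); // prints "pairs avoided: 7140"
    }
}
```

The trade-off is the one raised in the review: grouping by entity id trusts the deposited entity annotation instead of re-deriving identity from the sequences, which is what makes it cheap.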


lafita

Looks good! I did not test the code with large viral assemblies... Thanks Jose!

lafita · 2020-01-08T11:13:26Z

biojava-structure/src/main/java/org/biojava/nbio/structure/cluster/SubunitClusterer.java

 							clusters.remove(c2);
-						} else if (clusters.get(c1).mergeIdentical(clusters.get(c2))) {
-							// This always makes sense as an optimization: it's far cheaper to compare the sequence
-							// string than doing a full S-W alignment


I think you are correct here, but there are complications such as missing residues at the chain termini or modified amino acids, so using S-W alignment was an easy solution at the time, though clearly not optimal. Do you have an idea why the test is failing?

josemduarte

With this optimization enabled, the test that fails is TestQuatSymmetryDetectorExamples.testLocal(), but I have no idea why. I then decided to remove this optimization, since for my use case the entity id comparison takes care of it anyway.

lafita

OK, makes sense! Thanks
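The caveat about missing terminal residues can be illustrated with a small Java sketch. It is hypothetical and simplified (a linear gap penalty and a plain match/mismatch score instead of a substitution matrix), not BioJava's alignment code: an exact comparison of the observed sequence strings reports two chains of the same entity as different when one chain lacks unobserved N-terminal residues, while a Smith-Waterman local alignment still finds them identical over the aligned region.

```java
import java.util.Objects;

/**
 * Hypothetical illustration (not BioJava code): exact sequence-string comparison
 * versus a simplified Smith-Waterman local alignment.
 */
public class SequenceIdentityCheck {

    // Simple linear scoring scheme, chosen for illustration only.
    static final int MATCH = 2, MISMATCH = -1, GAP = -2;

    /** Exact comparison: cheap, but fails if either chain lacks terminal residues. */
    static boolean identicalStrings(String a, String b) {
        return Objects.equals(a, b);
    }

    /**
     * Smith-Waterman local alignment with traceback; returns the fraction of
     * identical residues over the aligned columns of the best local alignment.
     */
    static double localIdentity(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] h = new int[n + 1][m + 1];
        int best = 0, bi = 0, bj = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = h[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH);
                int up = h[i - 1][j] + GAP;
                int left = h[i][j - 1] + GAP;
                // Local alignment: scores never drop below zero.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                if (h[i][j] > best) {
                    best = h[i][j];
                    bi = i;
                    bj = j;
                }
            }
        }
        // Trace back from the best-scoring cell until a zero score is reached.
        int i = bi, j = bj, columns = 0, identical = 0;
        while (i > 0 && j > 0 && h[i][j] > 0) {
            boolean same = a.charAt(i - 1) == b.charAt(j - 1);
            if (h[i][j] == h[i - 1][j - 1] + (same ? MATCH : MISMATCH)) {
                if (same) identical++;
                i--;
                j--;
            } else if (h[i][j] == h[i - 1][j] + GAP) {
                i--;
            } else {
                j--;
            }
            columns++;
        }
        return columns == 0 ? 0.0 : (double) identical / columns;
    }

    public static void main(String[] args) {
        // Same entity, but the second chain has two unobserved N-terminal residues.
        String full = "MKTAYIAKQRQISFVKSHFSRQ";
        String truncated = "TAYIAKQRQISFVKSHFSRQ";
        System.out.println(identicalStrings(full, truncated)); // false
        System.out.println(localIdentity(full, truncated));    // 1.0
    }
}
```

The price of this robustness is an O(n·m) table per pair of chains, which is exactly the cost that becomes prohibitive when every pair in a large assembly has to be aligned.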

lafita · 2020-01-08T11:19:04Z

biojava-structure/src/main/java/org/biojava/nbio/structure/cluster/SubunitCluster.java


-	private List<Subunit> subunits = new ArrayList<Subunit>();
-	private List<List<Integer>> subunitEQR = new ArrayList<List<Integer>>();
+	private List<Subunit> subunits = new ArrayList<>();


Why did you remove the element type from the List instantiation? Has something changed in Java 8+?

josemduarte

Since Java 7 it is redundant to repeat the type arguments when instantiating a generic class: the compiler infers them from the declared type. The <> is called the "diamond". See "The Diamond" section here: https://docs.oracle.com/javase/tutorial/java/generics/types.html

I think it is good style to remove them, since they are redundant; it helps readability.
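A short Java example of the diamond operator; the class and method names are illustrative only. Both forms create the same runtime object, because generic type arguments are erased at compile time; the diamond only removes the repetition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrates the Java 7+ diamond operator: the compiler infers the type arguments. */
public class DiamondExample {

    // Pre-Java 7 style: type arguments repeated on both sides.
    static List<List<Integer>> explicitTypes() {
        return new ArrayList<List<Integer>>();
    }

    // Java 7+ style: "<>" lets the compiler infer List<List<Integer>> from the return type.
    static List<List<Integer>> withDiamond() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<List<Integer>> a = explicitTypes();
        List<List<Integer>> b = withDiamond();
        // Both produce a plain ArrayList at runtime (generic types are erased).
        System.out.println(a.getClass() == b.getClass()); // true

        // The diamond also works with nested generic types and inside lambdas (Java 8+).
        Map<String, List<Integer>> residuesByChain = new HashMap<>();
        residuesByChain.computeIfAbsent("A", k -> new ArrayList<>()).add(42);
        System.out.println(residuesByChain); // {A=[42]}
    }
}
```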

josemduarte · 2020-01-08T23:43:16Z

FYI for all: after this is merged I'd like to cut a 5.4.0 release (a minor-version bump, since this introduces a small new feature).

josemduarte added 5 commits December 12, 2019 11:34

Logging, cleanups, cosmetics

45f1d70

Extracting duplicate code to a method. Some cleanups

568a8d4

Introduced optimization: now if switch is provided subunit clustering uses entity id infor. Some tests are failing

f3515a7

Reverted change that caused TestQsAlignExamples to fail

47e4cfb

Using mergeIdentical breaks a test, reverting

3b967b0

josemduarte requested a review from lafita January 7, 2020 19:41

lafita approved these changes Jan 8, 2020


josemduarte merged commit 5840f24 into biojava:master Jan 12, 2020

josemduarte mentioned this pull request Jan 23, 2020

Fixes on optimization of subunit clusterer for quaternary sym detection of PDB deposited structures #859

Merged


