Skip to content

Conversation

@josemduarte
Copy link
Contributor

This pull request introduces a new switch for the subunit clusterer used for symmetry detection. If the switch useEntityIdForSeqIdentityDetermination is enabled, it uses the entity id of the subunits to establish identity of sequences, saving the full Smith-Waterman alignment calculation.

This optimization is important in cases like large viral capsids, where there are many thousands of chains and all-to-all sequence alignments become a bottleneck. E.g. for 6Q1F the runtime goes from ~ 6 hours to 7 minutes.

@josemduarte josemduarte requested a review from lafita January 7, 2020 19:41
Copy link
Member

@lafita lafita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I did not test the code with large viral assemblies... Thanks Jose!

clusters.remove(c2);
} else if (clusters.get(c1).mergeIdentical(clusters.get(c2))) {
// This always makes sense as an optimization: it's far cheaper to compare the sequence
// string than doing a full S-W alignment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are correct here but there are some problems like missing residues at the ends or modified amino acids, so using S-W alignment was an easy solution at the time, but clearly not optimal. Do you have an idea why the test is failing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this optimization enabled, the test that fails is TestQuatSymmetryDetectorExamples.testLocal(), but I have no idea why. I then decided to remove this optimization as anyway for my use-case the entity id comparison takes care of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok makes sense! Thanks


private List<Subunit> subunits = new ArrayList<Subunit>();
private List<List<Integer>> subunitEQR = new ArrayList<List<Integer>>();
private List<Subunit> subunits = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the Class of elements in the List when it is defined? Has something changed in Java8+?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Java 7 it is redundant to use the types in the variable initialization. They <> is called "diamond". See "The diamond" section here: https://docs.oracle.com/javase/tutorial/java/generics/types.html

I think it is good style to remove them, since they are redundant. It helps readability.

@josemduarte
Copy link
Contributor Author

FYI for all: after this is merged I'd like to cut a 5.4.0 release (bump in minor, since this one introduces a tiny new feature).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants