-
Notifications
You must be signed in to change notification settings - Fork 397
Improved ASA calculation performance #820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sbliven
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great addition! The new algorithm should speed things up considerably. Have you done any benchmarking?
The code looks good, nice tests, thanks a lot!
| import javax.vecmath.Point3d; | ||
| import java.util.ArrayList; | ||
| import java.util.TreeMap; | ||
| import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor point, but maybe we should try to avoid wildcard imports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an IntelliJ feature. I do like individual imports better too... I'll have to check if I can configure IntelliJ to do it.
| public static final double DEFAULT_PROBE_SIZE = 1.4; | ||
| public static final int DEFAULT_NTHREADS = 1; | ||
|
|
||
| public static final boolean DEFAULT_USE_SPATIAL_HASHING = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, thanks for catching. I'll make private
| /** | ||
| * Set the useSpatialHashingForNeighbors flag to use spatial hashing to calculate neighbors (true) or all-to-all | ||
| * distance calculation (false). Default is {@value DEFAULT_USE_SPATIAL_HASHING}. | ||
| * Use for testing performance only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For small molecules is the old algorithm faster? If so maybe document that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance is essentially the same for small molecules. The gain is very noticeable in very large molecules.
|
|
||
| } | ||
| end = System.currentTimeMillis(); | ||
| logger.debug("Took {} s to calculate all {} atoms ASAs (excluding neighbors calculation)", (end-start)/1000.0, atomCoords.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably keep invoke this function from a test and time and log there, eventually move it to a benchmark test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My use case was testing other software that uses ASA calculation deep into some function call. I thought this would be a good way of debugging it. Do you see potential performance problems with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't see any real issue. I personally tend to move code from main to tests whenever possible, simply to reduce clutter in the real code.
currentTimeMillis probably cost nothing and the other computation in the logging steps seems minimal. I'm transitioning back to java, so I'm not an expert in java just yet.
Feel free to ignore my comments here. I'm mostly writing for myself to get my head into some of biojava's code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kanishka-azimi Thanks for contributing! The more people reviewing PRs the better!
| } | ||
|
|
||
| int[] indicesArray = new int[thisNbIndices.size()]; | ||
| for (int i=0;i<thisNbIndices.size();i++) indicesArray[i] = thisNbIndices.get(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor note. Since random access isn't being used in the array list, could use Queue interface instead of List.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some cute shortcuts - https://stackoverflow.com/questions/960431/how-to-convert-listinteger-to-int-in-java . I don't know if biojava has any general preference regarding guava or apache commons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I like the answer that uses the stream API: https://stackoverflow.com/a/23945015/3914327
| * Returns list of indices of atoms within probe distance to atom k. | ||
| * @param k index of atom for which we want neighbor indices | ||
| * Returns the 2-dimensional array with neighbor indices for every atom. | ||
| * @return 2-dimensional array of size: n_atoms x n_neighbors_per_atom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adjacency list vs adjacency matrix...possible prefer one or the other universally based on expected density of graph
| max = array[i]; | ||
| } | ||
| } | ||
| return max; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: some stream shortcut could replace this function.
| indices.put(i, iIndices); | ||
| } else { | ||
| iIndices = indices.get(i); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: putIfAbsent
|
|
||
| assertEquals(nbs.length, nbsSh.length); | ||
|
|
||
| assertEquals(nbs.length, listOfMatchingIndices.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably have two tests.
- test for the functionality against the simplest implementation.
- test ensuring the two implementations produce equivalent results.
I am only reading the diffs, so I may be overlooking what is already there.
Now neighbors are calculated with spatial hashing, improving performance dramatically for cases with many thousands of atoms. Also improved docs, logs and added new tests.