Improved ASA calculation performance by josemduarte · Pull Request #820 · biojava/biojava

josemduarte · 2019-01-03T02:26:28Z

Neighbors are now calculated with spatial hashing, which improves performance dramatically for structures with many thousands of atoms. I also improved the docs and logging, and added new tests.
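Here is a minimal sketch of grid-based spatial hashing for neighbor search. It is not the code from this PR. The class `SpatialHashSketch`, its `neighbors` method and the single `cutoff` parameter are assumptions made for illustration. For ASA, the pairwise cutoff typically depends on both atomic radii plus the probe, so using the largest possible pair cutoff as the cell size keeps the search exact. Points are binned into cubic cells whose edge equals the cutoff. Any pair closer than the cutoff must then lie in the same or adjacent cells, so each point is compared only against the 27 surrounding cells rather than against every other point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of spatial hashing; not biojava's AsaCalculator code.
final class SpatialHashSketch {

    /** For every point coords[i] = {x, y, z}, returns indices of other points closer than cutoff. */
    static int[][] neighbors(double[][] coords, double cutoff) {
        // 1. Bin each point into a cubic cell with edge length equal to the cutoff.
        Map<List<Integer>, List<Integer>> grid = new HashMap<>();
        int[][] cellOf = new int[coords.length][3];
        for (int i = 0; i < coords.length; i++) {
            for (int d = 0; d < 3; d++) {
                cellOf[i][d] = (int) Math.floor(coords[i][d] / cutoff);
            }
            grid.computeIfAbsent(Arrays.asList(cellOf[i][0], cellOf[i][1], cellOf[i][2]),
                    k -> new ArrayList<>()).add(i);
        }
        // 2. Compare each point only with points in its own cell and the 26 adjacent cells.
        double cutoffSq = cutoff * cutoff;
        int[][] result = new int[coords.length][];
        for (int i = 0; i < coords.length; i++) {
            List<Integer> nbs = new ArrayList<>();
            for (int dx = -1; dx <= 1; dx++)
                for (int dy = -1; dy <= 1; dy++)
                    for (int dz = -1; dz <= 1; dz++) {
                        List<Integer> cell = grid.get(Arrays.asList(
                                cellOf[i][0] + dx, cellOf[i][1] + dy, cellOf[i][2] + dz));
                        if (cell == null) continue; // empty cell
                        for (int j : cell) {
                            if (j != i && distSq(coords[i], coords[j]) < cutoffSq) nbs.add(j);
                        }
                    }
            result[i] = nbs.stream().mapToInt(Integer::intValue).toArray();
        }
        return result;
    }

    /** Squared Euclidean distance, avoiding a sqrt per pair. */
    static double distSq(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return dx * dx + dy * dy + dz * dz;
    }
}
```

When atom density is roughly uniform, each cell holds a bounded number of atoms. The work therefore grows about linearly with the number of atoms instead of quadratically. The order of indices within a row depends on cell iteration order, so callers should not rely on it being sorted.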

sbliven

Great addition! The new algorithm should speed things up considerably. Have you done any benchmarking?

The code looks good, nice tests, thanks a lot!

sbliven · 2019-01-03T13:23:33Z

biojava-structure/src/main/java/org/biojava/nbio/structure/asa/AsaCalculator.java

 import javax.vecmath.Point3d;
-import java.util.ArrayList;
-import java.util.TreeMap;
+import java.util.*;


Minor point, but maybe we should try to avoid wildcard imports?

That's an IntelliJ feature. I do like individual imports better too... I'll have to check if I can configure IntelliJ to do it.

sbliven · 2019-01-03T13:25:05Z

biojava-structure/src/main/java/org/biojava/nbio/structure/asa/AsaCalculator.java

 	public static final double DEFAULT_PROBE_SIZE = 1.4;
 	public static final int DEFAULT_NTHREADS = 1;

+	public static final boolean DEFAULT_USE_SPATIAL_HASHING = true;


Should this be private?

Indeed, thanks for catching that. I'll make it private.

sbliven · 2019-01-03T13:30:10Z

biojava-structure/src/main/java/org/biojava/nbio/structure/asa/AsaCalculator.java

+	/**
+	 * Set the useSpatialHashingForNeighbors flag to use spatial hashing to calculate neighbors (true) or all-to-all
+	 * distance calculation (false). Default is {@value DEFAULT_USE_SPATIAL_HASHING}.
+	 * Use for testing performance only.


For small molecules is the old algorithm faster? If so maybe document that here.

Performance is essentially the same for small molecules. The gain is very noticeable in very large molecules.

ghost · 2019-01-04T23:33:14Z

biojava-structure/src/main/java/org/biojava/nbio/structure/asa/AsaCalculator.java


 		}
+		end = System.currentTimeMillis();
+		logger.debug("Took {} s to calculate all {} atoms ASAs (excluding neighbors calculation)", (end-start)/1000.0, atomCoords.length);


I'd probably invoke this function from a test and do the timing and logging there, eventually moving it to a benchmark test suite.

My use case was testing other software that uses ASA calculation deep into some function call. I thought this would be a good way of debugging it. Do you see potential performance problems with this?

No, I don't see any real issue. I personally tend to move code from main to tests whenever possible, simply to reduce clutter in the real code.

currentTimeMillis probably costs next to nothing, and the other computation in the logging statements seems minimal. I'm transitioning back to Java, so I'm not a Java expert just yet.

Feel free to ignore my comments here. I'm mostly writing for myself to get my head into some of biojava's code.

@kanishka-azimi Thanks for contributing! The more people reviewing PRs the better!
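Below is a minimal sketch of the reviewer's suggestion, which moves the timing out of the production code and into test or benchmark code. The `TimingSketch` class and its `time` helper are hypothetical, and the summation loop only stands in for the ASA calculation. The sketch uses `System.nanoTime()`, which is intended for measuring elapsed time. `System.currentTimeMillis()`, by contrast, reports wall-clock time and can jump if the system clock is adjusted. For trustworthy benchmark numbers, a dedicated harness such as JMH would be the usual choice.

```java
import java.util.function.Supplier;

// Hypothetical test-side timing helper; not part of biojava.
final class TimingSketch {

    /** Result of a timed call: the computed value plus elapsed seconds. */
    static final class Timed<T> {
        final T value;
        final double seconds;
        Timed(T value, double seconds) { this.value = value; this.seconds = seconds; }
    }

    /** Runs the task once and measures its duration with the monotonic System.nanoTime() clock. */
    static <T> Timed<T> time(Supplier<T> task) {
        long start = System.nanoTime();
        T value = task.get();
        long end = System.nanoTime();
        return new Timed<>(value, (end - start) / 1e9);
    }

    public static void main(String[] args) {
        // Stand-in workload; in a real test this would be the ASA calculation.
        Timed<Double> t = time(() -> {
            double sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += Math.sqrt(i);
            return sum;
        });
        System.out.printf("Took %.3f s (result %.1f)%n", t.seconds, t.value);
    }
}
```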

ghost · 2019-01-04T23:48:36Z

biojava-structure/src/main/java/org/biojava/nbio/structure/asa/AsaCalculator.java

+			}
+
+			int[] indicesArray = new int[thisNbIndices.size()];
+			for (int i=0;i<thisNbIndices.size();i++) indicesArray[i] = thisNbIndices.get(i);


Very minor note: since random access isn't being used on the array list, the Queue interface could be used instead of List.

Some cute shortcuts: https://stackoverflow.com/questions/960431/how-to-convert-listinteger-to-int-in-java . I don't know whether biojava has any general preference regarding Guava or Apache Commons.

Thanks! I like the answer that uses the stream API: https://stackoverflow.com/a/23945015/3914327
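The stream-based conversion mentioned above can be sketched as follows. The sketch assumes `thisNbIndices` holds `Integer` values as in the diff, and the `ToIntArraySketch` class name is hypothetical. The loop from the diff is included for comparison. The stream version accepts any `Collection<Integer>`, so it would work unchanged if the list were replaced by a `Queue` as suggested in the earlier comment.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical comparison of the two List<Integer> -> int[] conversions.
final class ToIntArraySketch {

    /** Loop version, as in the diff; relies on indexed get(i). */
    static int[] withLoop(List<Integer> thisNbIndices) {
        int[] indicesArray = new int[thisNbIndices.size()];
        for (int i = 0; i < thisNbIndices.size(); i++) indicesArray[i] = thisNbIndices.get(i);
        return indicesArray;
    }

    /** Stream version: unbox each Integer and collect into an int[]; works for any Collection. */
    static int[] withStream(Collection<Integer> thisNbIndices) {
        return thisNbIndices.stream().mapToInt(Integer::intValue).toArray();
    }
}
```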

ghost · 2019-01-05T00:07:23Z

biojava-structure/src/main/java/org/biojava/nbio/structure/asa/AsaCalculator.java

-	 * Returns list of indices of atoms within probe distance to atom k.
-	 * @param k index of atom for which we want neighbor indices
+	 * Returns the 2-dimensional array with neighbor indices for every atom.
+	 * @return 2-dimensional array of size: n_atoms x n_neighbors_per_atom


Adjacency list vs. adjacency matrix: possibly prefer one or the other universally, based on the expected density of the graph.
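The method in the diff returns an `int[][]` with one row per atom, holding that atom's neighbor indices. Because the number of neighbors varies per atom, this is presumably a jagged array, that is, an adjacency list. The small sketch below converts between that form and a dense adjacency matrix. The class and method names are hypothetical.

```java
// Hypothetical conversions between the two graph representations.
final class AdjacencySketch {

    /** Dense n x n matrix: O(n^2) memory regardless of how many pairs are neighbors. */
    static boolean[][] toMatrix(int[][] adjList) {
        int n = adjList.length;
        boolean[][] m = new boolean[n][n];
        for (int i = 0; i < n; i++)
            for (int j : adjList[i]) m[i][j] = true;
        return m;
    }

    /** Jagged adjacency list: memory proportional to n plus the number of neighbor entries. */
    static int[][] toList(boolean[][] matrix) {
        int n = matrix.length;
        int[][] list = new int[n][];
        for (int i = 0; i < n; i++) {
            int count = 0;
            for (int j = 0; j < n; j++) if (matrix[i][j]) count++;
            list[i] = new int[count];
            int k = 0;
            for (int j = 0; j < n; j++) if (matrix[i][j]) list[i][k++] = j;
        }
        return list;
    }
}
```

Atoms cannot overlap, so each atom has a bounded number of neighbors within a fixed cutoff, and the graph is sparse. This makes the list form the more natural default. A dense matrix for 100,000 atoms would need 10^10 entries.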

ghost · 2019-01-05T00:09:47Z

biojava-structure/src/main/java/org/biojava/nbio/structure/asa/AsaCalculator.java

+				max = array[i];
+			}
+		}
+		return max;


Minor: a stream shortcut could replace this function.
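The stream shortcut for the max function could look like the sketch below. Only the tail of the loop is visible in the diff, so the element type of `array` is unknown. Both `int[]` and `double[]` overloads are therefore shown, and the `MaxSketch` name is hypothetical. One behavioral difference is worth checking. For an empty array, the stream returns an empty `OptionalInt` or `OptionalDouble`, whereas the loop returns whatever `max` was initialized to.

```java
import java.util.Arrays;
import java.util.OptionalDouble;
import java.util.OptionalInt;

// Hypothetical stream-based replacement for a hand-written max loop.
final class MaxSketch {

    /** Largest element, or an empty OptionalInt for an empty array. */
    static OptionalInt max(int[] array) {
        return Arrays.stream(array).max();
    }

    /** Largest element, or an empty OptionalDouble for an empty array. */
    static OptionalDouble max(double[] array) {
        return Arrays.stream(array).max();
    }
}
```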

ghost · 2019-01-05T00:19:37Z

biojava-structure/src/main/java/org/biojava/nbio/structure/asa/AsaCalculator.java

+				indices.put(i, iIndices);
+			} else {
+				iIndices = indices.get(i);
+			}


minor: putIfAbsent
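The diff shows only the tail of the look-up-or-create block. The sketch below therefore assumes that `indices` is a `Map<Integer, List<Integer>>`, and the `MapIdiomSketch` class and method names are hypothetical. `putIfAbsent` returns the previous value (or `null`), so it still needs a separate `get`. `Map.computeIfAbsent` fits this pattern more directly: it returns the existing or newly created value in a single call and allocates the list only when the key is missing. Both methods were added to `Map` in Java 8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical rewrites of the containsKey/put/get pattern from the diff.
final class MapIdiomSketch {

    /** putIfAbsent variant: allocates a new list on every call, then looks it up. */
    static List<Integer> withPutIfAbsent(Map<Integer, List<Integer>> indices, int i) {
        indices.putIfAbsent(i, new ArrayList<>());
        return indices.get(i);
    }

    /** computeIfAbsent variant: allocates only when the key is missing and returns the stored list. */
    static List<Integer> withComputeIfAbsent(Map<Integer, List<Integer>> indices, int i) {
        return indices.computeIfAbsent(i, k -> new ArrayList<>());
    }
}
```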

ghost · 2019-01-05T00:31:20Z

biojava-structure/src/test/java/org/biojava/nbio/structure/asa/TestAsaCalc.java

+
+			assertEquals(nbs.length, nbsSh.length);
+
+			assertEquals(nbs.length, listOfMatchingIndices.size());


I'd probably have two tests: one testing the functionality against the simplest implementation, and one ensuring that the two implementations produce equivalent results.

I am only reading the diffs, so I may be overlooking what is already there.
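Here is a sketch of the two suggested tests, written in plain Java rather than JUnit so that it stays self-contained. An all-to-all brute-force search serves as the simplest reference implementation. To keep the block short, a sort-and-sweep search along x stands in for the spatial-hashing implementation. It is a different technique from the one in this PR. All class and method names are hypothetical. The two searches may list neighbors in different orders, so the equivalence check sorts each row before comparing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical equivalence check between a reference and a faster neighbor search.
final class NeighborEquivalenceSketch {

    /** Reference implementation: all-to-all distance check, O(n^2). */
    static int[][] bruteForce(double[][] c, double cutoff) {
        List<List<Integer>> nbs = emptyLists(c.length);
        for (int i = 0; i < c.length; i++)
            for (int j = i + 1; j < c.length; j++)
                if (distSq(c[i], c[j]) < cutoff * cutoff) { nbs.get(i).add(j); nbs.get(j).add(i); }
        return toArrays(nbs);
    }

    /** Stand-in faster search: sort by x, then only scan pairs whose x difference is below cutoff. */
    static int[][] sortAndSweep(double[][] c, double cutoff) {
        int n = c.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> c[i][0]));
        List<List<Integer>> nbs = emptyLists(n);
        for (int a = 0; a < n; a++)
            for (int b = a + 1; b < n && c[order[b]][0] - c[order[a]][0] < cutoff; b++) {
                int i = order[a], j = order[b];
                if (distSq(c[i], c[j]) < cutoff * cutoff) { nbs.get(i).add(j); nbs.get(j).add(i); }
            }
        return toArrays(nbs);
    }

    /** True if every atom has the same neighbor set in both results, ignoring order within a row. */
    static boolean sameNeighbors(int[][] x, int[][] y) {
        if (x.length != y.length) return false;
        for (int i = 0; i < x.length; i++) {
            int[] a = x[i].clone(), b = y[i].clone();
            Arrays.sort(a);
            Arrays.sort(b);
            if (!Arrays.equals(a, b)) return false;
        }
        return true;
    }

    /** Reproducible random point cloud inside a cube of the given edge length. */
    static double[][] randomCloud(int n, double size, long seed) {
        Random rnd = new Random(seed);
        double[][] c = new double[n][3];
        for (double[] p : c) for (int d = 0; d < 3; d++) p[d] = rnd.nextDouble() * size;
        return c;
    }

    static List<List<Integer>> emptyLists(int n) {
        List<List<Integer>> nbs = new ArrayList<>();
        for (int i = 0; i < n; i++) nbs.add(new ArrayList<>());
        return nbs;
    }

    static int[][] toArrays(List<List<Integer>> nbs) {
        int[][] out = new int[nbs.size()][];
        for (int i = 0; i < out.length; i++) out[i] = nbs.get(i).stream().mapToInt(Integer::intValue).toArray();
        return out;
    }

    static double distSq(double[] p, double[] q) {
        double dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
        return dx * dx + dy * dy + dz * dz;
    }
}
```

In biojava these checks would presumably become JUnit tests that exercise the calculator with the spatial-hashing flag switched on and off.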

josemduarte added 10 commits January 1, 2019 16:39

Neighbor indices now uses spatial hashing

c213a7d

Supporting neighbors with and without SH

3b99afd

A test, demoing that SH actually performs much worse

a84609b

Doing all neighbor indices upfront

1100299

Efficient neighbor index finding with SH

ddeb759

Tidying up

ede5cee

Logging, docs and some minimal optimization

041482d

Logging and cleanup

f42a0fa

Docs

373a407

Docs

b674136

sbliven approved these changes Jan 3, 2019


Should be private

8fb10fd

sbliven merged commit de9cc0e into biojava:master Jan 3, 2019

ghost reviewed Jan 4, 2019


ghost reviewed Jan 5, 2019


josemduarte mentioned this pull request Jan 17, 2019

Bugfixes for ASA calculation #824

Merged


