Skip to content

Conversation

@josemduarte
Copy link
Contributor

Now neighbors are calculated with spatial hashing, improving performance dramatically for cases with many thousands of atoms. Also improved docs, logs and added new tests.

Copy link
Member

@sbliven sbliven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great addition! The new algorithm should speed things up considerably. Have you done any benchmarking?

The code looks good, nice tests, thanks a lot!

import javax.vecmath.Point3d;
import java.util.ArrayList;
import java.util.TreeMap;
import java.util.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point, but maybe we should try to avoid wildcard imports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an IntelliJ feature. I do like individual imports better too... I'll have to check if I can configure IntelliJ to do it.

public static final double DEFAULT_PROBE_SIZE = 1.4;
public static final int DEFAULT_NTHREADS = 1;

public static final boolean DEFAULT_USE_SPATIAL_HASHING = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, thanks for catching. I'll make private

/**
* Set the useSpatialHashingForNeighbors flag to use spatial hashing to calculate neighbors (true) or all-to-all
* distance calculation (false). Default is {@value DEFAULT_USE_SPATIAL_HASHING}.
* Use for testing performance only.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For small molecules is the old algorithm faster? If so maybe document that here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance is essentially the same for small molecules. The gain is very noticeable in very large molecules.

@sbliven sbliven merged commit de9cc0e into biojava:master Jan 3, 2019

}
end = System.currentTimeMillis();
logger.debug("Took {} s to calculate all {} atoms ASAs (excluding neighbors calculation)", (end-start)/1000.0, atomCoords.length);
Copy link

@ghost ghost Jan 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably keep invoke this function from a test and time and log there, eventually move it to a benchmark test suite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My use case was testing other software that uses ASA calculation deep into some function call. I thought this would be a good way of debugging it. Do you see potential performance problems with this?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't see any real issue. I personally tend to move code from main to tests whenever possible, simply to reduce clutter in the real code.

currentTimeMillis probably cost nothing and the other computation in the logging steps seems minimal. I'm transitioning back to java, so I'm not an expert in java just yet.

Feel free to ignore my comments here. I'm mostly writing for myself to get my head into some of biojava's code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kanishka-azimi Thanks for contributing! The more people reviewing PRs the better!

}

int[] indicesArray = new int[thisNbIndices.size()];
for (int i=0;i<thisNbIndices.size();i++) indicesArray[i] = thisNbIndices.get(i);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor note. Since random access isn't being used in the array list, could use Queue interface instead of List.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cute shortcuts - https://stackoverflow.com/questions/960431/how-to-convert-listinteger-to-int-in-java . I don't know if biojava has any general preference regarding guava or apache commons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I like the answer that uses the stream API: https://stackoverflow.com/a/23945015/3914327

* Returns list of indices of atoms within probe distance to atom k.
* @param k index of atom for which we want neighbor indices
* Returns the 2-dimensional array with neighbor indices for every atom.
* @return 2-dimensional array of size: n_atoms x n_neighbors_per_atom
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjacency list vs adjacency matrix...possible prefer one or the other universally based on expected density of graph

max = array[i];
}
}
return max;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: some stream shortcut could replace this function.

indices.put(i, iIndices);
} else {
iIndices = indices.get(i);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: putIfAbsent


assertEquals(nbs.length, nbsSh.length);

assertEquals(nbs.length, listOfMatchingIndices.size());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably have two tests.

  1. test for the functionality against the simplest implementation.
  2. test ensuring the two implementations produce equivalent results.
    I am only reading the diffs, so I may be overlooking what is already there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants