Multiple Structure Alignment Datastructures #278
Merged
lafita merged 76 commits into biojava:minor on Jun 16, 2015
The core data structures for the Multiple Alignment object have been created: MultipleAlignment, BlockSet, Block, Pose.
The distanceMatrix has been renamed to distanceTables to match the AFPChain nomenclature. The description of replaceOptAln has also been made more general.
The Pose stores the translation and the rotationMatrix, which together describe the 3D transformation of the proteins. A Demo for the display of the multiple alignment has been created.
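As a rough illustration of what a rotation matrix plus a translation amount to, the sketch below applies both to a single coordinate. The class PoseSketch and its method are hypothetical and are not the Pose class from this pull request; plain arrays stand in for the library's matrix and atom types.

```java
// Hypothetical sketch, not the BioJava Pose class.
class PoseSketch {
    final double[][] rotation;   // 3x3 rotation matrix
    final double[] translation;  // translation vector (x, y, z)

    PoseSketch(double[][] rotation, double[] translation) {
        this.rotation = rotation;
        this.translation = translation;
    }

    // Returns rotation * coord + translation without modifying the input.
    double[] transform(double[] coord) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = translation[i];
            for (int j = 0; j < 3; j++) {
                out[i] += rotation[i][j] * coord[j];
            }
        }
        return out;
    }
}
```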
The 3D GUI features of the Structure Alignment have been generalized in order to implement a Multiple Alignment GUI for the new MultipleAlignment object.
The multiple alignments can be visualized through the MultipleAlignmentJmol class, adapted from the StructureAlignmentJmol. The coloring of the different blocks and the alignment menus are not yet implemented.
Gaps are described by null values in the Blocks of the MultipleAlignment. Now the Jmol class accounts for these gaps and does not color them.
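To make the null-as-gap convention concrete, here is a minimal sketch that counts the alignment columns without any gap. It assumes a block is stored as one list of residue indices per structure; the class GapSketch and its method are hypothetical names used only for illustration.

```java
import java.util.List;

// Hypothetical sketch: one list of residue indices per structure, null marks a gap.
class GapSketch {
    // Counts the alignment columns in which no structure has a gap.
    static int countUngappedColumns(List<List<Integer>> alignRes) {
        if (alignRes.isEmpty()) {
            return 0;
        }
        int length = alignRes.get(0).size();
        int ungapped = 0;
        for (int col = 0; col < length; col++) {
            boolean hasGap = false;
            for (List<Integer> row : alignRes) {
                if (row.get(col) == null) {
                    hasGap = true;
                    break;
                }
            }
            if (!hasGap) {
                ungapped++;
            }
        }
        return ungapped;
    }
}
```

Display code can use the same null check to decide which residues to skip when coloring.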
from the Pose class, because it is a static variable that does not depend on the specific BlockSet. It only stores the intra-residue distances of every protein.
The wrong line was commented out, so the molecule was not colored.
Adapted the display method in StructureAlignmentDisplay to rotate the atoms of a MultipleAlignment and display them in Jmol.
Minor changes to respond to TODOs
Interfaces for the classes Block, Pose and BlockSet have been created to generalize and document all the methods needed for a MultipleAlignment object.
The interfaces have been implemented again and the Jmol display also works for the new MultipleAlignment DS composition.
Added some methods to calculate internal variables (update), and moved the cache variables (RMSD, TM-score, similarity, coverage) from the MultipleAlignment to these two classes.
Another layer in the OO data structure has been added to allow returning alternative alignments. An ensemble of MSTA is a collection of MultipleAlignment objects. Another change has been the addition of two different implementations of Pose, one to determine global superimpositions and another to determine flexible part superimpositions.
When an object is created with the constructor and its parent is set, the parent also gets a link to the object automatically.
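The automatic cross-reference can be sketched as follows. ParentSketch and ChildSketch are hypothetical stand-ins for a parent/child pair such as an alignment and its block sets; detaching from the previous parent is an assumption of this sketch, not something the commit message states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of automatic parent/child cross-references.
class ParentSketch {
    final List<ChildSketch> children = new ArrayList<>();
}

final class ChildSketch {
    private ParentSketch parent;

    ChildSketch(ParentSketch parent) {
        setParent(parent);
    }

    // Sets the parent and registers this child in the parent's list.
    void setParent(ParentSketch newParent) {
        if (parent != null) {
            parent.children.remove(this); // assumption: detach from the old parent
        }
        parent = newParent;
        if (newParent != null && !newParent.children.contains(this)) {
            newParent.children.add(this);
        }
    }

    ParentSketch getParent() {
        return parent;
    }
}
```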
The Ensemble can calculate the distance matrices for every structure in the updateDistanceMatrix() method. Automatic cross-references have been added to the setParent() methods, for consistency.
All pairwise structural comparisons are evaluated to build the background distance matrices. Atoms can be rotated from Pose as well.
A new abstract Pose implementation has been created that calculates the TM-score and RMSD of the alignment. AlignmentJmol has been renamed to AbstractAlignmentJmol to make clear that it is an abstract class.
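For reference, the standard pairwise definitions of the two scores can be sketched from the distances between superimposed residue pairs. This is not the code from the pull request: ScoreSketch is a hypothetical class, and how the real implementation averages over more than two structures is not shown here.

```java
// Hypothetical sketch of the usual pairwise RMSD and TM-score formulas.
class ScoreSketch {
    // Root mean square deviation over the aligned pair distances (in Angstrom).
    static double rmsd(double[] distances) {
        double sum = 0.0;
        for (double d : distances) {
            sum += d * d;
        }
        return Math.sqrt(sum / distances.length);
    }

    // TM-score normalized by normLength, with d0 = 1.24 * cbrt(L - 15) - 1.8.
    // Assumes normLength is large enough (about 19 or more) for d0 to be positive.
    static double tmScore(double[] distances, int normLength) {
        double d0 = 1.24 * Math.cbrt(normLength - 15) - 1.8;
        double sum = 0.0;
        for (double d : distances) {
            double ratio = d / d0;
            sum += 1.0 / (1.0 + ratio * ratio);
        }
        return sum / normLength;
    }
}
```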
A new MultipleAlignment can now be constructed from an AFPChain. The constructor creates an equivalent alignment object, for backwards compatibility.
The clone methods now entirely change the links between the cloned and the original objects so that no cross-links occur.
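A deep copy without cross-links between original and clone might look like the sketch below. CloneAlignment and CloneBlock are hypothetical classes for illustration; the real clone() methods in the pull request may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every copied child points at the copied parent.
class CloneAlignment {
    final List<CloneBlock> blocks = new ArrayList<>();

    CloneAlignment deepCopy() {
        CloneAlignment copy = new CloneAlignment();
        for (CloneBlock block : blocks) {
            // The constructor links the new block to the copy, never to the original.
            CloneBlock blockCopy = new CloneBlock(copy);
            blockCopy.residues.addAll(block.residues);
        }
        return copy;
    }
}

class CloneBlock {
    final CloneAlignment parent;
    final List<Integer> residues = new ArrayList<>();

    CloneBlock(CloneAlignment parent) {
        this.parent = parent;
        parent.blocks.add(this); // register in the parent's list
    }
}
```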
An initial implementation of the CEMC algorithm for multiple structure alignment has been created. Now a seed MultipleAlignment can be created with a parallel pairwise all-to-all alignment. The MC optimization is still not implemented. A demo is available under the structure-gui package.
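The parallel all-to-all step can be pictured with a generic thread pool sketch. AllToAllSketch is a hypothetical class, and the align function is a placeholder that returns a score for a pair of structure indices; the actual CEMC code and the pairwise algorithm it calls are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Hypothetical sketch of a parallel all-to-all pairwise comparison.
class AllToAllSketch {
    // Runs align(i, j) for every pair i < j and returns a symmetric score matrix.
    static double[][] allToAll(int n, BiFunction<Integer, Integer, Double> align, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            double[][] scores = new double[n][n];
            List<Future<double[]>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    final int a = i;
                    final int b = j;
                    Callable<double[]> task = () -> new double[] {a, b, align.apply(a, b)};
                    futures.add(pool.submit(task));
                }
            }
            for (Future<double[]> future : futures) {
                double[] result = future.get(); // blocks until the pair is done
                int a = (int) result[0];
                int b = (int) result[1];
                scores[a][b] = result[2];
                scores[b][a] = result[2];
            }
            return scores;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Pairwise alignment failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A seed could then be chosen from the score matrix, for example by picking the structure with the best summed score as the reference.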
In the transition to replace AFPChain with the MultipleAlignment class. A core structure for the CEMC algorithm has also been created.
Brilliant... time for a new release...
Are we happy with the class placement? Maybe we should add a new align.multiple package?
Such sequences belong in display code rather than in the model. They have been moved to a new MultipleAlignmentTools utility class.
Packing by columns is needed instead of by rows.
The transformation calculated in AFPChain was not copied. Now the information is converted into a Matrix4d and copied.
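Converting a rotation and a shift into a single 4x4 matrix follows the usual homogeneous-coordinate layout, sketched below. TransformSketch is a hypothetical helper and plain arrays stand in for the matrix type; the accessors used on AFPChain in the actual commit are not reproduced here.

```java
// Hypothetical helper: plain arrays stand in for a 4x4 matrix type.
class TransformSketch {
    // Packs a 3x3 rotation and a shift vector into one 4x4 homogeneous matrix.
    static double[][] toHomogeneous(double[][] rot, double[] shift) {
        double[][] m = new double[4][4];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                m[i][j] = rot[i][j];
            }
            m[i][3] = shift[i]; // last column holds the translation
        }
        m[3][3] = 1.0; // bottom row is (0, 0, 0, 1)
        return m;
    }
}
```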
Bug fixes, Class renames and Code organization
The sequence alignment method has been improved to introduce a gap between blocks. Many more output conversions need to be implemented (Web, Aligned Pairs, etc.)
The alignment panel is fully functional, and the sequence alignment to Jmol connection is now possible for MultipleAlignments.
It was only used for problems loading the Atom arrays and to check consistency in some parts of the calculations, but its usage was not clearly defined. The exceptions have been replaced by NullPointerException and IllegalStateException, respectively; since these are Java runtime exceptions, they do not need to be declared in method signatures. Because the old exceptions were never caught and always had to be propagated, the change does not affect the behavior of the code, but simplifies it.
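The effect of switching to runtime exceptions can be sketched as follows. AtomHolderSketch is a hypothetical class; it only illustrates that unchecked exceptions need no throws clause, so callers are not forced to catch or redeclare anything.

```java
// Hypothetical sketch: unchecked exceptions need no "throws" clause.
class AtomHolderSketch {
    private final double[][] atoms;
    private final int expectedLength;

    AtomHolderSketch(double[][] atoms, int expectedLength) {
        this.atoms = atoms;
        this.expectedLength = expectedLength;
    }

    // No checked exception appears in the signature.
    int size() {
        if (atoms == null) {
            throw new NullPointerException("Atom array has not been loaded");
        }
        if (atoms.length != expectedLength) {
            throw new IllegalStateException("Atom array length is inconsistent");
        }
        return atoms.length;
    }
}
```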
FatCat result, Alignment Residues (as Pairs) and FASTA format.
Two tests to check the correctness of the MultipleAlignment DS have been implemented. Some bugs were detected and fixed in the code while writing the tests.
Implementations for MultipleAlignment DS
lafita added a commit that referenced this pull request on Jun 16, 2015
Multiple Structure Alignment Datastructures
lafita added a commit that referenced this pull request on Jun 17, 2015
The ReferenceSuperimposer can now calculate the transformation of each individual BlockSet in case there are several. A bug in the MultipleAlignment clone() method has been fixed (the BlockSets and Blocks were not added to the parent Lists). The documentation of the data structure has been improved.
lafita added a commit to lafita/biojava that referenced this pull request on Jun 25, 2015
Parameters, StartupParameters and UserArgumentProcessor classes. The old CEMC classes have been renamed to more general names, since the new version supports any pairwise algorithm to generate the seed.
Introduces new data structures for structure alignments, created along with @lafita. The data structure can represent standard pairwise alignments, but also multiple alignments, flexible alignments, and non-topological alignments (#126).
The structure consists of a hierarchy of objects:
Some documentation still needs to be written and will be added to the cookbook.
A few other design decisions bear mention:
- String but will change to StructureIdentifier following the completion of "Make loading of structures more consistent" #81.
- Matrix4d objects. To support flexible alignments, the definitive matrices are stored in each BlockSet. However, a default matrix can be stored in MultipleAlignment to save memory for rigid alignments.
- AFPChain can be converted directly to MultipleAlignmentEnsemble

This pull request also bundles concurrent development of:
- AtomCache.getRepresentativeAtoms() method (that should replace getAtoms() everywhere)

etc. etc. !
This is a fairly major feature addition, so I'll leave this request open for a few days to allow comments.
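The transformation storage mentioned in the design decisions above, with definitive matrices per BlockSet and an optional default on the alignment, could be resolved with a simple fallback. The sketch below is hypothetical: RigidFlexSketch and its method are illustrative names, and 4x4 double arrays stand in for Matrix4d objects.

```java
import java.util.List;

// Hypothetical sketch of the per-BlockSet versus default transformation lookup.
class RigidFlexSketch {
    // Returns the BlockSet's own matrices if present (flexible alignment),
    // otherwise the alignment-wide default (rigid alignment).
    static List<double[][]> effectiveTransforms(List<double[][]> blockSetTransforms,
                                                List<double[][]> defaultTransforms) {
        return blockSetTransforms != null ? blockSetTransforms : defaultTransforms;
    }
}
```

With this layout a rigid alignment keeps a single list of matrices, while a flexible alignment keeps one list per BlockSet.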