|
| 1 | +Protein Secondary Structure |
| 2 | +=========================== |
| 3 | + |
| 4 | +## What is Protein Secondary Structure? |
| 5 | + |
| 6 | +Protein secondary structure (SS) is the general three-dimensional form of local segments of proteins. |
| 7 | +Secondary structure is formally defined by the pattern of hydrogen bonds observed in an |
| 8 | +atomic-resolution structure; the most common elements are alpha helices and beta sheets. |
| 9 | + |
| 10 | +More specifically, the secondary structure is defined by the patterns of hydrogen bonds formed between |
| 11 | +amine hydrogen (-NH) and carbonyl oxygen (C=O) atoms contained in the backbone peptide bonds of the protein. |
| 12 | + |
| 13 | + |
| 14 | + |
| 15 | +For more information, see the Wikipedia article on |
| 16 | +[protein secondary structure](https://en.wikipedia.org/wiki/Protein_secondary_structure). |
| 17 | + |
| 18 | +## Secondary Structure Annotation |
| 19 | + |
| 20 | +### Information Sources |
| 21 | + |
| 22 | +There are various ways to obtain the SS annotation of a protein structure: |
| 23 | + |
| 24 | +- **Author assignment**: the authors of the structure describe the SS, usually identifying helices |
| 25 | +and beta-sheets, and assign the corresponding type to each residue involved. The author assignment |
| 26 | +is stored in the `PDB` and `mmCIF` files deposited in the PDB, and **BioJava** can parse it |
| 27 | +when a `Structure` is loaded. |
| 28 | + |
| 29 | +- **Prediction from atom coordinates**: various programs predict the SS of a protein from its 3D structure. |
| 30 | +These algorithms use the atom coordinates of the amino acids to determine hydrogen bonds and geometrical patterns |
| 31 | +that define the different types of protein secondary structure. One of the first and most popular algorithms |
| 32 | +is `DSSP` (Define Secondary Structure of Proteins). **BioJava** has an implementation of the algorithm, |
| 33 | +written originally in C++, which is described in a later section. |
| 34 | + |
| 35 | +- **Prediction from sequence**: other algorithms use only the amino acid sequence (primary structure) of the protein, |
| 36 | +and predict the SS using the SS propensities of each amino acid and multiple alignments with homologous sequences |
| 37 | +(e.g. [PSIPRED](http://bioinf.cs.ucl.ac.uk/psipred/)). At the moment **BioJava** does not have an implementation |
| 38 | +of this type, which would be more suitable for the sequence and alignment modules. |
| 39 | + |
| 40 | +### Secondary Structure Types |
| 41 | + |
| 42 | +Following the `DSSP` convention, **BioJava** defines 8 types of secondary structure: |
| 43 | + |
| 44 | + E = extended strand, participates in β ladder |
| 45 | + B = residue in isolated β-bridge |
| 46 | + H = α-helix |
| 47 | + G = 3-helix (3-10 helix) |
| 48 | + I = 5-helix (π-helix) |
| 49 | + T = hydrogen bonded turn |
| 50 | + S = bend |
| 51 | + _ = loop (any other type) |
| 52 | + |
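The eight one-letter codes listed above can be modelled as a small enum. The following is a minimal sketch for illustration only: `DsspCode` and its methods are hypothetical names, not BioJava's own type, and it assumes that a loop may be written either as `_` (as in the list above) or as a blank (as in DSSP output).

```java
/** Hypothetical enum for illustration; not part of the BioJava API. */
enum DsspCode {
    EXTENDED('E', "extended strand, participates in beta ladder"),
    BRIDGE('B', "residue in isolated beta-bridge"),
    HELIX_ALPHA('H', "alpha helix"),
    HELIX_310('G', "3-helix (3-10 helix)"),
    HELIX_PI('I', "5-helix (pi helix)"),
    TURN('T', "hydrogen bonded turn"),
    BEND('S', "bend"),
    LOOP(' ', "loop (any other type)");

    final char code;
    final String description;

    DsspCode(char code, String description) {
        this.code = code;
        this.description = description;
    }

    /** Maps a one-letter code to its type; '_' and ' ' both mean loop. */
    static DsspCode fromChar(char c) {
        if (c == '_') {
            return LOOP;
        }
        for (DsspCode t : values()) {
            if (t.code == c) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown SS code: " + c);
    }

    /** True for the three helical types (H, G, I). */
    boolean isHelix() {
        return this == HELIX_ALPHA || this == HELIX_310 || this == HELIX_PI;
    }

    /** True for the two beta types (E, B). */
    boolean isBeta() {
        return this == EXTENDED || this == BRIDGE;
    }
}
```

For example, `DsspCode.fromChar('G').isHelix()` is true, while `DsspCode.fromChar('T').isHelix()` is false.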
| 53 | +## Parsing Secondary Structure in BioJava |
| 54 | + |
| 55 | +Currently there exist two alternatives to parse the secondary structure in **BioJava**: either from the PDB/mmCIF |
| 56 | +files of deposited structures (author assignment) or from the output file of a DSSP prediction. Both file types |
| 57 | +can be obtained from the PDB servers, if available, so they can be automatically fetched by BioJava. |
| 58 | + |
| 59 | +As an example, here are the links for the structure **5PTI**: its |
| 60 | +[PDB file](http://www.rcsb.org/pdb/files/5PTI.pdb) (search for the HELIX and SHEET lines) and its |
| 61 | +[DSSP file](http://www.rcsb.org/pdb/files/5PTI.dssp). |
| 62 | + |
| 63 | +Note that the DSSP prediction output is more detailed and complete than the author assignment. |
| 64 | +The choice of one or the other will depend on the use case. |
| 65 | + |
| 66 | +Below you can find some examples of how to parse and assign the SS of a `Structure`: |
| 67 | + |
| 68 | +```java |
| 69 | + String pdbID = "5pti"; |
| 70 | + FileParsingParameters params = new FileParsingParameters(); |
| 71 | + //Only change needed to the normal Structure loading |
| 72 | + params.setParseSecStruc(true); //this is false as DEFAULT |
| 73 | + |
| 74 | + AtomCache cache = new AtomCache(); |
| 75 | + cache.setFileParsingParams(params); |
| 76 | + |
| 77 | + //The loaded Structure contains the SS assigned |
| 78 | + Structure s = cache.getStructure(pdbID); |
| 79 | + |
| 80 | + //If the more detailed DSSP prediction is required call this afterwards |
| 81 | + DSSPParser.fetch(pdbID, s, true); //Third parameter true overrides the previous SS |
| 82 | +``` |
| 83 | + |
| 84 | +For more examples search in the **demo** package for `DemoLoadSecStruc`. |
| 85 | + |
| 86 | +## Prediction of Secondary Structure in BioJava |
| 87 | + |
| 88 | +### Algorithm |
| 89 | + |
| 90 | +The algorithm implemented in BioJava for the prediction of SS is `DSSP`. It is described in the paper by |
| 91 | +[Kabsch W. & Sander C. (1983)](http://onlinelibrary.wiley.com/doi/10.1002/bip.360221211/abstract) |
| 92 | +([PubMed](http://www.ncbi.nlm.nih.gov/pubmed/6667333)). |
| 93 | +A brief explanation of the algorithm and the output format can be found |
| 94 | +[here](http://swift.cmbi.ru.nl/gv/dssp/DSSP_3.html). |
| 95 | + |
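As background on what the algorithm computes: Kabsch & Sander score a candidate backbone hydrogen bond between the C=O of one residue and the N-H of another with a simple electrostatic energy, and accept the bond when that energy is below -0.5 kcal/mol. The sketch below shows only that formula. The class and method names are hypothetical and are not BioJava's; it takes the four interatomic distances (in Å) as input and omits the guards against very short distances that a real implementation would need.

```java
/** Illustrative sketch of the Kabsch-Sander hydrogen-bond energy; not BioJava code. */
final class HBondEnergy {

    // Partial charges 0.42e and 0.20e times the dimensional factor 332,
    // giving an energy in kcal/mol when distances are in Angstrom.
    static final double Q = 0.42 * 0.20 * 332.0;

    // Energies below this cutoff (kcal/mol) count as a hydrogen bond.
    static final double CUTOFF = -0.5;

    /** Euclidean distance between two 3D points given as {x, y, z}. */
    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** E = Q * (1/r(ON) + 1/r(CH) - 1/r(OH) - 1/r(CN)). */
    static double energy(double rON, double rCH, double rOH, double rCN) {
        return Q * (1.0 / rON + 1.0 / rCH - 1.0 / rOH - 1.0 / rCN);
    }

    static boolean isHBond(double rON, double rCH, double rOH, double rCN) {
        return energy(rON, rCH, rOH, rCN) < CUTOFF;
    }

    public static void main(String[] args) {
        // Made-up distances for a well-formed bond (short O...H contact).
        double e = energy(3.0, 3.2, 2.0, 4.0);
        System.out.println("E = " + e + " kcal/mol, bonded = " + (e < CUTOFF));
    }
}
```

The short O...H distance dominates the sum, so a well-aligned donor and acceptor give a clearly negative energy, while distant atom pairs give a value close to zero.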
| 96 | +The interface is simple: a single method, `predict()`, calculates the SS and can optionally assign it to the |
| 97 | +input `Structure`, overriding any previous annotation, as in the `DSSPParser`. An example can be found below: |
| 98 | + |
| 99 | +```java |
| 100 | + String pdbID = "5pti"; |
| 101 | + AtomCache cache = new AtomCache(); |
| 102 | + |
| 103 | + //Load structure without any SS assignment |
| 104 | + Structure s = cache.getStructure(pdbID); |
| 105 | + |
| 106 | + //Predict and assign the SS of the Structure |
| 107 | + SecStrucPred ssp = new SecStrucPred(); //Instantiation needed |
| 108 | + ssp.predict(s, true); //true assigns the SS to the Structure |
| 109 | +``` |
| 110 | + |
| 111 | +BioJava class: |
| 112 | +[org.biojava.nbio.structure.secstruc.SecStrucPred](http://www.biojava.org/docs/api/org/biojava/nbio/structure/secstruc/SecStrucPred.html) |
| 113 | + |
| 114 | +### Storage and Data Structures |
| 115 | + |
| 116 | +Because there are different sources of SS annotation, the data structure in **BioJava** that stores SS assignments |
| 117 | +has two levels. The top level, `SecStrucInfo`, is very general and only contains two properties: **assignment** |
| 118 | +(a String describing the source of information) and **type** (the SS type). |
| 119 | + |
| 120 | +However, there is an extended container `SecStrucState`, which is a subclass of `SecStrucInfo`, that stores |
| 121 | +all the information of the hydrogen bonding, turns, bends, etc. used for the SS prediction and present in the |
| 122 | +DSSP output file format. This information is only used in certain applications, and that is the reason for the |
| 123 | +more general `SecStrucInfo` class being used by default. |
| 124 | + |
| 125 | +In order to access the SS information of a `Structure`, the `SecStrucInfo` object needs to be obtained from the |
| 126 | +`Group` properties. Below you find an example of how to access and print residue by residue the SS information of |
| 127 | +a `Structure`: |
| 128 | + |
| 129 | +```java |
| 130 | + //This structure should have SS assigned (by any of the methods described) |
| 131 | + Structure s; |
| 132 | + |
| 133 | + for (Chain c : s.getChains()) { |
| 134 | + for (Group g: c.getAtomGroups()){ |
| 135 | + if (g.hasAminoAtoms()){ //Only AA store SS |
| 136 | + //Obtain the object that stores the SS |
| 137 | + SecStrucInfo ss = (SecStrucInfo) g.getProperty(Group.SEC_STRUC); |
| 138 | + //Print information: chain+resn+name+SS |
| 139 | + System.out.println(c.getChainID()+" "+ |
| 140 | + g.getResidueNumber()+" "+ |
| 141 | + g.getPDBName()+" -> "+ss); |
| 142 | + } |
| 143 | + } |
| 144 | + } |
| 145 | +``` |
| 146 | + |
| 147 | +### Output Formats |
| 148 | + |
| 149 | +Once the SS has been assigned (either loaded or predicted), **BioJava** offers several formats to output it: |
| 150 | + |
| 151 | +- **DSSP format**: the SS can be printed in the DSSP output file format, following the standards so that it can be |
| 152 | +parsed again. It is the safest way to serialize a SS annotation and recover it later, but it is probably the most |
| 153 | +complicated to visualize. |
| 154 | + |
| 155 | +<pre> |
| 156 | + # RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA |
| 157 | + 1 1 A R 0 0 168 0, 0.0 54,-0.1 0, 0.0 5,-0.1 0.000 360.0 360.0 360.0 139.2 32.2 14.7 -11.8 |
| 158 | + 2 2 A P > - 0 0 45 0, 0.0 3,-1.8 0, 0.0 4,-0.3 -0.194 360.0-122.0 -61.4 144.9 34.9 13.6 -9.4 |
| 159 | + 3 3 A D G > S+ 0 0 122 1,-0.3 3,-1.6 2,-0.2 4,-0.2 0.790 108.3 71.4 -62.8 -28.5 35.8 10.0 -9.5 |
| 160 | + 4 4 A F G > S+ 0 0 26 1,-0.3 3,-1.7 2,-0.2 -1,-0.3 0.725 83.7 70.4 -64.1 -23.3 35.0 9.7 -5.9 |
| 161 | +</pre> |
| 162 | + |
| 163 | +- **FASTA format**: simple format that prints the SS type of each residue sequentially in the order of the amino acids. |
| 164 | +It is the easiest to visualize, but the least informative of all. |
| 165 | + |
| 166 | +<pre> |
| 167 | +>5PTI_SS-annotation |
| 168 | + GGGGS S EEEEEEETTTTEEEEEEE SSS SS BSSHHHHHHHH |
| 169 | +</pre> |
| 170 | + |
| 171 | +- **Helix Summary**: similar to the FASTA format, but also contains information about the helical turns. |
| 172 | + |
| 173 | +<pre> |
| 174 | +3 turn: >>><<< |
| 175 | +4 turn: >444< >>>>XX<<<< |
| 176 | +5 turn: >5555< |
| 177 | +SS: GGGGS S EEEEEEETTTTEEEEEEE SSS SS BSSHHHHHHHH |
| 178 | +AA: RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA |
| 179 | +</pre> |
| 180 | + |
| 181 | +- **Secondary Structure Elements**: another way to visualize the SS annotation is to compact sequential residues that share the same SS type into a range and assign an ID to it. In this way, a structure can be described by |
| 182 | +a collection of helices, strands, turns, etc., and each element can be identified by an ID (e.g. helix 1 (H1), beta-strand 6 (E6), etc.). |
| 183 | + |
| 184 | +<pre> |
| 185 | +G1: 3 - 6 |
| 186 | +S1: 7 - 7 |
| 187 | +S2: 13 - 13 |
| 188 | +E1: 18 - 24 |
| 189 | +T1: 25 - 28 |
| 190 | +E2: 29 - 35 |
| 191 | +S3: 37 - 39 |
| 192 | +S4: 42 - 43 |
| 193 | +B1: 45 - 45 |
| 194 | +S5: 46 - 47 |
| 195 | +H1: 48 - 55 |
| 196 | +</pre> |
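The compaction into elements can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not BioJava's implementation: `SsElements` and `compact` are hypothetical names, the input is a FASTA-style SS string like the one shown earlier, loops (blank or `_`) are skipped, and ranges are 1-based positions in the string rather than PDB residue numbers (the two coincide for 5PTI).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch; not part of the BioJava API. */
final class SsElements {

    /** SS string of 5PTI as shown in the FASTA example (trailing loop omitted). */
    static final String EXAMPLE_5PTI =
            "  GGGGS     S    EEEEEEETTTTEEEEEEE SSS  SS BSSHHHHHHHH";

    /** Groups runs of identical SS codes into elements such as "G1: 3 - 6". */
    static List<String> compact(String ss) {
        List<String> elements = new ArrayList<>();
        Map<Character, Integer> counters = new HashMap<>();
        int i = 0;
        while (i < ss.length()) {
            char type = ss.charAt(i);
            int start = i;
            // Advance to the end of the current run of identical codes.
            while (i < ss.length() && ss.charAt(i) == type) {
                i++;
            }
            if (type == ' ' || type == '_') {
                continue; // loops are not reported as elements
            }
            // Per-type running counter gives the element ID (H1, H2, ...).
            int id = counters.merge(type, 1, Integer::sum);
            elements.add(type + "" + id + ": " + (start + 1) + " - " + i);
        }
        return elements;
    }

    public static void main(String[] args) {
        for (String element : compact(EXAMPLE_5PTI)) {
            System.out.println(element);
        }
    }
}
```

Keeping one counter per SS type is what makes the IDs restart for each type, so the second strand is `E2` even though several other elements precede it.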
| 197 | + |
| 198 | +You can find examples of how to get the different file formats in the class `DemoSecStrucPred` in the **demo** |
| 199 | +package. |
| 200 | + |
| 201 | +<!--automatically generated footer--> |
| 202 | + |
| 203 | +--- |
| 204 | + |
| 205 | +Navigation: |
| 206 | +[Home](../README.md) |
| 207 | +| [Book 3: The Structure Modules](README.md) |
| 208 | +| Chapter 15 : Protein Secondary Structure |
| 209 | + |
| 210 | +Prev: [Chapter 14 : Protein Symmetry](symmetry.md) |
| 211 | + |
| 212 | +Next: [Chapter 17 : Special Cases](special.md) |