
Commit bd4bdfd

Merge pull request biojava#7 from lafita/secstruc
Create a chapter for Secondary Structure package
2 parents ce628de + 87f3d9d commit bd4bdfd


5 files changed, +223 -9 lines changed


structure/README.md

Lines changed: 5 additions & 3 deletions
@@ -56,11 +56,13 @@ Chapter 13 - Finding all Interfaces in Crystal: [Crystal Contacts](crystal-conta
 
 Chapter 14 - [Protein Symmetry](symmetry.md)
 
-Chapter 15 - Bonds
+Chapter 15 - [Protein Secondary Structure](secstruc.md)
 
-Chapter 16 - [Special Cases](special.md)
+Chapter 16 - Bonds
 
-Chapter 17 - [Lists](lists.md) of PDB IDs and PDB [Status Information](lists.md)
+Chapter 17 - [Special Cases](special.md)
+
+Chapter 18 - [Lists](lists.md) of PDB IDs and PDB [Status Information](lists.md)
 
 
 
 ### Author:

structure/lists.md

Lines changed: 2 additions & 2 deletions
@@ -27,6 +27,6 @@ The following provides information about the status of a PDB entry
 Navigation:
 [Home](../README.md)
 | [Book 3: The Structure Modules](README.md)
-| Chapter 17 : Status Information
+| Chapter 18 : Status Information
 
-Prev: [Chapter 16 : Special Cases](special.md)
+Prev: [Chapter 17 : Special Cases](special.md)

structure/secstruc.md

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
Protein Secondary Structure
===========================

## What is Protein Secondary Structure?

Protein secondary structure (SS) is the general three-dimensional form of local segments of proteins.
Secondary structure can be formally defined by the pattern of hydrogen bonds observed in an atomic-resolution
structure of the protein; the most common patterns are alpha helices and beta sheets.

More specifically, the secondary structure is defined by the patterns of hydrogen bonds formed between
amine hydrogen (-NH) and carbonyl oxygen (C=O) atoms contained in the backbone peptide bonds of the protein.

![alpha-beta](http://oregonstate.edu/instruction/bi314/summer09/Fig-02-19-0.jpg)

For more information, see the Wikipedia article on [protein secondary structure](https://en.wikipedia.org/wiki/Protein_secondary_structure).

## Secondary Structure Annotation

### Information Sources

There are various ways to obtain the SS annotation of a protein structure:

- **Author assignment**: the authors of the structure describe the SS, usually identifying helices
and beta-sheets, and they assign the corresponding type to each residue involved. The author assignment
can be found in the `PDB` and `mmCIF` files deposited in the PDB, and it can be parsed in **BioJava**
when a `Structure` is loaded.

- **Prediction from atom coordinates**: there are various programs to predict the SS of a protein.
The algorithms use the atom coordinates of the amino acids to determine hydrogen bonds and geometrical patterns
that define the different types of protein secondary structure. One of the first and most popular algorithms
is `DSSP` (Dictionary of Secondary Structure of Proteins). **BioJava** has an implementation of this algorithm,
which was originally written in C++; it is described in the prediction section below.

- **Prediction from sequence**: other algorithms use only the amino acid sequence (primary structure) of the protein,
and predict the SS using the SS propensities of each amino acid and multiple alignments with homologous sequences
(e.g. [PSIPRED](http://bioinf.cs.ucl.ac.uk/psipred/)). At the moment **BioJava** does not have an implementation
of this type, which would be more suitable for the sequence and alignment modules.

### Secondary Structure Types

Following the `DSSP` convention, **BioJava** defines 8 types of secondary structure:

- E = extended strand, participates in β ladder
- B = residue in isolated β-bridge
- H = α-helix
- G = 3-helix (3-10 helix)
- I = 5-helix (π-helix)
- T = hydrogen bonded turn
- S = bend
- `_` = loop (any other type)
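
As a small illustration of the eight codes above, the following stand-alone sketch maps each one-letter code to a type and counts how many residues of each type occur in an SS string. The `SsTypes` class and its `SsType` enum are hypothetical names for this example, not BioJava classes; unknown characters, such as the space used for loops in the FASTA output, are assumed to mean loop.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch mirroring the 8 DSSP codes; not BioJava's own enum. */
public class SsTypes {

    enum SsType {
        EXTENDED('E', "extended strand, participates in beta ladder"),
        BRIDGE('B', "residue in isolated beta-bridge"),
        ALPHA_HELIX('H', "alpha-helix"),
        HELIX_3_10('G', "3-helix (3-10 helix)"),
        PI_HELIX('I', "5-helix (pi-helix)"),
        TURN('T', "hydrogen bonded turn"),
        BEND('S', "bend"),
        LOOP('_', "loop (any other type)");

        final char code;
        final String description;

        SsType(char code, String description) {
            this.code = code;
            this.description = description;
        }

        /** Maps a one-letter code to its type; any unknown code (e.g. ' ') is treated as loop. */
        static SsType fromCode(char c) {
            for (SsType t : values()) {
                if (t.code == c) {
                    return t;
                }
            }
            return LOOP;
        }
    }

    /** Counts the residues of each type in a one-letter SS string. */
    static Map<SsType, Integer> countTypes(String ss) {
        Map<SsType, Integer> counts = new EnumMap<>(SsType.class);
        for (char c : ss.toCharArray()) {
            counts.merge(SsType.fromCode(c), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up SS string for illustration, not taken from a real structure
        System.out.println(countTypes("GGGG EEEETTEEEE HHHHHH"));
    }
}
```

Using an `EnumMap` keeps the counts in the order in which the types are declared.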

## Parsing Secondary Structure in BioJava

Currently there are two alternatives to parse the secondary structure in **BioJava**: either from the PDB/mmCIF
files of deposited structures (author assignment) or from the output file of a DSSP prediction. Both file types
can be obtained from the PDB servers, when available, so they can be fetched automatically by **BioJava**.

For example, here are the links to the [PDB file](http://www.rcsb.org/pdb/files/5PTI.pdb) of the structure **5PTI**
(search for the HELIX and SHEET records) and to its [DSSP file](http://www.rcsb.org/pdb/files/5PTI.dssp).

Note that the DSSP prediction output is more detailed and complete than the author assignment.
The choice of one or the other will depend on the use case.

Below you can find an example of how to parse and assign the SS of a `Structure`:

```java
import org.biojava.nbio.structure.Structure;
import org.biojava.nbio.structure.align.util.AtomCache;
import org.biojava.nbio.structure.io.FileParsingParameters;
import org.biojava.nbio.structure.secstruc.DSSPParser;

public class LoadSecStrucExample {
    public static void main(String[] args) throws Exception {
        String pdbID = "5pti";
        FileParsingParameters params = new FileParsingParameters();
        // Only change needed to the normal Structure loading
        params.setParseSecStruc(true); // false by default

        AtomCache cache = new AtomCache();
        cache.setFileParsingParams(params);

        // The loaded Structure contains the author-assigned SS
        Structure s = cache.getStructure(pdbID);

        // If the more detailed DSSP assignment is required, call this afterwards.
        // The third parameter (true) overrides the previously assigned SS.
        DSSPParser.fetch(pdbID, s, true);
    }
}
```

For more examples, see the `DemoLoadSecStruc` class in the **demo** package.

## Prediction of Secondary Structure in BioJava

### Algorithm

The algorithm implemented in **BioJava** for the prediction of SS is `DSSP`. It is described in the paper by
[Kabsch W. & Sander C. (1983)](http://onlinelibrary.wiley.com/doi/10.1002/bip.360221211/abstract)
[![pubmed](http://img.shields.io/badge/in-pubmed-blue.svg?style=flat)](http://www.ncbi.nlm.nih.gov/pubmed/6667333).
A brief explanation of the algorithm and the output format can be found
[here](http://swift.cmbi.ru.nl/gv/dssp/DSSP_3.html).
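
DSSP decides whether a backbone hydrogen bond exists from an electrostatic energy between the C=O group of one residue and the N-H group of another. The sketch below computes that energy, E = 0.42 * 0.20 * 332 * (1/r(ON) + 1/r(CH) - 1/r(OH) - 1/r(CN)) kcal/mol with distances in Angstrom, and applies the -0.5 kcal/mol cutoff from the Kabsch & Sander paper. It is a minimal stand-alone illustration (the `DsspHBond` and `Vec3` names are hypothetical), not the code used by `SecStrucPred`, and it omits the special cases that a full implementation handles.

```java
/**
 * Hypothetical sketch of the DSSP hydrogen-bond energy (Kabsch & Sander, 1983).
 * Coordinates are in Angstrom; energies are in kcal/mol. Not BioJava API.
 */
public class DsspHBond {

    static final class Vec3 {
        final double x, y, z;

        Vec3(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }

        double distance(Vec3 o) {
            double dx = x - o.x, dy = y - o.y, dz = z - o.z;
            return Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
    }

    // Partial charges q1 = 0.42e (C=O) and q2 = 0.20e (N-H) times the dimensional factor f = 332
    static final double Q1_Q2_F = 0.42 * 0.20 * 332.0;
    // A hydrogen bond is assigned when the energy is below -0.5 kcal/mol
    static final double HBOND_CUTOFF = -0.5;

    /** Energy between the C=O group of one residue and the N-H group of another. */
    static double energy(Vec3 c, Vec3 o, Vec3 n, Vec3 h) {
        return Q1_Q2_F * (1.0 / o.distance(n) + 1.0 / c.distance(h)
                - 1.0 / o.distance(h) - 1.0 / c.distance(n));
    }

    static boolean isHBond(Vec3 c, Vec3 o, Vec3 n, Vec3 h) {
        return energy(c, o, n, h) < HBOND_CUTOFF;
    }

    public static void main(String[] args) {
        // Idealized, collinear C=O ... H-N geometry with made-up coordinates
        Vec3 c = new Vec3(0.0, 0.0, 0.0);
        Vec3 o = new Vec3(1.23, 0.0, 0.0);
        Vec3 h = new Vec3(3.13, 0.0, 0.0);
        Vec3 n = new Vec3(4.13, 0.0, 0.0);
        System.out.printf("E = %.2f kcal/mol, H-bond: %b%n",
                energy(c, o, n, h), isHBond(c, o, n, h));
    }
}
```

For this idealized geometry the energy is close to -2.9 kcal/mol, well below the cutoff; moving the N-H group many Angstrom away brings the energy close to zero.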

The interface is simple: a single method, `predict()`, calculates the SS and can optionally assign it to the
input `Structure`, overriding any previous annotation, as `DSSPParser` does. An example can be found below:

```java
import org.biojava.nbio.structure.Structure;
import org.biojava.nbio.structure.align.util.AtomCache;
import org.biojava.nbio.structure.secstruc.SecStrucPred;

public class PredictSecStrucExample {
    public static void main(String[] args) throws Exception {
        String pdbID = "5pti";
        AtomCache cache = new AtomCache();

        // Load the structure without any SS assignment
        Structure s = cache.getStructure(pdbID);

        // Predict and assign the SS of the Structure
        SecStrucPred ssp = new SecStrucPred(); // instantiation needed
        ssp.predict(s, true); // true assigns the SS to the Structure
    }
}
```

BioJava Class: [org.biojava.nbio.structure.secstruc.SecStrucPred](http://www.biojava.org/docs/api/org/biojava/nbio/structure/secstruc/SecStrucPred.html)

### Storage and Data Structures

Because there are different sources of SS annotation, the data structure in **BioJava** that stores SS assignments
has two levels. The top level, `SecStrucInfo`, is very general and only contains two properties: **assignment**
(a String describing the source of the information) and **type** (the SS type).

However, there is an extended container, `SecStrucState`, which is a subclass of `SecStrucInfo` and stores
all the information about hydrogen bonding, turns, bends, etc. used for the SS prediction and present in the
DSSP output file format. This information is only needed in certain applications, which is why the
more general `SecStrucInfo` class is used by default.
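
The two-level design can be illustrated with a simplified stand-alone model. In the sketch below, `SsInfo` and `SsState` are hypothetical classes that only mimic the roles of `SecStrucInfo` and `SecStrucState`; their fields are assumptions chosen for illustration, not the BioJava API. Code that holds the general type downcasts only when it needs the detailed values.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of a general SS record plus a detailed subclass.
 * These are NOT the BioJava classes SecStrucInfo / SecStrucState.
 */
public class SsStorage {

    /** General level: source of the annotation and the one-letter SS type. */
    static class SsInfo {
        final String assignment; // e.g. "author" or "DSSP"
        final char type;

        SsInfo(String assignment, char type) {
            this.assignment = assignment;
            this.type = type;
        }
    }

    /** Detailed level: extra values produced by a DSSP-style calculation (illustrative fields). */
    static class SsState extends SsInfo {
        final double bestHBondEnergy; // kcal/mol, most favourable H-bond of this residue
        final double kappa;           // virtual bond angle used to detect bends, in degrees

        SsState(String assignment, char type, double bestHBondEnergy, double kappa) {
            super(assignment, type);
            this.bestHBondEnergy = bestHBondEnergy;
            this.kappa = kappa;
        }
    }

    /** Works with the general type and only downcasts when the details are available. */
    static String describe(SsInfo info) {
        if (info instanceof SsState) {
            SsState s = (SsState) info;
            return info.type + " (" + info.assignment + ", E=" + s.bestHBondEnergy
                    + ", kappa=" + s.kappa + ")";
        }
        return info.type + " (" + info.assignment + ")";
    }

    public static void main(String[] args) {
        List<SsInfo> residues = List.of(
                new SsInfo("author", 'H'),
                new SsState("DSSP", 'H', -2.9, 95.0));
        for (SsInfo r : residues) {
            System.out.println(describe(r));
        }
    }
}
```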

In order to access the SS information of a `Structure`, the `SecStrucInfo` object needs to be obtained from the
`Group` properties. Below you can find an example of how to access and print the SS information of
a `Structure` residue by residue:

```java
import org.biojava.nbio.structure.Chain;
import org.biojava.nbio.structure.Group;
import org.biojava.nbio.structure.Structure;
import org.biojava.nbio.structure.secstruc.SecStrucInfo;

public class PrintSecStrucExample {
    // s should already have SS assigned (by any of the methods described)
    public static void printSecStruc(Structure s) {
        for (Chain c : s.getChains()) {
            for (Group g : c.getAtomGroups()) {
                if (g.hasAminoAtoms()) { // only amino acids store SS
                    // Obtain the object that stores the SS
                    SecStrucInfo ss = (SecStrucInfo) g.getProperty(Group.SEC_STRUC);
                    // Print information: chain + residue number + name + SS
                    System.out.println(c.getChainID() + " " +
                            g.getResidueNumber() + " " +
                            g.getPDBName() + " -> " + ss);
                }
            }
        }
    }
}
```

### Output Formats

Once the SS has been assigned (either loaded or predicted), **BioJava** offers several formats to visualize it:

- **DSSP format**: the SS can be printed in the DSSP output file format, following the standard so that it can be
parsed again. It is the safest way to serialize an SS annotation and recover it later, but it is probably the most
complicated to visualize.

<pre>
# RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA
1 1 A R 0 0 168 0, 0.0 54,-0.1 0, 0.0 5,-0.1 0.000 360.0 360.0 360.0 139.2 32.2 14.7 -11.8
2 2 A P > - 0 0 45 0, 0.0 3,-1.8 0, 0.0 4,-0.3 -0.194 360.0-122.0 -61.4 144.9 34.9 13.6 -9.4
3 3 A D G > S+ 0 0 122 1,-0.3 3,-1.6 2,-0.2 4,-0.2 0.790 108.3 71.4 -62.8 -28.5 35.8 10.0 -9.5
4 4 A F G > S+ 0 0 26 1,-0.3 3,-1.7 2,-0.2 -1,-0.3 0.725 83.7 70.4 -64.1 -23.3 35.0 9.7 -5.9
</pre>

- **FASTA format**: a simple format that prints the SS type of each residue sequentially, in the order of the amino acids.
It is the easiest to visualize, but the least informative of all.

<pre>
>5PTI_SS-annotation
GGGGS S EEEEEEETTTTEEEEEEE SSS SS BSSHHHHHHHH
</pre>

- **Helix Summary**: similar to the FASTA format, but it also contains information about the helical turns.

<pre>
3 turn: >>><<<
4 turn: >444< >>>>XX<<<<
5 turn: >5555<
SS: GGGGS S EEEEEEETTTTEEEEEEE SSS SS BSSHHHHHHHH
AA: RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA
</pre>

- **Secondary Structure Elements**: another way to visualize the SS annotation is to compact sequential residues that share the same SS type and assign an ID to each range. In this way, a structure can be described as
a collection of helices, strands, turns, etc., and each element can be identified by an ID (e.g. helix 1 (H1), beta-strand 6 (E6), etc.).

<pre>
G1: 3 - 6
S1: 7 - 7
S2: 13 - 13
E1: 18 - 24
T1: 25 - 28
E2: 29 - 35
S3: 37 - 39
S4: 42 - 43
B1: 45 - 45
S5: 46 - 47
H1: 48 - 55
</pre>
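
Compacting a per-residue annotation into elements is essentially run-length grouping. The following sketch assumes that residues are numbered from 1 in string order, that loops (written as a space or `_`) are not reported, and that each SS type has its own ID counter, which matches the style of the element list above; `SsElements.compact` is a hypothetical name, not a BioJava method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group runs of identical SS codes into numbered elements. */
public class SsElements {

    static List<String> compact(String ss) {
        List<String> elements = new ArrayList<>();
        Map<Character, Integer> counters = new HashMap<>(); // next ID per SS type
        int i = 0;
        while (i < ss.length()) {
            char type = ss.charAt(i);
            int start = i;
            // Extend the run while the next residue has the same SS type
            while (i + 1 < ss.length() && ss.charAt(i + 1) == type) {
                i++;
            }
            if (type != ' ' && type != '_') { // loops are not reported
                int id = counters.merge(type, 1, Integer::sum);
                // 1-based residue positions, e.g. "G1: 3 - 6"
                elements.add("" + type + id + ": " + (start + 1) + " - " + (i + 1));
            }
            i++;
        }
        return elements;
    }

    public static void main(String[] args) {
        // Made-up SS string for illustration
        for (String e : compact("  GGGG EEEETTEEEE")) {
            System.out.println(e);
        }
    }
}
```

For the made-up input in `main`, the elements are G1: 3 - 6, E1: 8 - 11, T1: 12 - 13 and E2: 14 - 17.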

You can find examples of how to generate the different output formats in the class `DemoSecStrucPred` in the **demo**
package.
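
Because the DSSP format uses fixed columns, the one-letter SS string can be recovered from it with very little code. This sketch assumes the standard layout: residue lines follow the header line beginning with two spaces and `#  RESIDUE`, the one-letter amino acid sits at 0-based column 13, the SS code at column 16, and `!` in the amino acid column marks a chain break. `DsspColumns` is a hypothetical name; for real files, BioJava already provides `DSSPParser`.

```java
import java.util.List;

/** Hypothetical sketch: read the SS code column from DSSP residue lines. Not BioJava API. */
public class DsspColumns {

    static String secStrucString(List<String> lines) {
        StringBuilder ss = new StringBuilder();
        boolean inResidues = false;
        for (String line : lines) {
            if (!inResidues) {
                // Skip everything up to and including the residue table header
                inResidues = line.startsWith("  #  RESIDUE");
                continue;
            }
            if (line.length() <= 16 || line.charAt(13) == '!') {
                continue; // too short, or a chain-break marker
            }
            ss.append(line.charAt(16)); // SS code column (' ' means loop)
        }
        return ss.toString();
    }

    public static void main(String[] args) {
        // Made-up residue lines built with the DSSP column widths
        List<String> lines = List.of(
                "  #  RESIDUE AA STRUCTURE BP1 BP2  ACC",
                String.format("%5d%5d%1s%1s %c  %c", 3, 3, " ", "A", 'D', 'G'),
                String.format("%5d%5d%1s%1s %c  %c", 4, 4, " ", "A", 'F', ' '));
        System.out.println("[" + secStrucString(lines) + "]");
    }
}
```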

<!--automatically generated footer-->

---

Navigation:
[Home](../README.md)
| [Book 3: The Structure Modules](README.md)
| Chapter 15 : Protein Secondary Structure

Prev: [Chapter 14 : Protein Symmetry](symmetry.md)

Next: [Chapter 17 : Special Cases](special.md)

structure/special.md

Lines changed: 3 additions & 3 deletions
@@ -131,8 +131,8 @@ DYG is an unusual group - it has 3 characters as a result of .getOne_letter_code
 Navigation:
 [Home](../README.md)
 | [Book 3: The Structure Modules](README.md)
-| Chapter 16 : Special Cases
+| Chapter 17 : Special Cases
 
-Prev: [Chapter 14 : Protein Symmetry](symmetry.md)
+Prev: [Chapter 15 : Protein Secondary Structure](secstruc.md)
 
-Next: [Chapter 17 : Status Information](lists.md)
+Next: [Chapter 18 : Status Information](lists.md)

structure/symmetry.md

Lines changed: 1 addition & 1 deletion
@@ -227,4 +227,4 @@ Navigation:
 
 Prev: [Chapter 13 - Finding all Interfaces in Crystal: Crystal Contacts](crystal-contacts.md)
 
-Next: [Chapter 16 : Special Cases](special.md)
+Next: [Chapter 15 : Protein Secondary Structure](secstruc.md)
