Commit 7d1eb06

dicknetherlands authored and andreasprlic committed
New page: This page is a work-in-progress, describing each of the key areas in which you might want to work with the new BioJava3 code. It is structured in the form of use-cases and is not a compreh...
1 parent 5f465b9 commit 7d1eb06

2 files changed

+332
-0
lines changed

_wikis/BioJava3:HowTo.md

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
---
title: BioJava3:HowTo
---

This page is a work-in-progress, describing each of the key areas in
which you might want to work with the new BioJava3 code. It is
structured in the form of use-cases and is not a comprehensive resource.
Sections will be added and updated as new modules are added and existing
ones developed in more detail.

Symbols and Alphabets
=====================

A DNA sequence
--------------

All the examples in this section require the biojava-dna module.

### Construction and basic manipulation

` import java.util.Iterator;`
` import java.util.List;`
` import org.biojava.dna.DNATools; // DNATools runs static set-up code for the DNA alphabet when first used.`
` `
` String mySeqString = "ATCGatcgATCG"; // Note that you can use mixed-case strings.`
` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList(mySeqString);`
` `
` // Is it a big list? Don't want to hold it all in memory? Use an iterator instead.`
` Iterator<Symbol> myIterator = SymbolListFormatter.parseSymbols(mySeqString);`
` while (myIterator.hasNext()) {`
`   Symbol sym = myIterator.next();`
` }`
` `
` // You can now use any List method, from Java Collections, to manipulate the list of bases.`
` `
` // The List returned is actually a SymbolList. You can cast it to get some bio-specific`
` // functions that work with 1-indexed positions as opposed to Java's default 0-indexed positions.`
` `
` SymbolList symList = (SymbolList)mySeq;`
` Symbol symA = symList.get(0); // The first symbol, List-style.`
` Symbol symB = symList.get_bio(1); // The first symbol, bio-style.`
` if (symA==symB) { // Symbols are singletons, so == will work if they are identical including case.`
`   System.out.println("Identical!");`
` }`
` `
` // Instead of using equals() or == to compare symbols, use the alphabet of your choice to`
` // compare them in multiple ways. It will return different values depending on whether one`
` // is a gap and the other isn't, whether they match exactly, or if they're the same symbol`
` // but in a different case, etc.`
` Alphabet dna = DNATools.DNA_ALPHABET;`
` SymbolMatchType matchType = dna.getSymbolMatchType(Symbol.get("A"), Symbol.get("a"));`
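
The difference between List-style and bio-style positions can be sketched with the JDK alone. The block below does not use the BioJava3 API: `BioIndexSketch`, `parse` and `getBio` are hypothetical names chosen for illustration, and bases are modelled as plain `Character` values rather than `Symbol` singletons.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-JDK sketch of 0-indexed versus 1-indexed access. Not the BioJava3 API. */
final class BioIndexSketch {

    /** Turns a string into a mutable list of bases, preserving case. */
    static List<Character> parse(String seq) {
        List<Character> bases = new ArrayList<>(seq.length());
        for (char c : seq.toCharArray()) {
            bases.add(c);
        }
        return bases;
    }

    /** Hypothetical stand-in for a bio-style getter: position 1 is the first base. */
    static char getBio(List<Character> bases, int position) {
        if (position < 1 || position > bases.size()) {
            throw new IndexOutOfBoundsException("1-indexed position out of range: " + position);
        }
        return bases.get(position - 1);
    }

    public static void main(String[] args) {
        List<Character> seq = parse("ATCGatcgATCG");
        char listStyle = seq.get(0);    // first base, 0-indexed
        char bioStyle = getBio(seq, 1); // first base, 1-indexed
        System.out.println(listStyle == bioStyle);
    }
}
```

The only design point is the `position - 1` shift, which keeps the underlying storage 0-indexed while exposing the numbering biologists usually expect.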

### Reversing and Complementing DNA

` // All methods in this section modify the list in-place.`
` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList("ATCG");`
` `
` // Reverse. Methods A and B are alternatives; use one or the other.`
` // Method A.`
` Collections.reverse(mySeq); // Using Java Collections.`
` // Method B.`
` DNATools.reverse(mySeq); // DNATools-style.`
` `
` // Complement.`
` DNATools.complement(mySeq);`
` `
` // Reverse-complement.`
` DNATools.reverseComplement(mySeq);`
` `
` // Reverse only the third and fourth bases, 0-indexed list style.`
` Collections.reverse(mySeq.subList(2,4)); // Java Collections API.`
` `
` // Do the same, 1-indexed bio style. This needs the SymbolList cast shown earlier.`
` Collections.reverse(((SymbolList)mySeq).subList_bio(3,5));`
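
To make the in-place behaviour concrete, here is a minimal sketch of reverse, complement and reverse-complement using only the JDK. It is not how `DNATools` is implemented; `ReverseComplementSketch` and its methods are hypothetical, and only the four unambiguous bases (in either case) are handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Plain-JDK sketch of in-place reverse and complement. Not the BioJava3 API. */
final class ReverseComplementSketch {

    private static final String BASES = "ACGTacgt";
    private static final String COMPLEMENTS = "TGCAtgca";

    /** Turns a string into a mutable list of bases. */
    static List<Character> parse(String seq) {
        List<Character> bases = new ArrayList<>(seq.length());
        for (char c : seq.toCharArray()) {
            bases.add(c);
        }
        return bases;
    }

    /** Joins a list of bases back into a string. */
    static String format(List<Character> bases) {
        StringBuilder sb = new StringBuilder(bases.size());
        for (char c : bases) {
            sb.append(c);
        }
        return sb.toString();
    }

    /** Returns the Watson-Crick complement, preserving case. */
    static char complement(char base) {
        int i = BASES.indexOf(base);
        if (i < 0) {
            throw new IllegalArgumentException("Not an unambiguous DNA base: " + base);
        }
        return COMPLEMENTS.charAt(i);
    }

    /** Replaces every base with its complement, modifying the list itself. */
    static void complementInPlace(List<Character> bases) {
        bases.replaceAll(ReverseComplementSketch::complement);
    }

    /** Reverses the list, then complements it, modifying the list itself. */
    static void reverseComplementInPlace(List<Character> bases) {
        Collections.reverse(bases);
        complementInPlace(bases);
    }

    public static void main(String[] args) {
        List<Character> seq = parse("ATCG");
        reverseComplementInPlace(seq);
        System.out.println(format(seq)); // CGAT
        Collections.reverse(seq.subList(2, 4)); // only the 3rd and 4th bases
        System.out.println(format(seq)); // CGTA
    }
}
```

Because `subList` returns a view backed by the original list, reversing the view changes the original sequence, which is the same mechanism the partial-reverse examples rely on.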

### Editing the sequence

` // Delete the second and third bases.`
` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList("ATCG");`
` mySeq.subList(1,3).clear();`
` `
` // Remove only 2nd base, bio-style. This needs the SymbolList cast shown earlier.`
` ((SymbolList)mySeq).remove_bio(2);`
` `
` // Get another sequence and insert it after the 1st base.`
` List<Symbol> otherSeq = SymbolListFormatter.parseSymbolList("GGGG");`
` mySeq.addAll(1, otherSeq);`
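
The same three edits can be traced step by step on a plain `List<Character>`. This is a sketch under the assumption that the bio-style remove simply shifts the position by one; `EditSketch` and `removeBio` are hypothetical names, not BioJava3 API.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-JDK sketch of the editing steps above. Not the BioJava3 API. */
final class EditSketch {

    /** Turns a string into a mutable list of bases. */
    static List<Character> parse(String seq) {
        List<Character> bases = new ArrayList<>(seq.length());
        for (char c : seq.toCharArray()) {
            bases.add(c);
        }
        return bases;
    }

    /** Joins a list of bases back into a string. */
    static String format(List<Character> bases) {
        StringBuilder sb = new StringBuilder(bases.size());
        for (char c : bases) {
            sb.append(c);
        }
        return sb.toString();
    }

    /** Hypothetical bio-style remove: position 1 is the first element. */
    static <T> T removeBio(List<T> list, int position) {
        return list.remove(position - 1);
    }

    public static void main(String[] args) {
        List<Character> seq = parse("ATCG");
        seq.subList(1, 3).clear();     // delete the 2nd and 3rd bases
        System.out.println(format(seq)); // AG
        removeBio(seq, 2);             // remove the 2nd base, bio-style
        System.out.println(format(seq)); // A
        seq.addAll(1, parse("GGGG"));  // insert after the 1st base
        System.out.println(format(seq)); // AGGGG
    }
}
```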

A quality-scored DNA sequence
-----------------------------

### Constructing a quality-scored DNA sequence

` // Construct a default unscored DNA sequence with capacity for integer scoring.`
` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList("ATCG");`
` TaggedSymbolList<Integer> scoredSeq = new TaggedSymbolList<Integer>(mySeq);`
` `
` // Tag all the bases with the same score of 5.`
` scoredSeq.setTagRange(0, scoredSeq.length(), 5);`
` `
` // Tag just the 3rd base (index 2, 0-indexed) with a score of 3.`
` scoredSeq.setTag(2, 3);`
` `
` // Do the same, 1-indexed.`
` scoredSeq.setTag_bio(3, 3);`
` `
` // Get the score at base 4, 1-indexed.`
` Integer tag = scoredSeq.getTag_bio(4);`
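
One way to picture a tagged sequence is as two parallel lists, one of bases and one of scores. The sketch below assumes that model purely for illustration; it says nothing about how `TaggedSymbolList` is actually implemented, and `ScoredSequenceSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Plain-JDK sketch of a quality-scored sequence as parallel lists. Not the BioJava3 API. */
final class ScoredSequenceSketch {

    private final List<Character> bases = new ArrayList<>();
    private final List<Integer> scores;

    /** Builds an unscored sequence: every score starts as null. */
    ScoredSequenceSketch(String seq) {
        for (char c : seq.toCharArray()) {
            bases.add(c);
        }
        scores = new ArrayList<>(Collections.<Integer>nCopies(bases.size(), null));
    }

    int length() {
        return bases.size();
    }

    /** Scores the half-open, 0-indexed range [from, toExclusive). */
    void setScoreRange(int from, int toExclusive, int score) {
        Collections.fill(scores.subList(from, toExclusive), score);
    }

    /** Scores a single base, 0-indexed. */
    void setScore(int index, int score) {
        scores.set(index, score);
    }

    /** Scores a single base, 1-indexed. */
    void setScoreBio(int position, int score) {
        scores.set(position - 1, score);
    }

    /** Returns the score at a 1-indexed position, or null if unscored. */
    Integer getScoreBio(int position) {
        return scores.get(position - 1);
    }

    public static void main(String[] args) {
        ScoredSequenceSketch seq = new ScoredSequenceSketch("ATCG");
        seq.setScoreRange(0, seq.length(), 5); // score every base 5
        seq.setScore(2, 3);                    // 3rd base, 0-indexed
        System.out.println(seq.getScoreBio(3)); // 3
        System.out.println(seq.getScoreBio(4)); // 5
    }
}
```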

### Iterating over the base/score pairs

` // A 1-indexed iterator and ListIterators are also available.`
` Iterator<TaggedSymbol<Integer>> iter = scoredSeq.taggedSymbolIterator();`
` while (iter.hasNext()) {`
`   TaggedSymbol<Integer> taggedSym = iter.next();`
`   Symbol sym = taggedSym.getSymbol();`
`   Integer score = taggedSym.getTag();`
`   // Change the score whilst we're at it.`
`   taggedSym.setTag(6); // Updates the score to 6 in the original set of tagged scores.`
` }`
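
The write-through behaviour of `setTag` can be sketched with a small view object that remembers which slot of the score list it came from. This is an assumption about one possible design, not a description of `TaggedSymbol`; `TaggedPairSketch`, `TaggedBase` and `pair` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-JDK sketch of base/score pairs that write through to the score list. Not the BioJava3 API. */
final class TaggedPairSketch {

    /** A view of one base and its score; setScore updates the backing list. */
    static final class TaggedBase {
        private final char base;
        private final List<Integer> scores;
        private final int index;

        TaggedBase(char base, List<Integer> scores, int index) {
            this.base = base;
            this.scores = scores;
            this.index = index;
        }

        char getBase() {
            return base;
        }

        Integer getScore() {
            return scores.get(index);
        }

        void setScore(int score) {
            scores.set(index, score); // writes through to the original scores
        }
    }

    /** Pairs each base with its slot in the given (mutable) score list. */
    static List<TaggedBase> pair(String bases, List<Integer> scores) {
        if (bases.length() != scores.size()) {
            throw new IllegalArgumentException("bases and scores differ in length");
        }
        List<TaggedBase> pairs = new ArrayList<>(bases.length());
        for (int i = 0; i < bases.length(); i++) {
            pairs.add(new TaggedBase(bases.charAt(i), scores, i));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Integer> scores = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            scores.add(5);
        }
        for (TaggedBase tagged : pair("ATCG", scores)) {
            tagged.setScore(6); // change the score whilst iterating
        }
        System.out.println(scores); // [6, 6, 6, 6]
    }
}
```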

### Iterating over the bases only

` // Use the default iterator.`
` // A ListIterator is also available, as are 1-indexed iterators.`
` Iterator<Symbol> iter = scoredSeq.iterator();`

### Iterating over the scores only

` // A ListIterator is also available, as are 1-indexed iterators.`
` Iterator<Integer> iter = scoredSeq.tagIterator();`
` while (iter.hasNext()) {`
`   Integer score = iter.next();`
` }`

File parsing and converting
===========================

FASTA
-----

The examples in this section require the biojava-fasta module. The
examples that deal with converting to/from DNA sequences also require
the biojava-dna module.

At some point, convenience wrapper classes will be added to make the
parsing process simpler for the most common use-cases.

### Parsing a FASTA file

` FASTAReader reader = new FASTAFileReader(new File("/path/to/my/fasta.fa"));`
` FASTABuilder builder = new FASTABuilder();`
` ThingParser<FASTA> parser = new ThingParser<FASTA>(reader, builder);`
` while (parser.hasNext()) {`
`   FASTA fasta = parser.next();`
`   // fasta contains a complete FASTA record.`
` }`
` reader.close();`
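
For readers unfamiliar with the format, the sketch below shows what a FASTA record boils down to: a `>` description line followed by one or more sequence lines. It uses only the JDK and collects all records into a list, whereas the reader/builder/parser pipeline above hands records out one at a time. `FastaParseSketch` and `Record` are hypothetical names, not part of the biojava-fasta module.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Plain-JDK sketch of FASTA parsing. Not the biojava-fasta API. */
final class FastaParseSketch {

    /** One FASTA record: the text after '>' and the concatenated sequence lines. */
    static final class Record {
        final String description;
        final String sequence;

        Record(String description, String sequence) {
            this.description = description;
            this.sequence = sequence;
        }
    }

    /** Reads every record from the input and closes it afterwards. */
    static List<Record> parse(Reader in) {
        List<Record> records = new ArrayList<>();
        String description = null;
        StringBuilder sequence = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                if (line.startsWith(">")) {
                    if (description != null) {
                        records.add(new Record(description, sequence.toString()));
                    }
                    description = line.substring(1).trim();
                    sequence.setLength(0);
                } else if (description != null) {
                    sequence.append(line); // sequence may span several lines
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (description != null) {
            records.add(new Record(description, sequence.toString()));
        }
        return records;
    }

    public static void main(String[] args) {
        String text = ">seq1 first record\nATCG\nGGCC\n>seq2\nTTAA\n";
        for (Record record : parse(new StringReader(text))) {
            System.out.println(record.description + " -> " + record.sequence);
        }
    }
}
```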

### Converting the FASTA sequence into a DNA sequence

` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList(fasta.getSequence());`

### Converting a DNA sequence back into FASTA

` FASTA fasta = new FASTA();`
` fasta.setDescription("My Description Line");`
` fasta.setSequence(SymbolListFormatter.formatSymbols(mySeq));`

### Writing a FASTA file

` FASTAEmitter emitter = new FASTAEmitter(fasta);`
` FASTAWriter writer = new FASTAFileWriter(new File("/path/to/new/fasta.fa"));`
` ThingParser<FASTA> parser = new ThingParser<FASTA>(emitter, writer);`
` parser.parseAll();`
` writer.close();`
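
The output side can be sketched just as simply: a description line, then the sequence wrapped at a fixed width. The block below uses only the JDK; `FastaWriteSketch`, `write`, `format` and the `lineWidth` parameter are hypothetical and are not part of the biojava-fasta module.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Plain-JDK sketch of FASTA output with line wrapping. Not the biojava-fasta API. */
final class FastaWriteSketch {

    /** Writes one record, wrapping the sequence at lineWidth characters per line. */
    static void write(Writer out, String description, String sequence, int lineWidth) {
        if (lineWidth <= 0) {
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        try {
            out.write('>');
            out.write(description);
            out.write('\n');
            for (int start = 0; start < sequence.length(); start += lineWidth) {
                int end = Math.min(start + lineWidth, sequence.length());
                out.write(sequence, start, end - start);
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper that returns the record as a string. */
    static String format(String description, String sequence, int lineWidth) {
        StringWriter out = new StringWriter();
        write(out, description, sequence, lineWidth);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format("My Description Line", "ATCGATCGAT", 4));
    }
}
```

Passing any `Writer` keeps the sketch independent of where the record ends up, for example a `FileWriter` for a file on disk or a `StringWriter` for testing.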

_wikis/BioJava3:HowTo.mediawiki

Lines changed: 158 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,158 @@
This page is a work-in-progress, describing each of the key areas in which you might want to work with the new BioJava3 code. It is structured in the form of use-cases and is not a comprehensive resource. Sections will be added and updated as new modules are added and existing ones developed in more detail.

= Symbols and Alphabets =

== A DNA sequence ==

All the examples in this section require the biojava-dna module.

=== Construction and basic manipulation ===

 import java.util.Iterator;
 import java.util.List;
 import org.biojava.dna.DNATools; // DNATools runs static set-up code for the DNA alphabet when first used.
 
 String mySeqString = "ATCGatcgATCG"; // Note that you can use mixed-case strings.
 List<Symbol> mySeq = SymbolListFormatter.parseSymbolList(mySeqString);
 
 // Is it a big list? Don't want to hold it all in memory? Use an iterator instead.
 Iterator<Symbol> myIterator = SymbolListFormatter.parseSymbols(mySeqString);
 while (myIterator.hasNext()) {
   Symbol sym = myIterator.next();
 }
 
 // You can now use any List method, from Java Collections, to manipulate the list of bases.
 
 // The List returned is actually a SymbolList. You can cast it to get some bio-specific
 // functions that work with 1-indexed positions as opposed to Java's default 0-indexed positions.
 
 SymbolList symList = (SymbolList)mySeq;
 Symbol symA = symList.get(0); // The first symbol, List-style.
 Symbol symB = symList.get_bio(1); // The first symbol, bio-style.
 if (symA==symB) { // Symbols are singletons, so == will work if they are identical including case.
   System.out.println("Identical!");
 }
 
 // Instead of using equals() or == to compare symbols, use the alphabet of your choice to
 // compare them in multiple ways. It will return different values depending on whether one
 // is a gap and the other isn't, whether they match exactly, or if they're the same symbol
 // but in a different case, etc.
 Alphabet dna = DNATools.DNA_ALPHABET;
 SymbolMatchType matchType = dna.getSymbolMatchType(Symbol.get("A"), Symbol.get("a"));

=== Reversing and Complementing DNA ===

 // All methods in this section modify the list in-place.
 List<Symbol> mySeq = SymbolListFormatter.parseSymbolList("ATCG");
 
 // Reverse. Methods A and B are alternatives; use one or the other.
 // Method A.
 Collections.reverse(mySeq); // Using Java Collections.
 // Method B.
 DNATools.reverse(mySeq); // DNATools-style.
 
 // Complement.
 DNATools.complement(mySeq);
 
 // Reverse-complement.
 DNATools.reverseComplement(mySeq);
 
 // Reverse only the third and fourth bases, 0-indexed list style.
 Collections.reverse(mySeq.subList(2,4)); // Java Collections API.
 
 // Do the same, 1-indexed bio style. This needs the SymbolList cast shown earlier.
 Collections.reverse(((SymbolList)mySeq).subList_bio(3,5));

=== Editing the sequence ===

 // Delete the second and third bases.
 List<Symbol> mySeq = SymbolListFormatter.parseSymbolList("ATCG");
 mySeq.subList(1,3).clear();
 
 // Remove only 2nd base, bio-style. This needs the SymbolList cast shown earlier.
 ((SymbolList)mySeq).remove_bio(2);
 
 // Get another sequence and insert it after the 1st base.
 List<Symbol> otherSeq = SymbolListFormatter.parseSymbolList("GGGG");
 mySeq.addAll(1, otherSeq);

== A quality-scored DNA sequence ==

=== Constructing a quality-scored DNA sequence ===

 // Construct a default unscored DNA sequence with capacity for integer scoring.
 List<Symbol> mySeq = SymbolListFormatter.parseSymbolList("ATCG");
 TaggedSymbolList<Integer> scoredSeq = new TaggedSymbolList<Integer>(mySeq);
 
 // Tag all the bases with the same score of 5.
 scoredSeq.setTagRange(0, scoredSeq.length(), 5);
 
 // Tag just the 3rd base (index 2, 0-indexed) with a score of 3.
 scoredSeq.setTag(2, 3);
 
 // Do the same, 1-indexed.
 scoredSeq.setTag_bio(3, 3);
 
 // Get the score at base 4, 1-indexed.
 Integer tag = scoredSeq.getTag_bio(4);

=== Iterating over the base/score pairs ===

 // A 1-indexed iterator and ListIterators are also available.
 Iterator<TaggedSymbol<Integer>> iter = scoredSeq.taggedSymbolIterator();
 while (iter.hasNext()) {
   TaggedSymbol<Integer> taggedSym = iter.next();
   Symbol sym = taggedSym.getSymbol();
   Integer score = taggedSym.getTag();
   // Change the score whilst we're at it.
   taggedSym.setTag(6); // Updates the score to 6 in the original set of tagged scores.
 }

=== Iterating over the bases only ===

 // Use the default iterator.
 // A ListIterator is also available, as are 1-indexed iterators.
 Iterator<Symbol> iter = scoredSeq.iterator();

=== Iterating over the scores only ===

 // A ListIterator is also available, as are 1-indexed iterators.
 Iterator<Integer> iter = scoredSeq.tagIterator();
 while (iter.hasNext()) {
   Integer score = iter.next();
 }

= File parsing and converting =

== FASTA ==

The examples in this section require the biojava-fasta module. The examples that deal with converting to/from DNA sequences also require the biojava-dna module.

At some point, convenience wrapper classes will be added to make the parsing process simpler for the most common use-cases.

=== Parsing a FASTA file ===

 FASTAReader reader = new FASTAFileReader(new File("/path/to/my/fasta.fa"));
 FASTABuilder builder = new FASTABuilder();
 ThingParser<FASTA> parser = new ThingParser<FASTA>(reader, builder);
 while (parser.hasNext()) {
   FASTA fasta = parser.next();
   // fasta contains a complete FASTA record.
 }
 reader.close();

=== Converting the FASTA sequence into a DNA sequence ===

 List<Symbol> mySeq = SymbolListFormatter.parseSymbolList(fasta.getSequence());

=== Converting a DNA sequence back into FASTA ===

 FASTA fasta = new FASTA();
 fasta.setDescription("My Description Line");
 fasta.setSequence(SymbolListFormatter.formatSymbols(mySeq));

=== Writing a FASTA file ===

 FASTAEmitter emitter = new FASTAEmitter(fasta);
 FASTAWriter writer = new FASTAFileWriter(new File("/path/to/new/fasta.fa"));
 ThingParser<FASTA> parser = new ThingParser<FASTA>(emitter, writer);
 parser.parseAll();
 writer.close();
