---
title: BioJava3:HowTo
---

This page is a work in progress. It describes each of the key areas in
which you might want to work with the new BioJava3 code. It is
structured as a set of use cases and is not a comprehensive reference.
Sections will be added and updated as new modules are added and existing
ones are developed in more detail.

Symbols and Alphabets
=====================

A DNA sequence
--------------

All the examples in this section require the biojava-dna module.

### Construction and basic manipulation

` import org.biojava.dna.DNATools; // Executes static methods to set up the DNA alphabet.`
` `
` String mySeqString = "ATCGatcgATCG"; // Note that you can use mixed-case strings.`
` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList(mySeqString);`
` `
` // Is it a big list? Don't want to hold it all in memory? Use an iterator instead.`
` Iterator<Symbol> myIterator = SymbolListFormatter.parseSymbols(mySeqString);`
` while (myIterator.hasNext()) {`
`     Symbol sym = myIterator.next();`
` }`
` `
` // You can now use any List method, from Java Collections, to manipulate the list of bases.`
` `
` // The List returned is actually a SymbolList. Cast it to get some bio-specific`
` // functions that work with 1-indexed positions as opposed to Java's default 0-indexed positions.`
` `
` SymbolList symList = (SymbolList) mySeq;`
` Symbol symA = symList.get(0); // The first symbol, List-style.`
` Symbol symB = symList.get_bio(1); // The first symbol, bio-style.`
` if (symA == symB) { // Symbols are singletons, so == will work if they are identical including case.`
`     System.out.println("Identical!");`
` }`
` `
` // Instead of using equals() or == to compare symbols, use the alphabet of your choice to`
` // compare them in multiple ways. It will return different values depending on whether one`
` // is a gap and the other isn't, whether they match exactly, or if they're the same symbol`
` // but in a different case, etc.`
` Alphabet dna = DNATools.DNA_ALPHABET;`
` SymbolMatchType matchType = dna.getSymbolMatchType(Symbol.get("A"), Symbol.get("a"));`

### Reversing and Complementing DNA

` // All methods in this section modify the list in-place.`
` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList("ATCG");`
` `
` // Reverse.`
` // Method A.`
` Collections.reverse(mySeq); // Using Java Collections.`
` // Method B.`
` DNATools.reverse(mySeq); // DNATools-style.`
` `
` // Complement.`
` DNATools.complement(mySeq);`
` `
` // Reverse-complement.`
` DNATools.reverseComplement(mySeq);`
` `
` // Reverse only the third and fourth bases, 0-indexed list style?`
` Collections.reverse(mySeq.subList(2, 4)); // Java Collections API.`
` `
` // Do the same, 1-indexed bio style? The _bio methods belong to SymbolList, so cast first.`
` Collections.reverse(((SymbolList) mySeq).subList_bio(3, 5));`

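To make the in-place behaviour concrete without depending on the BioJava3 classes, here is a minimal plain-Java sketch of the same operations on a `List<Character>`. The class `RevCompSketch` and all of its methods are hypothetical illustrations, not part of `DNATools`, and the sketch assumes unambiguous bases (A, C, G, T in either case) only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the DNATools operations; not the BioJava3 API.
public class RevCompSketch {

    // Build a mutable list of bases from a string.
    public static List<Character> parse(String s) {
        List<Character> bases = new ArrayList<>();
        for (char c : s.toCharArray()) {
            bases.add(c);
        }
        return bases;
    }

    // Render the list of bases back into a string.
    public static String format(List<Character> bases) {
        StringBuilder sb = new StringBuilder();
        for (char c : bases) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Complement a single base, preserving its case.
    public static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default: throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    // Complement every base, modifying the list in place.
    public static void complementInPlace(List<Character> bases) {
        for (int i = 0; i < bases.size(); i++) {
            bases.set(i, complement(bases.get(i)));
        }
    }

    // Reverse, then complement, modifying the list in place.
    public static void reverseComplementInPlace(List<Character> bases) {
        Collections.reverse(bases);
        complementInPlace(bases);
    }

    public static void main(String[] args) {
        List<Character> seq = parse("ATCG");
        // subList returns a live view, so reversing it edits the original list.
        Collections.reverse(seq.subList(2, 4));
        System.out.println(format(seq)); // ATGC
        reverseComplementInPlace(seq);
        System.out.println(format(seq)); // GCAT
    }
}
```

Because `subList` is a view onto the backing list, the partial reverse needs no copying. The same reasoning presumably applies to the `subList_bio` variant, with positions shifted by one.
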
### Editing the sequence

` // Delete the second and third bases.`
` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList("ATCG");`
` mySeq.subList(1, 3).clear();`
` `
` // Remove only the 2nd base, bio-style. The _bio methods belong to SymbolList, so cast first.`
` ((SymbolList) mySeq).remove_bio(2);`
` `
` // Get another sequence and insert it after the 1st base.`
` List<Symbol> otherSeq = SymbolListFormatter.parseSymbolList("GGGG");`
` mySeq.addAll(1, otherSeq);`

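The edits above rely on standard `java.util.List` behaviour: `subList` returns a live view, so clearing the view deletes from the backing list, and `addAll(index, ...)` inserts before the element currently at `index`. The sketch below replays the same three steps on a plain `List<Character>`. The class `EditSketch` and its `deleteBio` and `insertAfterBio` helpers are hypothetical, and treating 1-indexed ranges as inclusive at both ends is an assumption made for this illustration, not a statement about the BioJava3 `_bio` methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration using plain Java Collections; not the BioJava3 API.
public class EditSketch {

    // Build a mutable list of bases from a string.
    public static List<Character> parse(String s) {
        List<Character> bases = new ArrayList<>();
        for (char c : s.toCharArray()) {
            bases.add(c);
        }
        return bases;
    }

    // Render the list of bases back into a string.
    public static String format(List<Character> bases) {
        StringBuilder sb = new StringBuilder();
        for (char c : bases) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Delete 1-indexed positions start..end, both inclusive (assumed convention).
    public static void deleteBio(List<Character> bases, int start, int end) {
        bases.subList(start - 1, end).clear();
    }

    // Insert 'other' immediately after the 1-indexed position 'pos'.
    public static void insertAfterBio(List<Character> bases, int pos, List<Character> other) {
        bases.addAll(pos, other);
    }

    public static void main(String[] args) {
        List<Character> seq = parse("ATCG");
        seq.subList(1, 3).clear();             // delete 2nd and 3rd bases
        System.out.println(format(seq));       // AG
        deleteBio(seq, 2, 2);                  // remove only the 2nd base
        System.out.println(format(seq));       // A
        insertAfterBio(seq, 1, parse("GGGG")); // insert after the 1st base
        System.out.println(format(seq));       // AGGGG
    }
}
```
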
A quality-scored DNA sequence
-----------------------------

### Constructing a quality-scored DNA sequence

` // Construct a default unscored DNA sequence with capacity for integer scoring.`
` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList("ATCG");`
` TaggedSymbolList<Integer> scoredSeq = new TaggedSymbolList<Integer>(mySeq);`
` `
` // Tag all the bases with the same score of 5.`
` scoredSeq.setTagRange(0, scoredSeq.length(), 5);`
` `
` // Tag just the 3rd base (0-indexed) with a score of 3.`
` scoredSeq.setTag(2, 3);`
` `
` // Do the same, 1-indexed.`
` scoredSeq.setTag_bio(3, 3);`
` `
` // Get the score at base 4, 1-indexed.`
` Integer tag = scoredSeq.getTag_bio(4);`

### Iterating over the base/score pairs

` // A 1-indexed iterator and ListIterators are also available.`
` Iterator<TaggedSymbol<Integer>> iter = scoredSeq.taggedSymbolIterator();`
` while (iter.hasNext()) {`
`     TaggedSymbol<Integer> taggedSym = iter.next();`
`     Symbol sym = taggedSym.getSymbol();`
`     Integer score = taggedSym.getTag();`
`     // Change the score whilst we're at it.`
`     taggedSym.setTag(6); // Updates the score to 6 in the original set of tagged scores.`
` }`

### Iterating over the bases only

` // Use the default iterator.`
` // A ListIterator is also available, as are 1-indexed iterators.`
` Iterator<Symbol> iter = scoredSeq.iterator();`

### Iterating over the scores only

` // A ListIterator is also available, as are 1-indexed iterators.`
` Iterator<Integer> iter = scoredSeq.tagIterator();`
` while (iter.hasNext()) {`
`     Integer score = iter.next();`
` }`

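One way to picture a quality-scored sequence is as two parallel lists, one of bases and one of scores, that always have the same length. The following plain-Java sketch works under that assumption. `ScoredSeqSketch` and its methods are hypothetical and only mirror the naming used on this page. They do not describe how `TaggedSymbolList` is actually implemented, and the half-open range in `setTagRange` is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parallel-list model of a scored sequence; not the BioJava3 API.
public class ScoredSeqSketch {
    private final List<Character> bases = new ArrayList<>();
    private final List<Integer> scores = new ArrayList<>();

    // Start with every base unscored (null tag).
    public ScoredSeqSketch(String seq) {
        for (char c : seq.toCharArray()) {
            bases.add(c);
            scores.add(null);
        }
    }

    public int length() {
        return bases.size();
    }

    // Tag the 0-indexed, half-open range [from, to) with one score.
    public void setTagRange(int from, int to, Integer score) {
        for (int i = from; i < to; i++) {
            scores.set(i, score);
        }
    }

    // Tag a single base, 0-indexed.
    public void setTag(int index, Integer score) {
        scores.set(index, score);
    }

    // Tag a single base, 1-indexed.
    public void setTagBio(int pos, Integer score) {
        scores.set(pos - 1, score);
    }

    // Read a score, 1-indexed.
    public Integer getTagBio(int pos) {
        return scores.get(pos - 1);
    }

    // Read a base, 1-indexed.
    public char getBaseBio(int pos) {
        return bases.get(pos - 1);
    }

    public static void main(String[] args) {
        ScoredSeqSketch scored = new ScoredSeqSketch("ATCG");
        scored.setTagRange(0, scored.length(), 5); // every base scores 5
        scored.setTag(2, 3);                       // 3rd base now scores 3
        // Walk the base/score pairs using 1-indexed positions.
        for (int pos = 1; pos <= scored.length(); pos++) {
            System.out.println(scored.getBaseBio(pos) + " " + scored.getTagBio(pos));
        }
    }
}
```
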
File parsing and converting
===========================

FASTA
-----

The examples in this section require the biojava-fasta module. The
examples that deal with converting to/from DNA sequences also require
the biojava-dna module.

At some point, convenience wrapper classes will be added to make the
parsing process simpler for the most common use-cases.

### Parsing a FASTA file

` FASTAReader reader = new FASTAFileReader(new File("/path/to/my/fasta.fa"));`
` FASTABuilder builder = new FASTABuilder();`
` ThingParser<FASTA> parser = new ThingParser<FASTA>(reader, builder);`
` while (parser.hasNext()) {`
`     FASTA fasta = parser.next();`
`     // fasta contains a complete FASTA record.`
` }`
` reader.close();`

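For readers who want to see what "a complete FASTA record" amounts to, here is a minimal plain-Java sketch that splits FASTA text into description/sequence pairs. It is not the biojava-fasta parser: `FastaReadSketch`, `FastaEntry`, `readAll` and `parse` are hypothetical names, and unlike the streaming loop above, this sketch collects every record in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal FASTA reader; not the biojava-fasta API.
public class FastaReadSketch {

    // One record: the description line (without '>') and the joined sequence.
    public static final class FastaEntry {
        public final String description;
        public final String sequence;

        FastaEntry(String description, String sequence) {
            this.description = description;
            this.sequence = sequence;
        }
    }

    // Read all records from a character stream.
    public static List<FastaEntry> readAll(Reader in) throws IOException {
        List<FastaEntry> entries = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String description = null;
        StringBuilder sequence = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                if (description != null) {
                    entries.add(new FastaEntry(description, sequence.toString()));
                }
                description = line.substring(1).trim();
                sequence.setLength(0);
            } else if (description != null) {
                sequence.append(line); // sequence may span several lines
            }
        }
        if (description != null) {
            entries.add(new FastaEntry(description, sequence.toString()));
        }
        return entries;
    }

    // Convenience wrapper for in-memory text.
    public static List<FastaEntry> parse(String text) {
        try {
            return readAll(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (FastaEntry e : parse(">seq1 first\nATCG\natcg\n>seq2\nGGGG\n")) {
            System.out.println(e.description + " -> " + e.sequence);
        }
    }
}
```

To read from a file instead, pass a `java.io.FileReader` to `readAll` and close it afterwards, much as `reader.close()` does above.
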
### Converting the FASTA sequence into DNA sequence

` List<Symbol> mySeq = SymbolListFormatter.parseSymbolList(fasta.getSequence());`

### Converting a DNA sequence back into FASTA

` FASTA fasta = new FASTA();`
` fasta.setDescription("My Description Line");`
` fasta.setSequence(SymbolListFormatter.formatSymbols(mySeq));`

### Writing a FASTA file

` FASTAEmitter emitter = new FASTAEmitter(fasta);`
` FASTAWriter writer = new FASTAFileWriter(new File("/path/to/new/fasta.fa"));`
` ThingParser<FASTA> parser = new ThingParser<FASTA>(emitter, writer);`
` parser.parseAll();`
` writer.close();`
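
As a rough picture of what ends up in the output file, the sketch below formats a single record as FASTA text in plain Java. `FastaWriteSketch` and `format` are hypothetical names, and the line width is a parameter chosen for this illustration. How `FASTAFileWriter` actually wraps sequence lines is not specified on this page.

```java
// Hypothetical minimal FASTA formatter; not the biojava-fasta API.
public class FastaWriteSketch {

    // Format one record: a '>' header line, then the sequence wrapped at 'width' columns.
    public static String format(String description, String sequence, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(description).append('\n');
        for (int i = 0; i < sequence.length(); i += width) {
            int end = Math.min(i + width, sequence.length());
            sb.append(sequence, i, end).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the header followed by ATCG, ATCG and AT on separate lines.
        System.out.print(format("My Description Line", "ATCGATCGAT", 4));
    }
}
```

The returned string can be written to disk with any `java.io.Writer`.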