| RichSequenceBuilderFactory.FACTORY | Does not attempt any compression on sequence data. |
| RichSequenceBuilderFactory.PACKED | Will compress all sequence data using PackedSymbolLists. |
| RichSequenceBuilderFactory.THRESHOLD | Will compress sequence data using a PackedSymbolList only when the sequence exceeds 5000 bases in length. Otherwise, data is not compressed. |

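Since the three constants differ only in when packing happens, the choice boils down to a length test. The sketch below models just that decision in plain Java. `CompressionPolicy` and `shouldPack` are hypothetical names, not BioJava API, and the real factories build PackedSymbolLists rather than returning a flag:

```java
// Hypothetical model of the three RichSequenceBuilderFactory constants.
// It only captures whether a sequence of a given length would be packed.
enum CompressionPolicy {
    FACTORY,   // never pack
    PACKED,    // always pack
    THRESHOLD; // pack only sequences longer than THRESHOLD_LENGTH

    // Cut-off taken from the table above ("exceeds 5000 bases in length").
    static final int THRESHOLD_LENGTH = 5000;

    boolean shouldPack(int sequenceLength) {
        switch (this) {
            case FACTORY:
                return false;
            case PACKED:
                return true;
            case THRESHOLD:
                return sequenceLength > THRESHOLD_LENGTH;
            default:
                throw new AssertionError("unknown policy: " + this);
        }
    }
}
```

Following the wording "exceeds", a sequence of exactly 5000 bases stays unpacked under THRESHOLD in this sketch.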
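
If you set the namespace to null, then the namespace used will depend on
the format you are reading. For formats which specify namespaces, the
namespace from the file will be used. For formats which do not specify
namespaces, the default namespace provided by
RichObjectFactory.getDefaultNamespace() will be used.

The SymbolTokenization should be obtained from the Alphabet that
represents the sequence data you are expecting from the file. If you are
reading DNA sequences, you should use
DNATools.getDNA().getTokenization("token"). Other alphabets with tools
classes will have similar methods.

For an alphabet which does not have a tools class, you can do this: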

<java>
Alphabet a = ...; // get an alphabet instance from somewhere
SymbolTokenization st = a.getTokenization("token");
</java>

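A SymbolTokenization converts between the characters in a file and the Symbols of an Alphabet. As a rough plain-Java analogue of what the "token" tokenization does for DNA, here is a sketch that assumes a hypothetical `Nucleotide` enum in place of BioJava's Symbol objects; `DnaTokenizer` is likewise invented, and ambiguity codes such as N are deliberately not handled:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DNA symbols; BioJava uses Symbol objects from an Alphabet.
enum Nucleotide { A, C, G, T }

// Hypothetical analogue of a "token" tokenization for DNA:
// each character of the input maps to exactly one symbol, and back.
final class DnaTokenizer {
    private DnaTokenizer() {}

    // Characters -> symbols (case-insensitive); anything else is rejected.
    static List<Nucleotide> parse(String text) {
        List<Nucleotide> symbols = new ArrayList<>(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toUpperCase(text.charAt(i));
            switch (c) {
                case 'A': symbols.add(Nucleotide.A); break;
                case 'C': symbols.add(Nucleotide.C); break;
                case 'G': symbols.add(Nucleotide.G); break;
                case 'T': symbols.add(Nucleotide.T); break;
                default:
                    throw new IllegalArgumentException(
                        "Illegal DNA character '" + text.charAt(i) + "' at position " + i);
            }
        }
        return symbols;
    }

    // Symbols -> characters, one token per symbol.
    static String format(List<Nucleotide> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (Nucleotide n : symbols) {
            sb.append(n.name());
        }
        return sb.toString();
    }
}
```

In this sketch parsing is case-insensitive, and any character outside A, C, G and T raises an IllegalArgumentException.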
#### Writing using RichStreamWriter

File output is done using RichStreamWriter. This requires:

1. An OutputStream to write sequences to.
2. A Namespace to use for the sequences.
3. A RichSequenceIterator that provides the sequences to write.

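To make the roles of the three inputs concrete, here is a hypothetical plain-Java writer with the same shape: it takes an OutputStream, an optional namespace override and an iterator of sequences. `SimpleSeq`, `SimpleSeqWriter` and the `>namespace:name` header layout are invented for this sketch and are not how RichStreamWriter or FastaFormat lay out their output:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical minimal sequence type: a namespace, a name and the residues.
final class SimpleSeq {
    final String namespace;
    final String name;
    final String residues;

    SimpleSeq(String namespace, String name, String residues) {
        this.namespace = namespace;
        this.name = name;
        this.residues = residues;
    }
}

// Hypothetical writer mirroring the three inputs listed above.
final class SimpleSeqWriter {
    private SimpleSeqWriter() {}

    static void write(OutputStream out, String namespaceOverride, Iterator<SimpleSeq> seqs)
            throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        while (seqs.hasNext()) {
            SimpleSeq s = seqs.next();
            // A null override keeps each sequence's own namespace.
            String ns = (namespaceOverride != null) ? namespaceOverride : s.namespace;
            w.write(">" + ns + ":" + s.name + "\n");
            w.write(s.residues + "\n");
        }
        // Flush but do not close: the caller owns the stream.
        w.flush();
    }
}
```

Passing null as the override keeps the namespace carried by each sequence, which mirrors the rule described next.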
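
The namespace should only be specified when the file format includes
namespace information and you wish to override the information
associated with the actual sequences. If you do not wish to do this,
just set it to null, and the namespace from each individual sequence
will be used instead.

The RichSequenceIterator is an iterator over a set of sequences, exactly
the same as the one returned by the RichStreamReader. It is therefore
possible to plug a RichStreamReader directly into a RichStreamWriter and
convert data from one file format to another with no intermediate steps.

If you only have one sequence to write, you can wrap it in a temporary
RichSequenceIterator by using a call like this: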

<java>
RichSequence rs = ...; // get sequence from somewhere
RichSequenceIterator it = new SingleRichSeqIterator(rs); // wrap it in an iterator
</java>

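SingleRichSeqIterator is simply an iterator that hands out one sequence and is then exhausted. A generic equivalent written against `java.util.Iterator` might look like this; `SingleItemIterator` is a hypothetical name, not a BioJava class:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical generic analogue of SingleRichSeqIterator:
// yields exactly one element, then reports that it is exhausted.
final class SingleItemIterator<T> implements Iterator<T> {
    private T item;
    private boolean consumed = false;

    SingleItemIterator(T item) {
        this.item = item;
    }

    @Override
    public boolean hasNext() {
        return !consumed;
    }

    @Override
    public T next() {
        if (consumed) {
            throw new NoSuchElementException();
        }
        consumed = true;
        T result = item;
        item = null; // drop the reference once it has been handed out
        return result;
    }
}
```

For a plain `java.util.Iterator`, the standard library's `Collections.singletonList(x).iterator()` behaves the same way.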
#### Example
724
718
725
719
The following is an example that will read some DNA sequences from a
726
720
GenBank file and write them out to standard output (screen) as FASTA
727
721
using the methods outlined above:
728
722
729

<java>
// sequences will be DNA sequences
SymbolTokenization dna = DNATools.getDNA().getTokenization("token");
// read GenBank
RichSequenceFormat genbank = new GenbankFormat();
// write FASTA
RichSequenceFormat fasta = new FastaFormat();
// compress only longer sequences
RichSequenceBuilderFactory factory = RichSequenceBuilderFactory.THRESHOLD;
Namespace bloggsNS = RichObjectFactory.getObject(
</java>

_wikis/BioJava:BioJavaXDocs.mediawiki (48 additions, 25 deletions)

If you wish to output a number of sequences in one of the XML formats, you have to pass a RichSequenceIterator over your collection of sequences in order for the XML format to group them together into a single file with the correct headers:

<java>
// an input GenBank file
BufferedReader br = new BufferedReader(new FileReader("myGenbank.gbk"));
// a namespace to override that in the file
Namespace ns = RichObjectFactory.getDefaultNamespace();
// we are reading DNA sequences
RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(br, ns);
// write the whole lot in EMBLxml format to standard out
RichSequence.IOTools.writeEMBLxml(System.out, seqs, ns);
</java>

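The XML formats need the whole iterator because the document header and closing tag must be written exactly once, around every entry; writing each sequence with a separate call would produce several root elements. The sketch below illustrates only that shape. `GroupedXmlWriter`, `<entrySet>` and `<entry>` are invented names and do not follow the EMBLxml (or any other) schema:

```java
import java.util.Iterator;

// Hypothetical illustration: header and footer are written once,
// and every entry from the iterator goes between them.
final class GroupedXmlWriter {
    private GroupedXmlWriter() {}

    static String write(Iterator<String> sequences) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<entrySet>\n"); // opened once, before any entry
        while (sequences.hasNext()) {
            sb.append("  <entry>").append(escape(sequences.next())).append("</entry>\n");
        }
        sb.append("</entrySet>\n"); // closed once, after the last entry
        return sb.toString();
    }

    // Minimal escaping of characters that are special in XML text content.
    // '&' must be replaced first so the other entities are not double-escaped.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```
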
If you don't know what format your input file is in, but know it could be one of a fixed set of acceptable formats, then you can use BioJavaX's format-guessing routine to attempt to read it:
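
BioJavaX's own format-guessing call is not part of this excerpt. To illustrate the general idea only, a hypothetical heuristic could inspect the first non-blank line: GenBank flat files start with a LOCUS line, EMBL entries with an `ID   ` line, and FASTA records with `>`. `FormatGuesser` and `GuessedFormat` are invented names, and this is not the BioJavaX algorithm:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical format labels; not BioJavaX types.
enum GuessedFormat { GENBANK, EMBL, FASTA, UNKNOWN }

// Rough first-line heuristic, for illustration only.
final class FormatGuesser {
    private FormatGuesser() {}

    static GuessedFormat guess(Reader input) throws IOException {
        BufferedReader br = new BufferedReader(input);
        String line;
        while ((line = br.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip leading blank lines
            }
            if (line.startsWith("LOCUS")) return GuessedFormat.GENBANK;
            if (line.startsWith("ID   ")) return GuessedFormat.EMBL;
            if (line.startsWith(">")) return GuessedFormat.FASTA;
            return GuessedFormat.UNKNOWN; // first real line matched nothing
        }
        return GuessedFormat.UNKNOWN; // empty input
    }
}
```

Peeking like this consumes input, so a real reader would need to rewind (for example with BufferedReader's mark and reset) before parsing. The `ID` prefix is also used by UniProt files, so a prefix test alone cannot tell those apart from EMBL.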

The RichSequenceBuilderFactory is best set to one of the predefined constants in the RichSequenceBuilderFactory interface. These constants are defined as:
{|border="1" cellpadding="2"
!width="200"|Name of constant
!width="400"|What it will do
|-
|RichSequenceBuilderFactory.FACTORY
|Does not attempt any compression on sequence data.
|-
|RichSequenceBuilderFactory.PACKED
|Will compress all sequence data using PackedSymbolLists.
|-
|RichSequenceBuilderFactory.THRESHOLD
|Will compress sequence data using a PackedSymbolList only when the sequence exceeds 5000 bases in length. Otherwise, data is not compressed.
|}

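The idea behind packing is that an unambiguous DNA base needs only 2 bits, so 32 bases fit into one 64-bit long instead of taking one char each. The hypothetical `TwoBitDna` class below shows that principle for the four bases A, C, G and T only; it is an illustration, not how PackedSymbolList is implemented:

```java
// Hypothetical 2-bit packing of A/C/G/T: 32 bases per 64-bit word.
final class TwoBitDna {
    private static final String BASES = "ACGT"; // index in this string = 2-bit code
    private final long[] words;
    private final int length;

    TwoBitDna(String dna) {
        length = dna.length();
        words = new long[(length + 31) / 32]; // round up to whole words
        for (int i = 0; i < length; i++) {
            int code = BASES.indexOf(Character.toUpperCase(dna.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException(
                    "Only A, C, G and T fit in 2 bits: " + dna.charAt(i));
            }
            // Base i occupies bits 2*(i%32) and 2*(i%32)+1 of word i/32.
            words[i / 32] |= ((long) code) << (2 * (i % 32));
        }
    }

    char baseAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException(Integer.toString(i));
        }
        int code = (int) ((words[i / 32] >>> (2 * (i % 32))) & 0b11L);
        return BASES.charAt(code);
    }

    int length() { return length; }

    int wordCount() { return words.length; }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(baseAt(i));
        }
        return sb.toString();
    }
}
```

The trade-off is extra bit-shifting on every access, which is presumably why THRESHOLD only packs the longer sequences.
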
If you set the namespace to null, then the namespace used will depend on the format you are reading. For formats which specify namespaces, the namespace from the file will be used. For formats which do not specify namespaces, the default namespace provided by RichObjectFactory.getDefaultNamespace() will be used.
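
The rule can be spelled out as a small decision function. In this hypothetical sketch, strings stand in for Namespace objects, and it assumes, as the earlier comment "a namespace to override that in the file" suggests, that a non-null namespace always wins:

```java
// Hypothetical helper spelling out the namespace rule; not BioJava API.
final class NamespaceRule {
    private NamespaceRule() {}

    /**
     * @param requested   namespace passed to the reader, or null
     * @param fromFile    namespace declared by the file, or null if the format has none
     * @param defaultName stand-in for RichObjectFactory.getDefaultNamespace()
     */
    static String resolve(String requested, String fromFile, String defaultName) {
        if (requested != null) {
            return requested;   // explicit namespace overrides the file (assumption)
        }
        if (fromFile != null) {
            return fromFile;    // format specifies namespaces
        }
        return defaultName;     // format has no namespaces
    }
}
```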

The SymbolTokenization should be obtained from the Alphabet that represents the sequence data you are expecting from the file. If you are reading DNA sequences, you should use DNATools.getDNA().getTokenization("token"). Other alphabets with tools classes will have similar methods.

For an alphabet which does not have a tools class, you can do this:

<java>
Alphabet a = ...; // get an alphabet instance from somewhere
SymbolTokenization st = a.getTokenization("token");
</java>

==== Writing using RichStreamWriter ====

File output is done using RichStreamWriter. This requires:
531
547
532
-
1.
533
-
534
-
an OutputStream to write sequences to.
535
-
2.
536
-
537
-
a Namespace to use for the sequences.
538
-
3.
539
-
540
-
a RichSequenceIterator that provides the sequences to write.
548
+
<ol>
549
+
<li>An OutputStream to write sequences to.</li>
550
+
<li>A Namespace to use for the sequences.</li>
551
+
<li>A RichSequenceIterator that provides the sequences to write.</li>
552
+
</ol>
541
553
542
554
The namespace should only be specified when the file format includes namespace information and you wish to override the information associated with the actual sequences. If you do not wish to do this, just set it to null, and the namespace from each individual sequence will be used instead.

The RichSequenceIterator is an iterator over a set of sequences, exactly the same as the one returned by the RichStreamReader. It is therefore possible to plug a RichStreamReader directly into a RichStreamWriter and convert data from one file format to another with no intermediate steps.
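
Because reader and writer meet only at the iterator, records can flow through one at a time. Here is a plain-Java sketch of that pull-based pipeline under invented assumptions: the input holds one raw sequence per line, and output headers are simply numbered `>seq1`, `>seq2`, and so on. `LineSeqIterator` and `Converter` are hypothetical, not BioJava classes:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy reader: yields one sequence per non-blank input line.
// Input is only read when hasNext()/next() asks for the next record.
final class LineSeqIterator implements Iterator<String> {
    private final BufferedReader in;
    private String pending; // next sequence, or null if not read yet

    LineSeqIterator(Reader reader) {
        this.in = new BufferedReader(reader);
    }

    @Override
    public boolean hasNext() {
        if (pending != null) return true;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    pending = line.trim();
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String result = pending;
        pending = null;
        return result;
    }
}

final class Converter {
    private Converter() {}

    // Writes each sequence as soon as the iterator yields it, so only one
    // record is held at a time (no intermediate collection). Returns the count.
    static int toFasta(Iterator<String> seqs, Writer out) throws IOException {
        int count = 0;
        while (seqs.hasNext()) {
            count++;
            out.write(">seq" + count + "\n" + seqs.next() + "\n");
        }
        out.flush();
        return count;
    }
}
```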

If you only have one sequence to write, you can wrap it in a temporary RichSequenceIterator by using a call like this:

<java>
RichSequence rs = ...; // get sequence from somewhere
RichSequenceIterator it = new SingleRichSeqIterator(rs); // wrap it in an iterator
</java>

==== Example ====

The following is an example that will read some DNA sequences from a GenBank file and write them out to standard output (screen) as FASTA using the methods outlined above:

<java>
// sequences will be DNA sequences
SymbolTokenization dna = DNATools.getDNA().getTokenization("token");
// read GenBank
RichSequenceFormat genbank = new GenbankFormat();
// write FASTA
RichSequenceFormat fasta = new FastaFormat();
// compress only longer sequences
RichSequenceBuilderFactory factory = RichSequenceBuilderFactory.THRESHOLD;
</java>