Commit 68110b7

dicknetherlands authored and andreasprlic committed
/* Major problem areas */
1 parent 14e1b5c commit 68110b7

2 files changed: +12 -0 lines changed

_wikis/BioJava3_Proposal.md

Lines changed: 11 additions & 0 deletions
@@ -171,6 +171,17 @@ Major problem areas
 exactly how BioSQL should be used, right down to details like this,
 so that all projects may be able to fully interact without knowledge
 of which tool was used to write the data to BioSQL.
+11. Gapped sequences and alignments need closer attention. Currently
+there are two ways - a SimpleSymbolList with '-' symbols, or a
+SimpleGappedSymbolList with proper block definitions and coordinate
+translation and access to the ungapped sequence. The MSF alignment
+parser uses the former which is counter-intuitive as programmers
+reading alignments would expect simple access to the ungapped
+sequence. There is no easy way to translate between them if you need
+the more advanced features such as coordinate translation from
+gapped to ungapped sequence. By allowing gap symbols directly in
+SimpleSymbolList, it is impossible programmatically to enforce
+whether a method accepts gapped or ungapped sequences.

 Categories of Improvement
 -------------------------
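
Item 11 turns on two ideas: translating a gapped column to a position in the ungapped sequence, and letting the type system say whether a method takes gapped or ungapped input. The Java below is a minimal sketch of both, not BioJava's API. `UngappedSequence`, `GappedSequence`, `parse` and `toUngapped` are hypothetical names invented for illustration, sequences are plain strings rather than symbol lists, and positions are 1-based by choice of this sketch.

```java
/** Hypothetical type: a sequence that is guaranteed to contain no gap symbols. */
final class UngappedSequence {
    private final String residues;

    UngappedSequence(String residues) {
        // Reject gaps at construction so every instance is gap-free.
        if (residues.indexOf('-') >= 0) {
            throw new IllegalArgumentException("gap symbol in ungapped sequence");
        }
        this.residues = residues;
    }

    String residues() { return residues; }

    int length() { return residues.length(); }
}

/** Hypothetical type: an ungapped sequence plus a gapped-to-ungapped coordinate map. */
final class GappedSequence {
    private final UngappedSequence ungapped;
    // Entry i-1 holds the 1-based ungapped position for gapped column i, or 0 for a gap.
    private final int[] gappedToUngapped;

    private GappedSequence(UngappedSequence ungapped, int[] gappedToUngapped) {
        this.ungapped = ungapped;
        this.gappedToUngapped = gappedToUngapped;
    }

    /** Builds the gapped view from a flat aligned string that uses '-' for gaps. */
    static GappedSequence parse(String aligned) {
        StringBuilder residues = new StringBuilder();
        int[] map = new int[aligned.length()];
        for (int i = 0; i < aligned.length(); i++) {
            char c = aligned.charAt(i);
            if (c == '-') {
                map[i] = 0;                 // gap column: no ungapped position
            } else {
                residues.append(c);
                map[i] = residues.length(); // 1-based position of this residue
            }
        }
        return new GappedSequence(new UngappedSequence(residues.toString()), map);
    }

    UngappedSequence ungapped() { return ungapped; }

    int length() { return gappedToUngapped.length; }

    /** Returns the 1-based ungapped position for a 1-based gapped column, or 0 for a gap. */
    int toUngapped(int gappedPos) {
        if (gappedPos < 1 || gappedPos > gappedToUngapped.length) {
            throw new IndexOutOfBoundsException("column " + gappedPos);
        }
        return gappedToUngapped[gappedPos - 1];
    }
}
```

With these types, `GappedSequence.parse("AC--GT")` exposes `ungapped().residues()` as `ACGT`, and `toUngapped(5)` maps the `G` in column 5 to ungapped position 3. A method that declares an `UngappedSequence` parameter cannot be handed a gapped one by accident, which is the enforcement the item says a single list type with '-' symbols cannot provide.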

_wikis/BioJava3_Proposal.mediawiki

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@ It is suggested that development stop on the existing BioJava/BioJavaX/BioJava2
 #Equals, compareTo and hashCode methods are inconsistent and often inaccurate, e.g. customised to suit a certain behaviour pattern (e.g. the BJX extensions assume that nulls are allowable for the purposes of Hibernate, whereas really they shouldn't be and Hibernate doesn't need them either). Changing these would change the behaviour of the object model particularly when it comes to collections and maps.
 #Localisation causes mistranslation of strings from lower to upper case. For instance, in Turkish, the lower and upper case i/I do not match those in the English localisation. This causes protein sequences to be mistranslated or misrepresented. BioJava needs to be modified to take this into account.
 #BioSQL interaction is good but there are still issues - particularly to do with case conventions for naming things such as alphabets. A BioSQL mini-hackathon has been suggested as one way to nail down exactly how BioSQL should be used, right down to details like this, so that all projects may be able to fully interact without knowledge of which tool was used to write the data to BioSQL.
+#Gapped sequences and alignments need closer attention. Currently there are two ways - a SimpleSymbolList with '-' symbols, or a SimpleGappedSymbolList with proper block definitions and coordinate translation and access to the ungapped sequence. The MSF alignment parser uses the former which is counter-intuitive as programmers reading alignments would expect simple access to the ungapped sequence. There is no easy way to translate between them if you need the more advanced features such as coordinate translation from gapped to ungapped sequence. By allowing gap symbols directly in SimpleSymbolList, it is impossible programmatically to enforce whether a method accepts gapped or ungapped sequences.

 ==Categories of Improvement==
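
The localisation item in the context lines above can be made concrete with a short Java sketch. In a Turkish locale, upper-casing `i` gives the dotted capital `İ` (U+0130) rather than `I`, so sequence text that is case-converted with the JVM's default locale can change meaning. The helper class `LocaleSafeCase` is a hypothetical name for illustration, and passing `Locale.ROOT` is one common way to avoid the problem, not necessarily the fix the proposal has in mind.

```java
import java.util.Locale;

/** Hypothetical helper: case conversion that ignores the JVM's default locale. */
final class LocaleSafeCase {
    private LocaleSafeCase() {}

    /** Upper-cases sequence text using locale-neutral rules. */
    static String upper(String residues) {
        return residues.toUpperCase(Locale.ROOT);
    }

    /** Shows what a Turkish locale does to the same input, for comparison. */
    static String upperTurkish(String residues) {
        return residues.toUpperCase(Locale.forLanguageTag("tr"));
    }
}
```

Here `LocaleSafeCase.upper("mikv")` yields `MIKV`, while `LocaleSafeCase.upperTurkish("mikv")` yields `MİKV`, which no longer matches the one-letter code for isoleucine.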
