_wikis/BioJava3_Proposal.mediawiki
Lines changed: 2 additions & 16 deletions
Original file line number
Diff line number
Diff line change
@@ -14,44 +14,30 @@ It is suggested that development stop on the existing BioJava/BioJavaX/BioJava2
14
14
* The only database support is for BioSQL, which uses Hibernate but not in a fully flexible manner (i.e. cannot connect to more than one db at a time).
15
15
* It is sequence-focused. Users have moved on.
16
16
17
-
18
17
==Proposal==
19
18
20
19
* Analyse how BioJava is being used by the community. See the [[UsageAnalysis]] page.
21
-
22
20
* Start from scratch, creating a number of smaller jars as sub-projects within an umbrella BioJava3 project. Each jar would provide tools for a specific purpose. Additional jars would provide cross-purpose tools such as format converters or text-to-object interfaces.
23
-
24
21
* Although starting from scratch, much existing code could be reused or refactored to suit the new design.
25
-
26
22
* We would take full advantage of Java 6, including generics, annotations (@) and the built-in property change support. Everything would be a bean - absolutely everything.
27
-
28
23
* We would aim to be fully Java EE compliant, with the majority of components fully reusable as a bean in any other application, just like Spring's components are.
29
-
30
24
* We would write a JUnit test for every single class, writing the test first then the class afterwards. If other test frameworks are out there we could investigate these too - one suggestion is [http://testng.org/doc/ TestNG]. We would also write documentation for every single class with additional full documentation for each separate jar.
31
-
32
25
* We would adhere rigidly to a common coding style and heavily comment the code.
33
-
34
26
* It should be able to focus on whatever aspect the user requires while remaining efficient, removing the dependency on everything being sequence-related.
35
27
36
28
* SymbolLists and Alphabets should be rethought, as these are the most common stumbling blocks.
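
As a minimal sketch of the "everything would be a bean" point combined with the built-in property change support, a bean might look something like the following. The class name <code>TaxonBean</code> and its <code>scientificName</code> property are hypothetical and chosen only for illustration; they are not part of any existing BioJava API.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical bean used only for illustration (not an existing BioJava class).
class TaxonBean {
    // Standard JDK helper that manages listeners and fires change events.
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private String scientificName;

    // Beans expose a no-argument constructor.
    public TaxonBean() {
    }

    public String getScientificName() {
        return scientificName;
    }

    public void setScientificName(String newName) {
        String oldName = this.scientificName;
        this.scientificName = newName;
        // No event is fired if the old and new values are equal and non-null.
        changes.firePropertyChange("scientificName", oldName, newName);
    }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        changes.removePropertyChangeListener(listener);
    }
}
```

Any other component can then register a <code>PropertyChangeListener</code> on the bean and react to changes without knowing anything else about the class.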
37
29
38
30
==Data structure==
39
31
40
32
* RecordSource is an object which provides data. It can represent a file, a directory of files, a database, a web search engine, etc. It has a RecordFormat which reads/writes Records to/from the RecordSource. It provides an iterator over Records which match a given RecordSearch.
41
-
42
33
* A RecordFormat is version-specific to the format, as are the Record objects it produces.
43
-
44
34
* RecordSearch defines search criteria to be applied to a RecordSource (or group thereof). It provides an iterator which returns all the combined Records from all RecordSources the RecordSearch was applied to. It uses RDF or something similar to map fields between different kinds of Records and the search parameters.
45
-
46
35
* Record is a piece of data in any format, as a bean. It should be as lightweight as possible - lazy loading of all non-key data would be ideal. Each different kind of Record has an object structure suitably matched to the RecordFormat that produced it - e.g. Genbank Record objects should be structured internally in almost exactly the same way as the Genbank file. This allows minimal loss of information and maximum flexibility.
47
-
48
36
* RecordConverters convert Record objects between different formats, e.g. Genbank Record to FASTA Record. They allow sensible defaults to be provided where one format does not supply enough information to satisfy the minimum requirements of another. Some kind of bean conversion system based on RDF would be suitable for this.
49
-
50
-
* A set of tools for converting flat data (e.g. sequence strings, taxonomy strings) into BioJava-like objects (e.g. SymbolLists, NCBITaxon). These BioJava-like objects could then be used for more advanced applications.
51
-
37
+
* A set of tools for converting flat data (e.g. sequence strings, taxonomy strings) into BioJava-like objects (e.g. SymbolLists, NCBITaxon). These BioJava-like objects could then be used for more advanced applications. One possible candidate would be [http://dozer.sourceforge.net/ Dozer].
52
38
* A set of tools for manipulating the BioJava-like objects.
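
One possible reading of how Record, RecordSource, RecordSearch and RecordConverter fit together is sketched below in Java. Every signature here is an assumption made for illustration, as are the in-memory source and the two record classes. RecordFormat, the RDF-based field mapping and lazy loading are left out, and RecordSearch is reduced to a simple per-Record test rather than an iterator over several sources.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** A piece of data in some format; only the key is required here. */
interface Record {
    String getKey();
}

/** Search criteria, simplified to a yes/no test per Record (assumption). */
interface RecordSearch<R extends Record> {
    boolean matches(R candidate);
}

/** Provides data and iterates over the Records that match a search. */
interface RecordSource<R extends Record> {
    Iterator<R> find(RecordSearch<R> search);
}

/** Converts Records of one format into Records of another. */
interface RecordConverter<A extends Record, B extends Record> {
    B convert(A input);
}

/** Hypothetical Genbank-style record; the definition may be missing (null). */
class GenbankRecord implements Record {
    private final String accession;
    private final String definition;
    private final String sequence;

    GenbankRecord(String accession, String definition, String sequence) {
        this.accession = accession;
        this.definition = definition;
        this.sequence = sequence;
    }

    public String getKey() { return accession; }
    public String getDefinition() { return definition; }
    public String getSequence() { return sequence; }
}

/** Hypothetical FASTA-style record, which always needs a description. */
class FastaRecord implements Record {
    private final String id;
    private final String description;
    private final String sequence;

    FastaRecord(String id, String description, String sequence) {
        this.id = id;
        this.description = description;
        this.sequence = sequence;
    }

    public String getKey() { return id; }
    public String getDescription() { return description; }
    public String getSequence() { return sequence; }
}

/** Supplies a default description when the Genbank record has none. */
class GenbankToFastaConverter implements RecordConverter<GenbankRecord, FastaRecord> {
    public FastaRecord convert(GenbankRecord input) {
        String description = input.getDefinition() != null
                ? input.getDefinition()
                : "no description";
        return new FastaRecord(input.getKey(), description, input.getSequence());
    }
}

/** Toy source backed by a list; a real one might wrap a file or database. */
class InMemorySource<R extends Record> implements RecordSource<R> {
    private final List<R> records;

    InMemorySource(List<R> records) {
        this.records = new ArrayList<R>(records);
    }

    public Iterator<R> find(RecordSearch<R> search) {
        List<R> hits = new ArrayList<R>();
        for (R candidate : records) {
            if (search.matches(candidate)) {
                hits.add(candidate);
            }
        }
        return hits.iterator();
    }
}
```

Keeping each Record class close to its own format, and pushing the defaults into the converter, is what lets the Genbank record stay faithful to the file while the FASTA record still satisfies its own minimum requirements.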
53
39
54
40
==Action plan==
55
41
56
-
# Please modify this page as you see fit in order to flesh out details and/or make new points.
42
+
# Please modify this page and the [[Talk:BioJava3_Proposal|Talk page]] as you see fit in order to flesh out details and/or make new points.
57
43
# Tentative Singapore meeting to get the ball rolling on the final design and initial coding front.