Skip to content

Add an EMBL file parser to BioJava #621#713

Merged
sbliven merged 12 commits intobiojava:masterfrom
NoorAldeenMB:emblParser
Feb 20, 2018
Merged

Add an EMBL file parser to BioJava #621#713
sbliven merged 12 commits intobiojava:masterfrom
NoorAldeenMB:emblParser

Conversation

@NoorAldeenMB
Copy link
Copy Markdown
Contributor

@NoorAldeenMB NoorAldeenMB commented Nov 21, 2017

This is an implementation for first version for embl parser feature with issue #621 Add an EMBL file parser to BioJava.

Copy link
Copy Markdown
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request!
It looks to me like the EmblParser class shouldn't be called like that but rather EmblRecord or something similar, since it is the top level container for the parsed information. I'd also rather have the parse method as a static taking the file as argument and returning the EmblRecord. Javadocs explaining how the parser should be used would be nice.

public class EmblId {


private String PrimaryAccession;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention, variable names in java should start with lower case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for that.

this.file = file;
}

public void parse() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should throw an IOException instead of catching it. Otherwise the user would not be able to handle it properly in their own application.


@Test
public void test(){
File file = new File("/home/pslpt219/Desktop/Homo.dat");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work for any other user. You will need to add the file to the main/resources folder and read it from there. See how it is done elsewhere in biojava. If the file is very big, please gzip it before adding it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the comment i have moved the file into the src/test/resources.

}

@Test
public void test(){
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to assert something in the test, e.g. whether fields are not null and checking that values read for the fields are as expected.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for that i didn't write full tests for the class but i'm working to write cases to cover more percentage of the code.


/**
* This class should parse the data of embl file identification
* @author Noor Aldeen Al Mbaidin
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add @since 5.0.0 tag to all the new classes here? Thanks!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

@NoorAldeenMB
Copy link
Copy Markdown
Contributor Author

@josemduarte
thank you for the review.
I have changed the name of class from EmblParser to EmblReader that return an object of EmblRecord.

Copy link
Copy Markdown
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the changes, they look good! Just one more small issue with the test. We can merge then unless someone else wants to comment something


@Test(expected = IllegalArgumentException.class)
public void givenDirectoryWhenProcessEmblFileThenThrowException() throws IOException {
File file = new File("./src/test/resources");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite right. See for instance how it is done here:

InputStream inStream = new GZIPInputStream(this.getClass().getResourceAsStream("/org/biojava/nbio/structure/io/1b8g_raw.pdb.gz"));

The important thing is to use getResourceAsStream which will work in all situations (in jar packaging for instance).

private String databaseCrossReference;
private String assemblyHeader;
private String assemblyInformation;
private String CON;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be lower case

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed as constructedSequence.

Copy link
Copy Markdown
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

*/
public class EmblReader {

private StringBuilder sequence = new StringBuilder("");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be removed, it doesn't seem to get used

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


import java.io.*;
import java.util.LinkedList;
import java.util.List;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

heuermh
heuermh previously requested changes Dec 4, 2017
@@ -0,0 +1,108 @@
package org.biojava.nbio.core.sequence.io.embl;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All source code in biojava requires the license header

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

private String taxonomicDivision;
private String sequenceLength;

public EmblId() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no-arg public constructor can be removed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

* @since 5.0.0
* @author Noor Aldeen Al Mbaidin
*/
public class EmblId {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all the fields were marked final and only settable in the constructor, this class could be marked @Immutable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -0,0 +1,147 @@
package org.biojava.nbio.core.sequence.io.embl;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All source code in biojava requires the license header

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


private StringBuilder sequence = new StringBuilder("");

public EmblReader() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no-arg public constructor can be removed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

String lineInfo;
try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
while ((line = bufferedReader.readLine()) != null) {
lineInfo = line.substring(0, 2);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between lineInfo and lineIdentifier here? Note also that there is no check if line is long enough for this substring call not to fail.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this was a mistake , I have added validation for line length.

@@ -0,0 +1,291 @@
package org.biojava.nbio.core.sequence.io.embl;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All source code in biojava requires the license header

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -0,0 +1,134 @@
package org.biojava.nbio.core.sequence.io.embl;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All source code in biojava requires the license header

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -0,0 +1,47 @@
package org.biojava.nbio.core.sequence.io.embl;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All source code in biojava requires the license header

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@josemduarte
Copy link
Copy Markdown
Contributor

@heuermh could you approve the latest changes so that we can merge?

@NoorAldeenMB
Copy link
Copy Markdown
Contributor Author

@heuermh may i ask you to review the changes ?

@paolopavan
Copy link
Copy Markdown
Contributor

paolopavan commented Dec 20, 2017 via email

@NoorAldeenMB
Copy link
Copy Markdown
Contributor Author

Dear @paolopavan
thank you for your opinion, I will try to use the AbstractSequence class and upload the changes as soon as possible.

@paolopavan
Copy link
Copy Markdown
Contributor

paolopavan commented Dec 27, 2017 via email

@sbliven
Copy link
Copy Markdown
Member

sbliven commented Feb 20, 2018

I'm going to go ahead and merge this since @heuermh's requested changes have been satisfied.

@paolopavan's suggestion to return sequence objects is a good idea. We've seen that this can be a bit tricky to do in a type-safe manner (#354), so I think it's worth submitting that as a second PR.

@sbliven sbliven dismissed heuermh’s stale review February 20, 2018 14:03

License headers have been added. All requested changes are satisfied.

@sbliven sbliven merged commit 615c03d into biojava:master Feb 20, 2018
@heuermh
Copy link
Copy Markdown
Member

heuermh commented Feb 20, 2018

@sbliven I have further concerns but as you say those could be addressed in a future PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants