Skip to content

Conversation

@sbittrich
Copy link
Member

@sbittrich sbittrich commented Jan 27, 2021

This PR completely replaces the old mmCIF parsing functionality with the ciftools-based implementation.

Major/breaking changes:

  • removes old mmCIF parser and most redundant model beans (now ciftools provides beans for parsing and BioJava will access data via its high-level objects)
  • moves chem-comp handling from structure.io.mmcif to structure.chem
  • introduces StructureFiletype enum (BCIF, CIF, MMTF, PDB)
  • replaces AtomCache#setUseMmCif and AtomCache#setUseMmtf functions by one AtomCache#setFiletype
  • BinaryCIF is now the default data source

Bug fixes:

  • several fixes for ciftools-based CIF handling: e.g., Rfree/Rwork handling, entity parsing, NCS handling

Known bugs/problems:

  • SCOP parsing requires MMTF for some reason (the corresponding tests would also fail with the old mmCIF parser)
  • TestHardBioUnits#test4A1I fails when parsing BinaryCIF (I think it's some sort of numerical instability - I'll try to make sense of it)

Let me know if you'd like to see things done differently somewhere.

@josemduarte
Copy link
Contributor

This is fantastic, thank you! I'll need some time to go through it in detail.

Regarding the test TestHardBioUnits#test4A1I, I've had a look and it looks like a bug in SuperPositionQCP. If you switch to use SuperPositionSVD, then the tests works fine. I think that is a good solution for the problem here.

We can then create a separate issue for the SuperPositionQCP bug.

@sbittrich
Copy link
Member Author

Just played around with the QCP algorithm. The test will find the correct solution if I 'trim' the precision of the double values coming from BinaryCIF. Otherwise the result is completely off. Kinda strange.

Something along the lines of:

private Point3d[] prepare(Point3d[] point3ds) {
    return Stream.of(point3ds)
            .map(v -> new Point3d((int) (v.x * 1000) * 0.001,
                    (int) (v.y * 1000) * 0.001,
                    (int) (v.z * 1000) * 0.001))
            .toArray(Point3d[]::new);
}

Copy link
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great, big thank you!

I've added a few questions.

Also one missing thing is documenting the breaking changes. Could you add them to the CHANGELOG.md file so that when we release 6.0.0 it's already done?

@sbittrich
Copy link
Member Author

sbittrich commented Feb 2, 2021

Also one missing thing is documenting the breaking changes. Could you add them to the CHANGELOG.md file so that when we release 6.0.0 it's already done?

Done in 53982b0.

Copy link
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the changes. I was going to say that the remaining issue is with test TestHardBioUnits#test4A1I, but strangely enough, it works now. There's something definitely weird in there. That deserves a separate issue to investigate the instabilities in SuperpositionQCP

@sbittrich
Copy link
Member Author

Sounds good. I've opened an issue (#914).

# Conflicts:
#	biojava-structure/src/test/java/org/biojava/nbio/structure/asa/TestAsaCalc.java
@josemduarte
Copy link
Contributor

I'll merge this next week if there are no other comments.

@heuermh
Copy link
Member

heuermh commented Feb 20, 2021

I'll merge this next week if there are no other comments.

Would next week be a good time to discuss what might go in the version 6.0.0 release? There is a roadmap issue #879

@josemduarte
Copy link
Contributor

Would next week be a good time to discuss what might go in the version 6.0.0 release? There is a roadmap issue #879

Most items covered in #879 are done now. But if there's anything else that needs to be included, please add any ideas to the list. It is a good opportunity to clean up APIs.

@sbittrich
Copy link
Member Author

I've reverted the changes to QCP (that would change the threshold).
Instead TestHardBioUnits#test4A1I now uses SVD and there's a dedicated test for the issue in TestSuperPositionQCP as discussed in #914.

@josemduarte josemduarte mentioned this pull request Feb 26, 2021
@josemduarte josemduarte merged commit 162a97c into biojava:master Feb 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants