Skip to content

Conversation

@aalhossary
Copy link
Member

They are parsed from PDB and added to Structure.
Not parsed from mmCIF yet.
Fixes #946

They are parsed from PDB and added to Structure.
Not parsed from mmCIF yet.
Copy link
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aalhossary . Since mmCIF now is the main format, I think it'd be great if you could include filling the new keywords field from mmCIF and BCIF. That should be actually very straight forward. The CIF parsing work is done already, so it is only a matter of calling the setKeywords() with the value extracted from the appropriate mmCIF field. Could you have a look at that?

@aalhossary
Copy link
Member Author

Added setting keywords from _struct_keywords.text and fixed a wrong use of _struct_keywords.pdbx_keywords.

Copy link
Member

@sbittrich sbittrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements @aalhossary, below are some comments on the CIF parsing part.

@aalhossary
Copy link
Member Author

aalhossary commented Aug 4, 2021

Thank you @josemduarte and @JonStargaryen, and sorry for the inconvenience because I am totally new to CIF parsing.
On the other hand, shall we create another new issue to remove the description field altogether from the PDB Header?

Copy link
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aalhossary . Actually, now that I read the discussion about the PDBHeader.getDescription() field. It looks like keywords would be better placed under PDBHeader. Also it seems that description was used to hold the keywords data formerly, right? How about deprecating it, keeping the data in both fields (keywords and description)?

@aalhossary
Copy link
Member Author

The Header as defined in the PDB file format has three specific fields (not including keywords):

  • classification
  • depDate
  • idCode

Keywords may include:

  • Functional classification.
  • Metabolic role.
  • Known biological or chemical activity.
  • Structural classification.
  • Other classifying terms

It would be misleading for a user to find the KEYWODS field in the header where it is not defined. Therefore I believe it's better to keep it outside the header.

===============
Yes, it seems that the description was used to store the keywords (from the mmCIF files only) formerly.

Therefore, I agree with you that the best option is to deprecate description, because:

  • keywords field does not only include description
  • A description by convention is a free text not a list of keywords.

@aalhossary
Copy link
Member Author

@josemduarte I noticed that the description field is used in places like SubstructureIdentifier.reduce().
Shall we still ignore them and deprecate it, or leave it as a descriptive field that is not mapped to anything in the PDB/mmCIF files?

@josemduarte
Copy link
Contributor

@aalhossary the PDBHeader class in BioJava does not intend to replicate the HEADER annotation in the legacy PDB format. Instead it is a way to pack into a single object annotations and extra data that are only needed in some use-cases, but not in others. For instance, if someone cares about coordinates only for a certain application, they could completely drop the PDBHeader by setting it to null. My point of storing keywords under PDBHeader is to avoid having many extra fields at the Structure level, that might not be used in many applications.

Updated the documentation
Deprecated description
@aalhossary
Copy link
Member Author

Although BioJava travis-ci does not run currently, the latest commit of this PR was successfully run on my travis https://app.travis-ci.com/github/aalhossary/biojava (except for the deployment part of course).

Copy link
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @aalhossary and sorry for the delay. See a minor issue about the deprecated tag. After that it should be ok to merge

Small typographical error correction as well.
Copy link
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@josemduarte
Copy link
Contributor

The CI problems are still not solved. I'll merge this after testing it locally.

@josemduarte josemduarte merged commit 0d9650a into biojava:master Sep 20, 2021
@aalhossary aalhossary deleted the Fix_946_parse_PDB_record_KEYWDS branch September 21, 2021 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parse PDB record KEYWDS

3 participants