Skip to content

Added PERMISSIVE argument to MMCIFParser as in PDBParser.#5134

Open
piecole wants to merge 2 commits intobiopython:masterfrom
piecole:MMCFParser_PERMISSIVE
Open

Added PERMISSIVE argument to MMCIFParser as in PDBParser.#5134
piecole wants to merge 2 commits intobiopython:masterfrom
piecole:MMCFParser_PERMISSIVE

Conversation

@piecole
Copy link

@piecole piecole commented Jan 6, 2026

  • I hereby agree to dual licence this and any previous contributions under both
    the Biopython License Agreement AND the BSD 3-Clause License.

  • I have read the CONTRIBUTING.rst file, have run pre-commit
    locally, and understand that continuous integration checks will be used to
    confirm the Biopython unit tests and style checks pass with these changes.

  • I have added my name to the alphabetical contributors listings in the files
    NEWS.rst and CONTRIB.rst as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

Added PERMISSIVE argument to MMCIFParser as this is already present (and default True) in PDBParser. To allow problematic MMCIF files that include duplicate atoms in a residue. Kept PERMISSIVE as default False in MMCIFParser to keep behaviour consistent with previous versions unless specified.

Also added _handle_PDB_exception similarly to PDBParser to handle this by wrapping the already imported PDBConstructionException and PDBConstructionWarning.

Previously could not open .cif version of structure 9JXF due to the below error, but can now with PERMISSIVE = True.
"Bio.PDB.PDBExceptions.PDBConstructionException: Atom N defined twice in residue <Residue GLN het= resseq=21 icode= >"

Did not make this argument as permissive as it is in PDBParser, currently just allowing duplicate atoms, but happy to if there are example .cif structures that need it to be more permissive.

…oblems in PDB file with e.g. duplicate atoms in a residue. Kept PERMISSIVE as default False to keep behaviour consistent with previous versions.
@piecole piecole requested a review from JoaoRodrigues as a code owner January 6, 2026 16:40
@mdehoon
Copy link
Contributor

mdehoon commented Jan 7, 2026

I am not the main Bio.PDB person, but anyway I would suggest to use lowercase for argument names to be consistent with Python conventions.

@codecov
Copy link

codecov bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.38%. Comparing base (54a8eda) to head (f6e835b).

Files with missing lines Patch % Lines
Bio/PDB/MMCIFParser.py 40.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5134      +/-   ##
==========================================
+ Coverage   85.52%   86.38%   +0.86%     
==========================================
  Files         282      282              
  Lines       59497    59506       +9     
==========================================
+ Hits        50882    51404     +522     
+ Misses       8615     8102     -513     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@piecole
Copy link
Author

piecole commented Jan 7, 2026

Error still happens with clean Biopython install. Likely due to new NumPy version since this does not occur with NumPy <2.4, although there are relevant depreciation warnings. As far as I understand the error is unrelated to my PR.

Opened an issue at #5135.

NumPy 2.4:

====================================================== FAILURES =======================================================
____________________________________________ VectorTests.test_coord_space _____________________________________________
TypeError: only 0-dimensional arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

self = <test_PDB_vectors.VectorTests testMethod=test_coord_space>

    def test_coord_space(self):
        """Confirm can generate coordinate space transform for 3 points."""
        # start with 3 points already aligned to axes
        point_set = (
            np.array([[2.0], [0.0], [2.0], [1.0]]),
            np.array([[0.0], [0.0], [0.0], [1.0]]),
            np.array([[0.0], [0.0], [2.0], [1.0]]),
        )
        # confirm get id matrix to transform to/from coord space
        homog_id = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
>       mtxs = coord_space(point_set[0], point_set[1], point_set[2], True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Tests\test_PDB_vectors.py:208:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Bio\PDB\vectors.py:559: in coord_space
    set_homog_trans_mtx(-a1[0], -a1[1], -a1[2], tm)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

x = array([-0.]), y = array([-0.]), z = array([-0.])
mtx = array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

    def set_homog_trans_mtx(x: float, y: float, z: float, mtx: np.ndarray):
        """Update existing translation matrix to new values."""
>       mtx[0][3] = x
        ^^^^^^^^^
E       ValueError: setting an array element with a sequence.

Bio\PDB\vectors.py:475: ValueError
=============================================== short test summary info ===============================================
FAILED Tests/test_PDB_vectors.py::VectorTests::test_coord_space - ValueError: setting an array element with a sequence.
================================================== 1 failed in 1.86s ==================================================

Numpy 1.26.4 (also works in 2.3.5):

=========================================== 1 passed, 126 warnings in 3.41s ===========================================

@piecole
Copy link
Author

piecole commented Jan 7, 2026

I am not the main Bio.PDB person, but anyway I would suggest to use lowercase for argument names to be consistent with Python conventions.

Included as uppercase PERMISSIVE to be consistent with the PERMISSIVE arg in Bio.PDB.PDBParser. Can change if requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants