
Add SeqIO read/write support for NCBI 5-column tab-delimited feature tables #5032

Draft
michaelfm1211 wants to merge 5 commits into biopython:master from michaelfm1211:feature-table

@michaelfm1211
Contributor

  • I hereby agree to dual licence this and any previous contributions under both
    the Biopython License Agreement AND the BSD 3-Clause License.

  • I have read the CONTRIBUTING.rst file, have run pre-commit
    locally, and understand that continuous integration checks will be used to
    confirm the Biopython unit tests and style checks pass with these changes.

  • I have added my name to the alphabetical contributors listings in the files
    NEWS.rst and CONTRIB.rst as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

Closes #4943

This PR adds a SeqIO parser and writer for the 5-column tab-delimited feature table (.tbl) format used for preparing GenBank submissions.
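For anyone unfamiliar with the format: a .tbl file starts each table with a `>Feature SeqId` header line. Each feature then gets a line with start, stop and feature key, optional extra start/stop lines for further intervals of the same feature, and qualifier lines whose first three columns are empty, followed by the qualifier key and an optional value. Below is a minimal, stdlib-only reader sketched under those assumptions. `TblFeature` and `parse_tbl` are hypothetical names, not the API of Bio/SeqIO/FeatureTableIO.py in this PR, and directives such as `[offset=...]` are simply skipped.

```python
from dataclasses import dataclass, field


@dataclass
class TblFeature:
    """One feature from a 5-column feature table (hypothetical container)."""

    key: str
    # (start, stop) column strings as written: 1-based, inclusive, possibly
    # with "<" / ">" partial markers; start > stop means the minus strand.
    intervals: list = field(default_factory=list)
    # (qualifier_key, value) pairs; value is "" for flag-style qualifiers.
    qualifiers: list = field(default_factory=list)


def parse_tbl(text):
    """Parse .tbl text into a {seq_id: [TblFeature, ...]} dict (sketch)."""
    tables = {}
    features = None  # feature list for the current ">Feature" block
    feature = None  # feature that extra intervals/qualifiers attach to
    for line in text.splitlines():
        if not line.strip() or line.startswith("["):
            # Skip blank lines and directives such as "[offset=...]"
            continue
        if line.startswith(">Feature"):
            # Header: ">Feature SeqId", optionally followed by a table name
            seq_id = line.split()[1]
            features = tables.setdefault(seq_id, [])
            feature = None
            continue
        cols = line.split("\t")
        if cols[0]:
            # Location line: start and stop, plus the key on a feature's first line
            interval = (cols[0], cols[1])
            if len(cols) > 2 and cols[2]:
                feature = TblFeature(key=cols[2], intervals=[interval])
                features.append(feature)
            else:
                feature.intervals.append(interval)
        else:
            # Qualifier line: three empty columns, then key and optional value
            value = cols[4] if len(cols) > 4 else ""
            feature.qualifiers.append((cols[3], value))
    return tables
```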

@AgustinPardo's feedback on this may be helpful, as he opened the linked issue and wrote his own parser too.

michaelfm1211 requested a review from peterjc as a code owner July 21, 2025 22:05
@codecov

codecov bot commented Jul 21, 2025

Codecov Report

Attention: Patch coverage is 98.36957% with 3 lines in your changes missing coverage. Please review.

Project coverage is 86.19%. Comparing base (66ad0a0) to head (b8298a0).

Files with missing lines: Bio/SeqIO/FeatureTableIO.py (patch coverage 98.36%, 3 lines missing ⚠️)
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5032      +/-   ##
==========================================
+ Coverage   85.45%   86.19%   +0.74%     
==========================================
  Files         286      287       +1     
  Lines       59851    60035     +184     
==========================================
+ Hits        51144    51749     +605     
+ Misses       8707     8286     -421     

@michaelfm1211
Contributor Author

Tests are still failing, but the error messages are a little less cryptic now. The .tbl reader works fine, and the tests seem to pass if the writer is commented out of Bio/SeqIO/__init__.py. The writer seems to be failing the test_SeqIO.py tests that read from one format, write to another, and check for consistency. There is a lot that other formats support but .tbl doesn't (for example, .tbl doesn't even explicitly store the sequence length; however, the length of the reference feature is usually the same as the length of the sequence). I'll convert this PR to a draft for now.
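Since .tbl has no length field, a reader has to estimate it. A minimal sketch, assuming (as described above) that a REFERENCE or source feature spans the whole sequence when present, and otherwise falling back to the largest end coordinate, which is only a lower bound. It also shows the coordinate conversion involved: .tbl columns are 1-based and inclusive, with start > stop marking the minus strand, whereas Python slices are 0-based and half-open. `parse_span` and `infer_length` are hypothetical helpers, not functions from this PR.

```python
def parse_span(start_col, stop_col):
    """Convert .tbl start/stop columns to 0-based half-open (start, end, strand).

    .tbl coordinates are 1-based and inclusive; start > stop marks the minus
    strand. Partial markers ("<", ">") are simply stripped in this sketch, and
    a single-base feature (start == stop) is assumed to be on the plus strand.
    """
    start = int(start_col.lstrip("<>"))
    stop = int(stop_col.lstrip("<>"))
    if start <= stop:
        return start - 1, stop, 1
    return stop - 1, start, -1


def infer_length(features):
    """Guess the sequence length from (key, start_col, stop_col) triples.

    Assumption: a REFERENCE or source feature, if present, spans the whole
    sequence starting at position 1. Otherwise the largest end coordinate is
    returned, which is only a lower bound on the true length.
    """
    ends = []
    for key, start_col, stop_col in features:
        _, end, _ = parse_span(start_col, stop_col)
        if key in ("REFERENCE", "source"):
            return end
        ends.append(end)
    return max(ends) if ends else None
```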

michaelfm1211 marked this pull request as draft July 22, 2025 17:28
@peterjc
Member

peterjc commented Jul 24, 2025

We should probably explicitly silence the warnings here:

test_SeqIO ... /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/Bio/SeqIO/FeatureTableIO.py:71: BiopythonParserWarning: Reference has Medline ID, but only PubMed IDs are officially supported in .tbl files: type: REFERENCE
location: [0:880](+)
qualifiers:
    Key: Medline, Value: ['92329728']

  warnings.warn(
/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/Bio/SeqIO/FeatureTableIO.py:71: BiopythonParserWarning: Reference has Medline ID, but only PubMed IDs are officially supported in .tbl files: type: REFERENCE
location: [0:513](+)
qualifiers:
    Key: Medline, Value: ['92119220']

  warnings.warn(
/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/Bio/SeqIO/FeatureTableIO.py:71: BiopythonParserWarning: Reference has Medline ID, but only PubMed IDs are officially supported in .tbl files: type: REFERENCE
location: [0:741](+)
qualifiers:
    Key: Medline, Value: ['94311827']

  warnings.warn(
/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/Bio/SeqIO/FeatureTableIO.py:71: BiopythonParserWarning: Reference has Medline ID, but only PubMed IDs are officially supported in .tbl files: type: REFERENCE
location: [0:182](+)
qualifiers:
    Key: Medline, Value: ['83265757']

  warnings.warn(
FAIL
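One way to do that, sketched under assumptions: wrap the call in `warnings.catch_warnings()` and add a filter that ignores only this particular `BiopythonParserWarning` message, so any other warnings still surface. `write_reference` is a hypothetical stand-in for the writer code at FeatureTableIO.py:71, and the fallback class exists only so the sketch runs without Biopython installed.

```python
import warnings

try:
    from Bio import BiopythonParserWarning
except ImportError:  # fallback so the sketch runs without Biopython

    class BiopythonParserWarning(UserWarning):
        """Local stand-in for Bio.BiopythonParserWarning."""


def write_reference(qualifiers):
    """Hypothetical stand-in for the writer's REFERENCE handling."""
    if "Medline" in qualifiers:
        warnings.warn(
            "Reference has Medline ID, but only PubMed IDs are officially "
            "supported in .tbl files",
            BiopythonParserWarning,
        )
    return qualifiers.get("PubMed")


def write_reference_quietly(qualifiers):
    """Call write_reference with only the Medline warning suppressed."""
    with warnings.catch_warnings():
        # filterwarnings matches `message` as a regex against the start of
        # the warning text, so unrelated parser warnings still get through.
        warnings.filterwarnings(
            "ignore",
            message="Reference has Medline ID",
            category=BiopythonParserWarning,
        )
        return write_reference(qualifiers)
```

The filter is restored when the `with` block exits, so the suppression does not leak into other tests.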

In the new tests specific to this parser, we can check that the expected warnings appear.
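A minimal sketch of such a test using `unittest.TestCase.assertWarnsRegex`, with a hypothetical `write_reference` stand-in for the writer (a real test would exercise the code in Bio/SeqIO/FeatureTableIO.py instead). The second test turns warnings into errors to check that a PubMed-only reference is written without complaint.

```python
import unittest
import warnings

try:
    from Bio import BiopythonParserWarning
except ImportError:  # fallback so the sketch runs without Biopython

    class BiopythonParserWarning(UserWarning):
        """Local stand-in for Bio.BiopythonParserWarning."""


def write_reference(qualifiers):
    """Hypothetical stand-in for the .tbl writer's REFERENCE handling."""
    if "Medline" in qualifiers:
        warnings.warn(
            "Reference has Medline ID, but only PubMed IDs are officially "
            "supported in .tbl files",
            BiopythonParserWarning,
        )


class TestFeatureTableWarnings(unittest.TestCase):
    def test_medline_reference_warns(self):
        # Passes only if a matching BiopythonParserWarning is raised
        with self.assertWarnsRegex(BiopythonParserWarning, "only PubMed IDs"):
            write_reference({"Medline": ["92329728"]})

    def test_pubmed_reference_is_quiet(self):
        with warnings.catch_warnings():
            warnings.simplefilter("error")  # any warning becomes a failure
            write_reference({"PubMed": ["1"]})
```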

More generally, test_SeqIO.py has a lot of special casing by format, since differences between formats are common.
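The general shape of that kind of special casing, sketched with records modelled as plain dicts: before comparing the original and re-read records, drop whatever the target format cannot represent. The format names and the contents of `UNSUPPORTED` are illustrative assumptions (a .tbl file, for example, carries features but no sequence), not the actual structure of test_SeqIO.py.

```python
# Hypothetical table of record attributes each output format cannot store;
# the format names and entries are illustrative assumptions only.
UNSUPPORTED = {
    "feature_table": {"seq", "letter_annotations"},
    "fasta": {"features", "annotations", "letter_annotations"},
}


def comparable(record, fmt):
    """Keep only the attributes of `record` that `fmt` can round-trip."""
    skip = UNSUPPORTED.get(fmt, set())
    return {name: value for name, value in record.items() if name not in skip}


def round_trip_matches(original, reread, fmt):
    """Compare two records, ignoring what was lost by writing as `fmt`."""
    return comparable(original, fmt) == comparable(reread, fmt)
```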

Development

Successfully merging this pull request may close these issues.

NCBI Sequin tbl format parser

2 participants