fix(decoder): GzipDecoder fallback should decompress when headers lack gzip content type (AI-Triage PR)#895

Draft
devin-ai-integration[bot] wants to merge 1 commit into main from
devin/1770315462-fix-gzip-decoder-fallback

@devin-ai-integration
Contributor

fix(decoder): GzipDecoder fallback should decompress when headers lack gzip content type

Summary

One-line fix in create_gzip_decoder(): changes the fallback_parser from gzip_parser.inner_parser (e.g., CsvParser) to gzip_parser (the full GzipParser wrapping the inner parser).

Problem: When GzipDecoder is explicitly configured (e.g., as download_decoder in AsyncRetriever) and the HTTP response lacks Content-Encoding: gzip or Content-Type: application/gzip headers, the _select_parser() method falls back to the inner parser directly (e.g., CsvParser), skipping decompression entirely. This causes UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b because raw gzip bytes are fed to a text parser.
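The failure mode can be reproduced with the standard library alone. The snippet below is a minimal sketch, not CDK code: the payload and the decode_as_text helper are hypothetical stand-ins. It gzips a small CSV payload and decodes the raw bytes as UTF-8, which is effectively what happens when a text parser receives a body that was never decompressed.

```python
import gzip

# Hypothetical payload standing in for a gzip-compressed CSV download.
csv_bytes = b"id,name\n1,alice\n"
compressed = gzip.compress(csv_bytes)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
assert compressed[:2] == b"\x1f\x8b"


def decode_as_text(raw: bytes) -> str:
    """Mimic a text parser that assumes the body is already UTF-8."""
    return raw.decode("utf-8")


try:
    decode_as_text(compressed)
    error_message = None
except UnicodeDecodeError as exc:
    # 0x1f is valid ASCII, so decoding fails on the second byte, 0x8b.
    error_message = str(exc)

# Decompressing first yields the original text.
decoded = decode_as_text(gzip.decompress(compressed))
```

The second magic byte, 0x8b, is not a valid UTF-8 start byte, which is why it is the byte named in the error.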

This is common for S3 pre-signed URL downloads (e.g., CircleCI Usage Export, Amazon Ads reports) where files are gzip-compressed but served as Content-Type: binary/octet-stream.

Fix: Since the user has explicitly declared GzipDecoder, the fallback should still decompress. Change fallback_parser=gzip_parser.inner_parser to fallback_parser=gzip_parser.
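To illustrate the effect of the changed fallback, here is a simplified sketch of header-based parser selection. It is not the CDK implementation: CsvParserSketch, GzipParserSketch, and select_parser are hypothetical names, and they operate on in-memory bytes, whereas the real parsers may work on streams. Only the attribute name inner_parser, the fallback_parser argument, and the header values come from the description above.

```python
import csv
import gzip
import io
from typing import Dict, Iterator, Mapping


class CsvParserSketch:
    """Hypothetical text parser: expects already-decompressed UTF-8 CSV."""

    def parse(self, data: bytes) -> Iterator[Dict[str, str]]:
        text = data.decode("utf-8")
        yield from csv.DictReader(io.StringIO(text))


class GzipParserSketch:
    """Hypothetical wrapper: decompresses, then delegates to inner_parser."""

    def __init__(self, inner_parser: CsvParserSketch) -> None:
        self.inner_parser = inner_parser

    def parse(self, data: bytes) -> Iterator[Dict[str, str]]:
        yield from self.inner_parser.parse(gzip.decompress(data))


# Header/value pairs that mark a response as gzip (from the PR description).
GZIP_HEADERS = {
    ("content-encoding", "gzip"),
    ("content-type", "application/gzip"),
}


def select_parser(headers: Mapping[str, str], gzip_parser, fallback_parser):
    """Return gzip_parser when a gzip header is present, else the fallback."""
    for name, value in headers.items():
        if (name.lower(), value.lower()) in GZIP_HEADERS:
            return gzip_parser
    return fallback_parser


gzip_parser = GzipParserSketch(CsvParserSketch())
body = gzip.compress(b"id,name\n1,alice\n")
s3_headers = {"Content-Type": "binary/octet-stream"}  # no gzip hint

# Before the fix: the fallback is the inner parser, so decompression is skipped.
old = select_parser(s3_headers, gzip_parser, fallback_parser=gzip_parser.inner_parser)
# After the fix: the fallback is the full gzip parser.
new = select_parser(s3_headers, gzip_parser, fallback_parser=gzip_parser)

rows = list(new.parse(body))
```

With the old fallback, calling old.parse(body) raises UnicodeDecodeError on the same input, because the inner parser receives the compressed bytes.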

Related issues: airbytehq/airbyte#56988, airbytehq/airbyte#66208, airbytehq/oncall#11173

Introduced in airbytehq/airbyte-python-cdk#378.

Review & Testing Checklist for Human

  • Verify behavior when response is NOT gzip-compressed and lacks gzip headers: With this change, GzipParser.parse() will be invoked even when there are no gzip headers. If any connector uses GzipDecoder in a context where some responses are genuinely non-gzipped AND lack gzip headers, GzipParser will now fail on those responses instead of gracefully parsing them. Check whether GzipParser.parse() needs a try/except fallback to inner_parser for non-gzip data, or whether this scenario is not possible when the user explicitly configures GzipDecoder.
  • Test with a real connector: Validate with CircleCI Usage Export or Amazon Ads connector that downloads gzip files from S3 pre-signed URLs with Content-Type: binary/octet-stream.
  • Review the intent of the original PR #378 (fix: (CDK) (AsyncRetriever) - Use the Nested Decoders to decode the streaming responses, instead of ResponseToFileExtractor): The original implementation may have had a reason for the header-based fallback behavior. Confirm with @bazarnov or @maxi297 if needed.
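One possible way to address the first checklist item, sketched under assumptions: instead of a try/except around the parser, inspect the first two bytes of the body for the gzip magic number and decompress only when it is present. The maybe_decompress helper below is hypothetical and works on in-memory bytes; it is not part of this PR, and the CDK's GzipParser may read from a stream instead.

```python
import gzip

# The two-byte magic number that begins every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def maybe_decompress(data: bytes) -> bytes:
    """Decompress only when the payload looks like gzip.

    Non-gzip payloads are returned unchanged, so the inner parser can still
    handle responses that are genuinely uncompressed. Errors from a corrupt
    gzip stream are deliberately left to propagate.
    """
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


plain = b"id,name\n1,alice\n"
from_gzip = maybe_decompress(gzip.compress(plain))
from_plain = maybe_decompress(plain)
```

A content sniff like this handles both compressed and uncompressed responses regardless of headers, at the cost of misclassifying a non-gzip payload that happens to start with the same two bytes.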

Notes

…nner_parser

When GzipDecoder is explicitly configured and the HTTP response lacks
gzip-related headers (e.g., S3 pre-signed URLs returning
binary/octet-stream), the fallback_parser should still decompress
the data since the user has explicitly declared they expect gzip content.

Co-Authored-By: unknown <>

@github-actions

github-actions bot commented Feb 5, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1770315462-fix-gzip-decoder-fallback#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1770315462-fix-gzip-decoder-fallback

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

@github-actions

github-actions bot commented Feb 5, 2026

PyTest Results (Fast)

3 856 tests  +1   3 844 ✅ +1   6m 11s ⏱️ -16s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit abed7f6. ± Comparison against base commit 15542de.

@github-actions

github-actions bot commented Feb 5, 2026

PyTest Results (Full)

3 859 tests  +1   3 847 ✅ +1   11m 0s ⏱️ -1s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit abed7f6. ± Comparison against base commit 15542de.
