Tags: stackbox-dev/tap-github
fix: Keep org-specific token rotation isolated (MeltanoLabs#555)

## Summary

Keep token rotation scoped to the current organization when that organization has its own configured token pool.

GitHub App installation tokens are org-scoped. If a stream is reading private repositories for one org and `get_next_auth_token()` falls through to another org's installation token, GitHub can return misleading `404 Not Found` responses for repositories that do exist and are accessible with the correct org token.

## What changed

- `get_next_auth_token()` now prefers the current organization's token managers when `current_organization` has a configured token pool.
- Fallback to org-agnostic or other-org tokens is still allowed when the current organization has no configured token pool.
- Added regression tests for both paths:
  - org-specific token rotation stays inside the current org's pool.
  - missing org-specific pools still use the existing fallback behavior.

## Validation

```text
uv run pytest tests/test_authenticator.py
# 44 passed
```

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Trish Gillett-Kawamoto <trish.gillett@shopify.com>
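A minimal sketch of the selection rule described above: stay inside the current org's pool when one exists, otherwise fall back to org-agnostic tokens and then to other orgs' tokens. `get_next_auth_token`, `current_organization`, and `token_managers` come from the change description; the `TokenManager` and `OrgAwareAuthenticator` classes, the use of `None` as the org-agnostic key, and returning `None` when the pool is exhausted are hypothetical simplifications, not tap-github's actual code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenManager:
    """Hypothetical stand-in for one rotating credential."""

    name: str
    rate_limited: bool = False

    def has_calls_remaining(self) -> bool:
        return not self.rate_limited


@dataclass
class OrgAwareAuthenticator:
    # Org name -> token pool; the None key holds org-agnostic tokens (assumption).
    token_managers: dict[Optional[str], list[TokenManager]] = field(default_factory=dict)
    current_organization: Optional[str] = None

    def get_next_auth_token(self) -> Optional[TokenManager]:
        org_pool = (
            self.token_managers.get(self.current_organization)
            if self.current_organization is not None
            else None
        )
        if org_pool:
            # The current org has its own pool: never leave it, because another
            # org's installation token can produce misleading 404s.
            candidates = org_pool
        else:
            # No org-specific pool: org-agnostic tokens first, then other orgs'.
            candidates = list(self.token_managers.get(None, []))
            for org, pool in self.token_managers.items():
                if org is not None:
                    candidates.extend(pool)
        for manager in candidates:
            if manager.has_calls_remaining():
                return manager
        # In this sketch an exhausted pool yields None; the real code may wait or raise.
        return None
```

With `current_organization="org1"` and every `org1` token rate-limited, this sketch returns `None` rather than borrowing an `org2` token; an org with no pool of its own still falls back as before.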
fix: Skip the `dependencies` stream with a warning if it has timed out and been retried (MeltanoLabs#554) SSIA Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
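One way to picture "skip with a warning after timing out and being retried", as a minimal sketch. The `records_or_skip` helper, the attempt count, and the use of Python's built-in `TimeoutError` are illustrative assumptions; the actual change may hook into the SDK's own retry machinery instead.

```python
import logging
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
logger = logging.getLogger("tap_github_sketch")


def records_or_skip(
    fetch: Callable[[], Iterable[T]],
    stream_name: str,
    max_attempts: int = 3,
) -> Iterator[T]:
    """Retry a timing-out fetch; after the last attempt, warn and yield nothing."""
    for attempt in range(1, max_attempts + 1):
        try:
            # Materialize the records so a timeout can't interrupt a partial yield.
            records = list(fetch())
        except TimeoutError:
            logger.info("Attempt %d/%d for %s timed out", attempt, max_attempts, stream_name)
            continue
        yield from records
        return
    # Every attempt timed out: skip this stream instead of failing the whole sync.
    logger.warning("Skipping %s stream after %d timed-out attempts", stream_name, max_attempts)
```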
feat: tolerate 500 and 504 for diff streams (MeltanoLabs#546) Proposing to add a couple more commonly seen error codes to the tolerated list for diff streams. 500 = Server Error, 504 = Gateway Timeout. The diffs stream tends to be high cardinality (one call per diff), and the diffs endpoint has more failure modes than most (size limits, timeouts, server errors), making it more fragile than other streams. I think it makes sense to have it be very error tolerant and consider it a sort of 'best effort' stream.
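A minimal sketch of what tolerating a status code can mean for a high-cardinality, best-effort stream. The `ResponseAction` enum, the `classify_diff_response` function, and checking tolerated codes before the retry rule are assumptions for illustration; only 500 and 504 (the codes this change adds) are listed, and the stream's existing tolerated codes are not reproduced.

```python
from enum import Enum


class ResponseAction(Enum):
    OK = "ok"
    SKIP = "skip"  # log and move on; the diff is treated as best effort
    RETRY = "retry"
    FAIL = "fail"


# Hypothetical: only the codes named in this change.
DIFF_TOLERATED_STATUS_CODES = frozenset({500, 504})


def classify_diff_response(
    status_code: int,
    tolerated: frozenset = DIFF_TOLERATED_STATUS_CODES,
) -> ResponseAction:
    """Map an HTTP status code to what a best-effort diff stream might do."""
    if status_code < 400:
        return ResponseAction.OK
    # Tolerated codes are checked first, so a 500/504 on a single diff is
    # skipped instead of failing the whole sync.
    if status_code in tolerated:
        return ResponseAction.SKIP
    # Rate limiting and other server errors stay retriable.
    if status_code == 429 or status_code >= 500:
        return ResponseAction.RETRY
    return ResponseAction.FAIL
```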
feat(issues): Added `state_reason`, sub-issues fields (MeltanoLabs#543)

The GitHub LIST endpoint returns several fields currently not in the tap schema. This PR adds them:

---

`state_reason`: The reason for the state change. Ignored unless state is changed. [docs link](https://docs.github.com/en/rest/issues/issues?apiVersion=2026-03-10#list-repository-issues)

Can be one of: `completed`, `not_planned`, `duplicate`, `reopened`, `null`

---

`parent_issue_url`: URL to get the parent issue of this issue, if it is a sub-issue. String or null. Format: URI. NOT documented as a response to the LIST endpoint, but it is documented in the [sub-issue docs](https://docs.github.com/en/rest/issues/sub-issues).

---

`sub_issues_summary`: object showing the aggregated progress of an issue's sub-issues (GitHub docs Copilot description). Fields:

- total (integer): total number of sub-issues
- completed (integer): number of completed sub-issues
- percent_completed (integer): percent complete

Also not documented as a response to the LIST endpoint, but it is documented in the [sub-issue docs](https://docs.github.com/en/rest/issues/sub-issues).

Co-authored-by: Adam Rubinstein <adamrubinstein@Adams-MacBook-Pro.local>
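The three fields could be described with a schema fragment like the one below. It is written as a plain JSON Schema dict rather than the tap's real schema definitions; making every field nullable and the `select_new_fields` helper are assumptions for illustration.

```python
from typing import Any

# Hypothetical JSON Schema fragment for the three new issue properties.
NEW_ISSUE_PROPERTIES: dict[str, Any] = {
    # One of "completed", "not_planned", "duplicate", "reopened", or null.
    "state_reason": {"type": ["string", "null"]},
    "parent_issue_url": {"type": ["string", "null"], "format": "uri"},
    "sub_issues_summary": {
        "type": ["object", "null"],
        "properties": {
            "total": {"type": ["integer", "null"]},
            "completed": {"type": ["integer", "null"]},
            "percent_completed": {"type": ["integer", "null"]},
        },
    },
}


def select_new_fields(issue: dict[str, Any]) -> dict[str, Any]:
    """Pick the new fields from a raw issue payload, defaulting to None.

    parent_issue_url and sub_issues_summary are not documented for the LIST
    endpoint, so this sketch treats every one of them as optional.
    """
    return {name: issue.get(name) for name in NEW_ISSUE_PROPERTIES}
```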
fix: Retry 404s and print the request ID in error logging (MeltanoLabs#535)

We're seeing an increase in 404 errors that don't actually indicate existence or access problems: they are transient, and the same request succeeds at other times with the same credential. It isn't specific to any one endpoint. I think we might need to start treating 404 errors as a retriable error type.

This also adds the request ID to the error logging. GitHub support generally requires it when investigating API issues, so it's a good thing to always print in error cases.

Example output (censored):

`singer_sdk.exceptions.RetriableAPIError: 404 Client Error: b'{"message":"Not Found","documentation_url":"https://docs.github.com/rest/issues/issues#list-repository-issues","status":"404"}' (Reason: Not Found) for path: /repos/<org>/<repo>/issues [GitHub-Request-Id: <id>])`

Co-authored-by: Edgar Ramírez Mondragón <16805946+edgarrmondragon@users.noreply.github.com>
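A minimal sketch of both halves of this change: building an error message that carries GitHub's `X-GitHub-Request-Id` response header, and classifying 404 as retriable. The two exception classes are local stand-ins for the `singer_sdk.exceptions` types, and the helper names and the 429/5xx rule (a rough mirror of common SDK defaults) are assumptions, not the tap's actual implementation.

```python
from typing import Mapping


class RetriableAPIError(Exception):
    """Local stand-in for singer_sdk.exceptions.RetriableAPIError."""


class FatalAPIError(Exception):
    """Local stand-in for singer_sdk.exceptions.FatalAPIError."""


def request_id(headers: Mapping[str, str]) -> str:
    # HTTP header names are case-insensitive; plain dicts are not, so compare lowercased.
    for name, value in headers.items():
        if name.lower() == "x-github-request-id":
            return value
    return "unknown"


def error_message(
    status_code: int, reason: str, body: bytes, path: str, headers: Mapping[str, str]
) -> str:
    kind = "Client" if status_code < 500 else "Server"
    return (
        f"{status_code} {kind} Error: {body!r} (Reason: {reason}) "
        f"for path: {path} [GitHub-Request-Id: {request_id(headers)}]"
    )


def validate_response(
    status_code: int, reason: str, body: bytes, path: str, headers: Mapping[str, str]
) -> None:
    if status_code < 400:
        return
    msg = error_message(status_code, reason, body, path, headers)
    # 404 joins 429 and 5xx as retriable, since many observed 404s were transient.
    if status_code in (404, 429) or status_code >= 500:
        raise RetriableAPIError(msg)
    raise FatalAPIError(msg)
```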
fix(deps): Bump `requests` to 2.33 (MeltanoLabs#538) SSIA Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
fix(deps): Bump `PyJWT` to 2.12 (MeltanoLabs#531) Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
packaging: Migrate to uv! (MeltanoLabs#508) Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
fix: Tolerate 414 error in ProjectItemsStream (MeltanoLabs#500)

I've encountered a case where the ProjectItemsStream fails with this error:

```
414 Client Error: b'{"message": "We received a Request-URL that is too long from your client."}
```

Since ProjectItemsStream constructs queries dynamically based on the project fields identified by the parent, it can build a query so long that the server rejects it. I'm proposing to skip and move on in this case.
feat: Add ability to authenticate multiple orgs in one stream (MeltanoLabs#488)

# Overview

This branch adds support for organization-specific authentication, enabling different GitHub App tokens to be used for different organizations within the same extractor, while maintaining backwards compatibility.

## New `org_auth_app_keys` field

A new field that supports a dictionary format for org-specific tokens:

```
org_auth_app_keys:
  org1:
    - $APP_KEY_1
  org2:
    - $APP_KEY_2
```

- Backwards compatible: still supports the `auth_app_keys` array format for org-agnostic tokens
- Tokens are now stored and managed per organization in a `token_managers` dict
- Falls back to using any org's token in the case of public repositories

### Reconciliation tests

1. Tested with a dictionary of organization keys and a list of repositories, combining a mix of private and public repos:

   ```
   org_auth_app_keys:
     shop:
       - $TAP_GITHUB_SHOP
     Shopify:
       - $TAP_GITHUB_SHOPIFY
   repositories:
     - Shopify/infrastructure
     - shop/world
     - Shopify/data-warehouse  # switch back to the original org
     - matplotlib/matplotlib   # switch to an org we haven't provided a key for
   ```

2. Tested using both `org_auth_app_keys` and `auth_app_keys`:

   ```
   auth_app_keys:
     - $TAP_GITHUB_SHOP
   org_auth_app_keys:
     Shopify:
       - $TAP_GITHUB_SHOPIFY
   repositories:
     - Shopify/infrastructure
     - shop/world
     - Shopify/data-warehouse
     - matplotlib/matplotlib
   ```

4. Tested using the organization scope:

   ```
   org_auth_app_keys:
     Shopify:
       - $TAP_GITHUB_SHOPIFY
   organizations:
     - Shopify
   ```

5. Tested using the `GITHUB_APP_PRIVATE_KEY` environment variable (no `auth_app_keys` provided):

   ```
   config:
     organizations:
       - shop
   ```

6. Tested using a child stream:
   - Parent:

     ```
     org_auth_app_keys:
       Shopify:
         - $TAP_GITHUB_SHOPIFY
       shop:
         - $TAP_GITHUB_SHOP
     organizations:
       - Shopify
       - shop
     ```

   - Child:

     ```
     - name: tap-github-shopify-repositories
       inherit_from: tap-github-shopify
       select:
         - repositories.*
     ```

---------

Co-authored-by: Trish Gillett <trish.gillett@shopify.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
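A minimal sketch of how the two config formats could be merged into per-org pools, and how to see which orgs in a config would rely on the fallback. `auth_app_keys`, `org_auth_app_keys`, `repositories`, and `organizations` come from the examples above; `build_token_pools`, `org_for_repository`, `orgs_using_fallback`, and storing org-agnostic keys under `None` are hypothetical. Since GitHub logins are case-insensitive, a real lookup might also want to normalize org names, which this sketch does not do.

```python
from typing import Optional

# Key under which org-agnostic keys are stored in this sketch (assumption).
ORG_AGNOSTIC: Optional[str] = None


def build_token_pools(config: dict) -> dict[Optional[str], list[str]]:
    """Group configured app keys by organization."""
    pools: dict[Optional[str], list[str]] = {}
    # Legacy array format: keys usable for any org.
    agnostic = config.get("auth_app_keys") or []
    if agnostic:
        pools[ORG_AGNOSTIC] = list(agnostic)
    # Dictionary format: org name -> list of keys for that org.
    for org, keys in (config.get("org_auth_app_keys") or {}).items():
        pools.setdefault(org, []).extend(keys)
    return pools


def org_for_repository(full_name: str) -> str:
    """Map 'owner/repo' to its owner."""
    return full_name.split("/", 1)[0]


def orgs_using_fallback(config: dict, pools: dict[Optional[str], list[str]]) -> list[str]:
    """Orgs referenced by the config that have no keys of their own, in order."""
    referenced = [org_for_repository(r) for r in config.get("repositories") or []]
    referenced += list(config.get("organizations") or [])
    missing: list[str] = []
    for org in referenced:
        if org not in pools and org not in missing:
            missing.append(org)
    return missing
```

Applied to the second reconciliation test, `shop` and `matplotlib` have no org-specific pool, so they would be served from the org-agnostic (or other-org) keys.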