Skip to content

Pass a header into calls to mixer for usage logs#276

Merged
lucyking140 merged 27 commits intodatacommonsorg:masterfrom
lucyking140:lucysking/clientlib-metadata
Oct 2, 2025
Merged

Pass a header into calls to mixer for usage logs#276
lucyking140 merged 27 commits intodatacommonsorg:masterfrom
lucyking140:lucysking/clientlib-metadata

Conversation

@lucyking140
Copy link
Contributor

@lucyking140 lucyking140 commented Sep 20, 2025

  • Adds a clientlib-python header to the newer datacommons_client client library
    • This does not include the older datacommons client because it makes calls to the V1 API, which will be deprecated soon.
  • These are used in Mixer's usage logs
  • The header accepts an optional surface_header_value tag in the observation endpoints so that other DC surfaces that make calls to the client library (MCP server, DataGemma, etc.) can be tracked.
    • Theoretically, others could also pass in other values for this tag if they make calls directly to the python client library. Like with custom DCs, I plan to accept that this might happen in rare cases because these usage logs will be aggregated over many queries.

Tests:

  • Checked with local mixer that the metadata is correctly updated in the clientlib-python case and when we pass in metadata_source
  • Modified current test suite to include this header in the expected response.

@lucyking140 lucyking140 marked this pull request as ready for review September 22, 2025 15:30
@hqpho
Copy link
Collaborator

hqpho commented Sep 22, 2025

/gcbrun

@hqpho hqpho self-requested a review September 22, 2025 15:34
Copy link
Collaborator

@hqpho hqpho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, LG! Nitty naming thing: since both "metadata" and "source" are either already or likely to soon become overloaded terms in the DC ecosystem, WDYT of calling the parameter something like surface_header_value or feature_surface_tag or something like that?

# if this call originates from another DC product (MCP server, DataGemma, etc.), we indicate that to Mixer
if(metadata_source):
# makes it clearer to public users that this is a tag specific to other DataCommons features
if(metadata_source != "mcp" and metadata_source != "datagemma"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Define a constant array of allowed values somewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a list! Also expanded this to include "mcp-{some number}" to support MCP versions like Christie suggested. Also renamed the param to surface_header_value!

@hqpho
Copy link
Collaborator

hqpho commented Sep 22, 2025

/gcbrun

@hqpho
Copy link
Collaborator

hqpho commented Sep 22, 2025

/gcbrun

Copy link
Collaborator

@hqpho hqpho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will require a PyPi release to take effect. Let's coordinate that after it's merged.

Raised when a surface_header_value is passed in that is not a valid Data Commons surface.
This value is used in the DC usage logs in Mixer
"""
default_message = "The surface header value should only to indicate a call made from Data Commons surfaces like the MCP server or DataGemma."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible edit: "The surface header value must be a surface known to the Data Commons team."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


url = (self.base_url if endpoint is None else f"{self.base_url}/{endpoint}")
headers = self.headers
# if this call originates from another DC product (MCP server, DataGemma, etc.), we indicate that to Mixer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: mention the default here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added! And moved this to the new location where I'm setting the header

dc_instance = "datacommons.org"

self.headers = build_headers(api_key)
self.api_key = api_key if api_key else None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is self.api_key referenced anywhere else? If not, consider passing it as an explicit additional param to build_headers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to a param!

@patch("requests.post")
def test_send_post_request_surface_header(mock_post):
"""Tests a successful POST request with a surface header."""
mock_response = MagicMock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit and usually would be auto-formatted in other repos: I think there needs to be a blank line after the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

@lucyking140 lucyking140 merged commit bd578ff into datacommonsorg:master Oct 2, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants