@jaleman-vdr-wikimedia

  • Updated the file to instead fetch latest versions of packages

  • Refactored api_client to use Abstract base classes

  • Re-wrote tests (integration and regular suite) to accommodate said refactor

  • Re-wrote examples to accommodate refactor, and showcase new method of working with SDK

  • Fixed a few KeyErrors on a couple data models that the refactor brought to light

Below is the README portion covering the changes introduced by the refactor:

Key Features

  • Automatic retries and built-in rate limiting.
  • Base Class (ABC)
  • Stoppable Callbacks

Automatic Retries

The client handles network request failures automatically using a built-in httpx transport.

How it works:

When you initialize the Client, you can specify the max_retries parameter.

  • By default, this is set to 3 retries.

  • The client will automatically retry requests that fail with a 5xx server error or a network connection error.

Usage:

To change the number of retries, just pass the parameter during initialization.

# Initialize the client to retry up to 5 times on 5xx or network errors
client = Client(
    max_retries=5
)

# If this call fails, it is retried automatically, up to 5 times
projects = client.get_projects(Request())
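
The retry rule described above can be sketched in plain Python. This is an illustration only, not the SDK's httpx transport: `send_with_retries`, `FakeResponse`, and the `send` callable are hypothetical names, and the sketch assumes that `max_retries` counts attempts made after the first one. A real client would usually also wait between attempts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response (not an SDK type)."""
    status_code: int


def send_with_retries(send: Callable[[], FakeResponse], max_retries: int = 3) -> FakeResponse:
    """Call send() once, then retry up to max_retries times on 5xx or connection errors."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempts = max_retries + 1  # one initial attempt plus the retries
    for attempt in range(attempts):
        is_last = attempt == attempts - 1
        try:
            response = send()
        except ConnectionError:
            if is_last:
                raise  # out of retries: surface the network error
            continue
        if response.status_code >= 500 and not is_last:
            continue  # server error: try again
        return response  # success, a 4xx, or the final 5xx
    raise RuntimeError("unreachable")


# Two server errors followed by a success.
outcomes = iter([FakeResponse(503), FakeResponse(502), FakeResponse(200)])
calls = []


def flaky_send() -> FakeResponse:
    calls.append(1)
    return next(outcomes)


result = send_with_retries(flaky_send, max_retries=5)
print(result.status_code, len(calls))  # prints: 200 3
```

Note that 4xx responses are returned immediately in this sketch, matching the statement that only 5xx and connection errors trigger a retry.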

Built-in Rate Limiting

To prevent 429 Too Many Requests errors, the client includes a configurable, client-side rate limiter. This ensures you don't send requests faster than the API allows.

How it works:

You can specify the rate_limit_per_second parameter.

  • If you set this, the client automatically calls time.sleep() between requests so that the average number of requests per second does not exceed the limit.

Usage:

To limit the client to a maximum of 10 requests per second:

# Initialize the client with a 10 requests/second rate limit
client = Client(
    rate_limit_per_second=10.0
)

# The client will automatically add small delays as needed
# when you make many requests in a loop.
for snapshot in all_snapshots:
    client.download_snapshot(snapshot.identifier, writer)
    # A small sleep() will be added here if needed
    # to stay under the 10 req/sec limit.
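
One simple way to get this behaviour is to space requests at least `1 / rate_limit_per_second` seconds apart. The sketch below assumes that approach; the SDK's actual algorithm may differ. `MinIntervalLimiter` and its injectable `clock` and `sleep` arguments are hypothetical names, and the fake clock is used only so the example runs instantly.

```python
import time
from typing import Callable, List, Optional


class MinIntervalLimiter:
    """Hypothetical limiter: spaces calls at least 1/rate seconds apart."""

    def __init__(
        self,
        rate_limit_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate_limit_per_second <= 0:
            raise ValueError("rate_limit_per_second must be positive")
        self._min_interval = 1.0 / rate_limit_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # earliest start of the next request

    def wait(self) -> float:
        """Sleep if needed before a request; return the delay that was applied."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()
        self._next_allowed = now + self._min_interval
        return delay


# Demonstration with a fake clock (4 requests/second, so a 0.25 s interval).
fake_now = [0.0]
slept: List[float] = []


def fake_sleep(seconds: float) -> None:
    slept.append(seconds)
    fake_now[0] += seconds


limiter = MinIntervalLimiter(4.0, clock=lambda: fake_now[0], sleep=fake_sleep)
delays = [limiter.wait() for _ in range(3)]
print(delays)  # prints: [0.0, 0.25, 0.25]
```

In real use you would construct the limiter with the default clock and sleep, then call `wait()` before each request.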

The API Abstract Base Class

The Client class implements a formal interface defined in the API abstract class. This "contract" defines all the public methods of the client (e.g., get_projects, download_snapshot, etc.).

The benefit is type hinting: you can annotate your own code with API and program against the interface rather than the concrete Client:

en_filter = Filter(field="in_language.identifier", value="en")
req_filtered = Request(filters=[en_filter])
en_batches = api_client.get_batches(batch_time, req_filtered)
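
To illustrate why coding against an abstract interface helps, here is a self-contained sketch. The `API` class below is a reduced, hypothetical stand-in with a simplified `get_projects` signature, not the SDK's real abstract class, and `InMemoryClient` and `count_projects` are invented for the example.

```python
from abc import ABC, abstractmethod
from typing import List


class API(ABC):
    """Reduced, hypothetical stand-in for the SDK's abstract interface."""

    @abstractmethod
    def get_projects(self) -> List[str]:
        """Return project identifiers (simplified signature for this sketch)."""


class InMemoryClient(API):
    """Test double that satisfies the interface without any network access."""

    def __init__(self, projects: List[str]) -> None:
        self._projects = projects

    def get_projects(self) -> List[str]:
        return list(self._projects)


def count_projects(api_client: API) -> int:
    """Depends only on the interface, so any API implementation works."""
    return len(api_client.get_projects())


project_count = count_projects(InMemoryClient(["enwiki", "dewiki"]))
print(project_count)  # prints: 2

try:
    API()  # abstract classes cannot be instantiated directly
    instantiated = True
except TypeError:
    instantiated = False
print(instantiated)  # prints: False
```

Because `count_projects` only names the interface in its signature, the same function accepts the real client in production and a lightweight double in tests.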

Stoppable Callbacks

For all streaming methods (stream_articles) and large data file methods (read_snapshot, read_batch, read_all), you can control the processing loop from your callback.

How it works

All callback functions you provide are expected to return a boolean value.

  • Return True to continue processing the stream or file.

  • Return False to stop processing immediately.

The client will check the return value of your callback after every single item (every article in a stream, every line in a file). As soon as it receives False, it will break the loop and close the stream.
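
The control flow described above amounts to a loop that checks the callback's return value after each item. The following is a minimal sketch of that pattern, not the SDK's code; `process_items` and `take_two` are hypothetical names, and plain strings stand in for articles or file lines.

```python
from typing import Callable, Iterable, List


def process_items(items: Iterable[str], callback: Callable[[str], bool]) -> int:
    """Feed items to callback until it returns False; return how many were handled."""
    handled = 0
    for item in items:
        handled += 1
        if not callback(item):
            break  # stop immediately; remaining items are never read
    return handled


seen: List[str] = []


def take_two(item: str) -> bool:
    seen.append(item)
    return len(seen) < 2  # False once two items have been collected


count = process_items(iter(["a", "b", "c", "d"]), take_two)
print(count, seen)  # prints: 2 ['a', 'b']
```

In this sketch a callback that forgets its `return` statement yields `None`, which is falsy, so processing would stop after the first item. Returning an explicit `True` or `False` on every path avoids that surprise.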

Usage

This is useful for limiting processing to a specific number of items.

Example: receive 5 articles, then stop

Python

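import logging

logger = logging.getLogger(__name__)

# Assumed to be defined by the surrounding example script.
# Article is the SDK's article model (import path not shown here).
STOP_AFTER_N_ARTICLES = 5
articles_received_tracker = []


def stream_callback(article: Article) -> bool:
    """Log each article; return False once the limit is reached to stop the stream."""
    try:
        article_name = article.name or article.identifier or 'Unknown'

        event_id = 'unknown_event'
        if article.event:
            event_id = article.event.identifier or 'unknown_event'

        logger.info(
            "[%s] Received article (event: %s): %s",
            len(articles_received_tracker) + 1,
            event_id,
            article_name
        )

        articles_received_tracker.append(article)

        if len(articles_received_tracker) >= STOP_AFTER_N_ARTICLES:
            # Adjacent string literals are concatenated into one message.
            logger.warning(
                "Reached stop limit of %s articles. "
                "Returning False to stop stream.",
                STOP_AFTER_N_ARTICLES
            )
            return False

    except Exception as e:
        # Any failure inside the callback also stops the stream.
        logger.error("Error within callback function: %s", e)
        return False

    return True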

@jaleman-vdr-wikimedia jaleman-vdr-wikimedia self-assigned this Nov 6, 2025
@jaleman-vdr-wikimedia jaleman-vdr-wikimedia added the enhancement New feature or request label Nov 6, 2025
This commit contains small cleanups made in response to feedback on the currently open PR.
json_data = json.load(file)
year, electoral_vote, running_mate, nominee = extract_json_data(json_data)

# electoral_vote is a set of integers in a string separated by spaces, parse it into a list of integers
Just to clarify: I'm not opposed to explanatory comments in the code, my comment was about one in particular that seemed only useful in the context of this review, namely # This now returns a List[StructuredContent]

@jaleman-vdr-wikimedia jaleman-vdr-wikimedia merged commit 62f5a25 into wikimedia-enterprise:main Nov 11, 2025
1 check passed