@kabir kabir commented Oct 22, 2025

Add Kubernetes deployment example demonstrating multi-instance A2A agents with database persistence and Kafka-based event replication.

Fixes #281 and #373

@gemini-code-assist

Summary of Changes

Hello @kabir, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a robust, end-to-end example for deploying an A2A agent in a cloud-native environment using Kubernetes. It illustrates key architectural patterns for scalable and resilient agent deployments, including load distribution, shared state management via a database, and real-time event synchronization across instances using Kafka. The example is fully runnable on Minikube and comes with detailed instructions and verification steps.

Highlights

  • New Cloud Deployment Example: Introduced a comprehensive example demonstrating how to deploy an A2A agent to Kubernetes, showcasing multi-instance agents, database persistence, and Kafka-based event replication.
  • Kubernetes Integration: Added all necessary Kubernetes manifests (namespace, PostgreSQL, Kafka, agent deployment, services) and helper scripts for easy deployment and management on Minikube.
  • Event Replication with Kafka: Implemented Kafka-based event replication to ensure that events generated by one agent pod are replicated and visible across all other agent instances, crucial for distributed systems.
  • Database Persistence: Configured PostgreSQL for persistent task storage, allowing task states to be shared and maintained consistently across multiple agent pods.
  • Multi-Pod Load Balancing: The example demonstrates round-robin load balancing across two agent pods, with a test client verifying that messages are processed by different instances.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/cloud-deployment-example.yml
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a cloud deployment example for the A2A agent using Kubernetes, PostgreSQL for task persistence, and Kafka for event replication. The changes include a README with setup instructions, Kubernetes manifests, deployment scripts, and a test client. The review focuses on identifying potential issues related to the deployment process, resource configurations, and overall clarity of the documentation.


### 1. Start Minikube

This example uses a local container registry on the host machine. Minikube must be configured to allow pulling from this insecure (HTTP) registry.

medium

Consider adding a note about the importance of using the host network driver in Minikube for this example to work correctly, as the local registry is accessed via the host's IP.

Comment on lines +384 to +385
- Kafka not ready: Kafka takes 2-5 minutes to start fully
- Wait for `kubectl wait --for=condition=Ready kafka/a2a-kafka -n a2a-demo`

medium

It might be helpful to include instructions on how to verify that topics are created successfully, potentially using kubectl exec into a Kafka pod and using the Kafka command-line tools.

MAX_RETRIES=3
for attempt in $(seq 1 $MAX_RETRIES); do
    echo "Push attempt $attempt/$MAX_RETRIES..."
    if $CONTAINER_TOOL push ${REGISTRY}/a2a-cloud-deployment:latest --tls-verify=false --retry=2 2>&1 | tee /tmp/push.log; then

medium

Consider adding --all-namespaces to the kubectl get commands to check for the Strimzi operator in all namespaces, in case the user has installed it in a different namespace than kafka.

MAX_RETRIES=3
for attempt in $(seq 1 $MAX_RETRIES); do
    echo "Push attempt $attempt/$MAX_RETRIES..."
    if $CONTAINER_TOOL push ${REGISTRY}/a2a-cloud-deployment:latest --tls-verify=false --retry=2 2>&1 | tee /tmp/push.log; then

medium

The push uses --tls-verify=false, which disables TLS verification. That is insecure and should be avoided in production environments. For a development example that pushes to a local insecure (HTTP) registry it is acceptable, but a comment explaining why the flag is needed and what the security implications are would be helpful.

MAX_RETRIES=3
for attempt in $(seq 1 $MAX_RETRIES); do
    echo "Push attempt $attempt/$MAX_RETRIES..."
    if $CONTAINER_TOOL push ${REGISTRY}/a2a-cloud-deployment:latest --tls-verify=false --retry=2 2>&1 | tee /tmp/push.log; then

medium

The push command includes --retry=2. This is good for handling transient network issues, but it might be beneficial to add a comment explaining why retries are needed and under what circumstances they might occur (e.g., flaky network in Minikube).

containers:
- name: a2a-agent
  image: 192.168.49.1:5001/a2a-cloud-deployment:latest
  imagePullPolicy: Always

medium

Using imagePullPolicy: Always can cause unnecessary image pulls if the image tag is :latest. It might be better to use IfNotPresent for development to reduce image pull frequency, and Always for production with specific image tags.

Comment on lines +190 to +193
if (attempt > 1) {
    System.out.println("Retry attempt " + attempt + "/" + maxRetries + "...");
    Thread.sleep(1000); // Wait for Kafka events to propagate
}

medium

Consider adding a small delay before retrying the subscription to allow Kafka events to propagate. This can help avoid transient errors when the Kafka cluster is still initializing.
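
The quoted lines already sleep for 1000 ms before each retry. One way to make that pause explicit and reusable is a small helper like the sketch below. `RetryWithDelay` and its `call` method are hypothetical names for illustration, not part of the A2A Java SDK or of this pull request, and the fixed delay is an assumption.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an action, pausing before every attempt after the first. */
final class RetryWithDelay {

    private RetryWithDelay() {
    }

    static <T> T call(Callable<T> action, int maxRetries, Duration delay) throws Exception {
        if (maxRetries < 1) {
            throw new IllegalArgumentException("maxRetries must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (attempt > 1) {
                System.out.println("Retry attempt " + attempt + "/" + maxRetries + "...");
                Thread.sleep(delay.toMillis()); // give replicated events time to arrive
            }
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed; rethrow the most recent failure
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for a subscription that only succeeds on the third attempt.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("events not visible yet");
            }
            return "subscribed";
        }, 3, Duration.ofMillis(100));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Passing the delay as a `Duration` keeps the wait in one place, so it can be lengthened for a slow cluster without touching the retry loop.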

Comment on lines +263 to +266
Client freshClient = Client.builder(streamingClient.getAgentCard())
        .clientConfig(nonStreamingConfig)
        .withTransport(JSONRPCTransport.class, new JSONRPCTransportConfigBuilder())
        .build();

medium

Creating a new client for each request can be resource-intensive. Consider using a connection pool or reusing clients where possible to improve performance.

Comment on lines +330 to +331
if (!completionLatch.await(10, TimeUnit.SECONDS)) {
    System.err.println("⚠ Timeout waiting for task completion");

medium

The timeout for task completion is set to 10 seconds. In some environments, tasks might take longer to complete. Consider making this timeout configurable or increasing it to handle slower task processing.
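
A minimal sketch of a configurable timeout, assuming the value comes from an environment variable. The variable name `A2A_COMPLETION_TIMEOUT_SECONDS` and the `CompletionTimeout` class are hypothetical and do not appear in the pull request. The 10-second default matches the quoted code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: resolves the completion timeout, falling back to 10 seconds. */
final class CompletionTimeout {

    static final long DEFAULT_SECONDS = 10;

    private CompletionTimeout() {
    }

    /** Returns the parsed value, or the default when it is missing, invalid, or not positive. */
    static long resolveSeconds(String rawValue) {
        if (rawValue == null || rawValue.isBlank()) {
            return DEFAULT_SECONDS;
        }
        try {
            long seconds = Long.parseLong(rawValue.trim());
            return seconds > 0 ? seconds : DEFAULT_SECONDS;
        } catch (NumberFormatException e) {
            return DEFAULT_SECONDS;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed variable name; any configuration source would work the same way.
        long timeoutSeconds = resolveSeconds(System.getenv("A2A_COMPLETION_TIMEOUT_SECONDS"));

        CountDownLatch completionLatch = new CountDownLatch(1);
        completionLatch.countDown(); // stand-in for the task completion event

        if (!completionLatch.await(timeoutSeconds, TimeUnit.SECONDS)) {
            System.err.println("Timeout waiting for task completion after " + timeoutSeconds + "s");
        } else {
            System.out.println("Task completed within " + timeoutSeconds + "s");
        }
    }
}
```

Falling back to the default on bad input keeps the test client usable when the variable is unset or mistyped.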

<packaging>pom</packaging>

- <name>Java SDK A2A Examples</name>
+ <name>Java SDK A2A Examples: Hello World</name>

medium

The description is too generic. Consider updating the description to be specific to the Hello World example.

Suggested change
- <name>Java SDK A2A Examples: Hello World</name>
+ <name>Java SDK A2A Examples: Hello World</name>
+ <description>A simple Hello World example for the Java SDK for the Agent2Agent Protocol (A2A)</description>

Replace external registry container approach with Minikube's built-in
registry addon for improved cross-platform compatibility.

Key changes:
- Use 'minikube addons enable registry' instead of external container
- Port-forward to localhost:5000 (macOS: socat, Linux: kubectl)
- Update image reference from 192.168.49.1:5001 to localhost:5000
- Change imagePullPolicy from Always to IfNotPresent
- Remove insecure-registry requirement from Minikube start
- Simplify GitHub Actions workflow

Benefits:
- Works with rootless Podman on Fedora/Linux
- No insecure registry configuration needed
- Cross-platform consistency (Mac, Linux, Windows)
- Follows proven WildFly cloud-tests patterns
- Simpler setup and troubleshooting

Fixes compatibility issues with rootless Podman, where an external registry
running in the user's context was inaccessible from the Minikube VM.

kabir commented Oct 23, 2025

Replaced by #389

@kabir kabir closed this Oct 23, 2025

Development

Successfully merging this pull request may close these issues.

Experiment with deploying an A2A server agent on k8s to identify any issues/gaps with being able to scale agents
