fix usersByCountry query #1003

BilalG1 · 2025-11-06T19:02:36Z

Summary by CodeRabbit

Improvements
- Optimized country-level metrics calculation with intelligent sampling and automatic scaling to project accurate user counts from sample data.
- Enhanced data aggregation logic with improved handling of edge cases and invalid entries.

vercel · 2025-11-06T19:02:41Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
stack-backend	Ready	Preview	Comment	Nov 6, 2025 7:11pm
stack-dashboard	Ready	Preview	Comment	Nov 6, 2025 7:11pm
stack-demo	Ready	Preview	Comment	Nov 6, 2025 7:11pm
stack-docs	Ready	Preview	Comment	Nov 6, 2025 7:11pm

coderabbitai · 2025-11-06T19:02:49Z

Walkthrough

A sampling cap is introduced for country metric calculations, along with a scaling factor to estimate accurate country user counts when the total exceeds the sample size. Country aggregation logic is reworked to handle null entries and filter invalid results accordingly.

Changes

Cohort / File(s)	Change Summary
Country Metrics Sampling `apps/backend/src/app/api/latest/internal/metrics/route.tsx`	Adds `MAX_USERS_FOR_COUNTRY_SAMPLE` cap for user sampling; pre-counts total users and applies scaling factor to project accurate country aggregates; reworks result construction to handle null countryCode entries, compute `estimatedCount` by scaling raw counts, and filter out invalid rows.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Sampling and scaling logic correctness requires careful verification of the calculation, particularly the scalingFactor derivation and its application
Edge case handling for null countryCode entries and filtering logic should be validated
Statistical accuracy of the estimation approach warrants review against intended behavior

Possibly related PRs

fix metrics queries #880: Modifies the same loadUsersByCountry function and the metrics route file; introduces changes to isAnonymous handling in the same query context.

Poem

🐰 A count capped and scaled with care,
Sampling fairly, statistics fair,
Null entries filtered away,
Metrics brighten up the day!
Rabbits hop through data's maze,
Estimates shine through the haze. ✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The pull request description only contains the template comment referencing CONTRIBUTING.md guidelines with no actual content, context, or explanation of the changes.	Add a detailed description explaining what bug was fixed, why the sampling and scaling approach was needed, and any relevant implementation details or testing performed.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix usersByCountry query' is directly related to the main changes, which refactor the usersByCountry metric calculation with sampling and estimation logic.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix-metrics-route-error

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps

Greptile Overview

Greptile Summary

Optimized loadUsersByCountry to handle large user bases by sampling max 10,000 users and scaling results proportionally.

Key changes:

Added MAX_USERS_FOR_COUNTRY_SAMPLE constant (10,000 users)
Query now counts total users first, then samples ordered subset
Introduced scaling factor calculation to estimate full population metrics from sample
Improved null filtering with TypeScript type guards for safer array operations

Impact:
This prevents performance issues when querying country data for projects with large user bases by limiting database queries while maintaining reasonable accuracy through statistical sampling.

Confidence Score: 4/5

This PR is safe to merge with minimal risk - straightforward performance optimization with sound sampling approach
The changes implement a well-structured performance optimization using statistical sampling. The logic is sound: limiting queries to 10K users with deterministic ordering and scaling results proportionally. However, confidence is 4/5 rather than 5/5 because sampling introduces statistical estimation which may be less accurate for certain data distributions, particularly when country distribution is non-uniform across user IDs
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
apps/backend/src/app/api/latest/internal/metrics/route.tsx	4/5	Added performance optimization to sample max 10K users for country metrics with scaling estimation, improved null filtering logic

Sequence Diagram

sequenceDiagram
    participant Client
    participant RouteHandler
    participant loadUsersByCountry
    participant Prisma
    participant GlobalPrisma

    Client->>RouteHandler: GET /api/latest/internal/metrics
    RouteHandler->>loadUsersByCountry: Request country data (cached)
    loadUsersByCountry->>Prisma: Count total users
    Prisma-->>loadUsersByCountry: totalUsers count
    loadUsersByCountry->>Prisma: Find sample users (max 10K, ordered)
    Prisma-->>loadUsersByCountry: Sample user IDs
    loadUsersByCountry->>loadUsersByCountry: Calculate scaling factor
    loadUsersByCountry->>GlobalPrisma: Query country data for sample
    GlobalPrisma-->>loadUsersByCountry: Raw country counts
    loadUsersByCountry->>loadUsersByCountry: Apply scaling factor to estimates
    loadUsersByCountry-->>RouteHandler: Scaled country statistics
    RouteHandler-->>Client: Complete metrics response

_{1 file reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

apps/backend/src/app/api/latest/internal/metrics/route.tsx (2)
36-41: Extra query added for scaling calculation.

The totalUsers count query is necessary to calculate the scaling factor. While this adds an extra database round-trip, the performance impact should be minimal, and the approach is clear and maintainable.

If optimization becomes necessary, consider combining the count with the fetch using a window function or CTE, though the current approach prioritizes clarity.

48-49: Document the deterministic sampling trade-off.

The orderBy: { projectUserId: "asc" } approach ensures consistent sampling (beneficial for caching), but introduces bias by always selecting the earliest users. If geographic distribution has shifted over time, this could underestimate or miss countries where users joined more recently.

Consider one of these approaches:

Add documentation explaining the trade-off between cache consistency and sampling accuracy.

Use hash-based sampling for deterministic yet more representative results:
// Example: Sample ~10% deterministically across all user IDs
const samplingPercentage = Math.min(100, Math.ceil((MAX_USERS_FOR_COUNTRY_SAMPLE / totalUsers) * 100));
const users = await prisma.$queryRaw`
  SELECT "projectUserId"
  FROM "ProjectUser"
  WHERE "tenancyId" = ${tenancy.id}::UUID
    AND ${includeAnonymous} OR "isAnonymous" = false
    AND MOD(ABS(HASHTEXT("projectUserId"::text)), 100) < ${samplingPercentage}
  LIMIT ${MAX_USERS_FOR_COUNTRY_SAMPLE}
`;
This maintains determinism while avoiding chronological bias.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c816a0 and f0e886a.

📒 Files selected for processing (1)

apps/backend/src/app/api/latest/internal/metrics/route.tsx (4 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: check_prisma_migrations (22.x)
GitHub Check: lint_and_build (latest)
GitHub Check: all-good
GitHub Check: build (22.x)
GitHub Check: build (22.x)
GitHub Check: build (22.x)
GitHub Check: docker
GitHub Check: restart-dev-and-test
GitHub Check: setup-tests
GitHub Check: restart-dev-and-test-with-custom-base-port

🔇 Additional comments (3)

apps/backend/src/app/api/latest/internal/metrics/route.tsx (3)

17-17: LGTM: Sampling cap defined.

The constant is well-named and the 10,000 sample size is reasonable for preventing performance issues while maintaining adequate statistical representation.

58-58: LGTM: Scaling factor calculation is correct.

The logic correctly scales up country counts when sampling is used, and defaults to 1 (no scaling) when the full dataset is processed.

83-91: LGTM: Result construction correctly handles null values and scaling.

The code properly:

Guards against null countryCode (defensive programming despite SQL WHERE clause)

Applies the scaling factor with appropriate rounding

Uses a type guard to filter null entries

Note on sampling limitations:
Countries with very few users may not appear in the sample at all and will be completely absent from results. This is an inherent limitation of the sampling approach. Consider adding a comment or documentation noting that small countries may be underrepresented or missing when sampling is active.

fix usersByCountry query

4548820

github-actions bot assigned BilalG1 Nov 6, 2025

Merge branch 'dev' into fix-metrics-route-error

f0e886a

BilalG1 requested a review from N2D4 November 6, 2025 19:03

BilalG1 assigned N2D4 and unassigned BilalG1 Nov 6, 2025

N2D4 approved these changes Nov 6, 2025

View reviewed changes

github-actions bot assigned BilalG1 and unassigned N2D4 Nov 6, 2025

greptile-apps bot reviewed Nov 6, 2025

View reviewed changes

coderabbitai bot reviewed Nov 6, 2025

View reviewed changes

BilalG1 merged commit 793de9e into dev Nov 6, 2025
26 checks passed

BilalG1 deleted the fix-metrics-route-error branch November 6, 2025 19:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix usersByCountry query #1003

fix usersByCountry query #1003

Uh oh!

BilalG1 commented Nov 6, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

vercel bot commented Nov 6, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Nov 6, 2025 •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix usersByCountry query #1003

fix usersByCountry query #1003

Uh oh!

Conversation

BilalG1 commented Nov 6, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

vercel bot commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

BilalG1 commented Nov 6, 2025 •

edited by coderabbitai bot

Loading

vercel bot commented Nov 6, 2025 •

edited

Loading

coderabbitai bot commented Nov 6, 2025 •

edited

Loading