-
Notifications
You must be signed in to change notification settings - Fork 500
fix usersByCountry query #1003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix usersByCountry query #1003
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
WalkthroughA sampling cap is introduced for country metric calculations, along with a scaling factor to estimate accurate country user counts when the total exceeds the sample size. Country aggregation logic is reworked to handle null entries and filter invalid results accordingly. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Optimized loadUsersByCountry to handle large user bases by sampling max 10,000 users and scaling results proportionally.
Key changes:
- Added
MAX_USERS_FOR_COUNTRY_SAMPLEconstant (10,000 users) - Query now counts total users first, then samples ordered subset
- Introduced scaling factor calculation to estimate full population metrics from sample
- Improved null filtering with TypeScript type guards for safer array operations
Impact:
This prevents performance issues when querying country data for projects with large user bases by limiting database queries while maintaining reasonable accuracy through statistical sampling.
Confidence Score: 4/5
- This PR is safe to merge with minimal risk - straightforward performance optimization with sound sampling approach
- The changes implement a well-structured performance optimization using statistical sampling. The logic is sound: limiting queries to 10K users with deterministic ordering and scaling results proportionally. However, confidence is 4/5 rather than 5/5 because sampling introduces statistical estimation which may be less accurate for certain data distributions, particularly when country distribution is non-uniform across user IDs
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| apps/backend/src/app/api/latest/internal/metrics/route.tsx | 4/5 | Added performance optimization to sample max 10K users for country metrics with scaling estimation, improved null filtering logic |
Sequence Diagram
sequenceDiagram
participant Client
participant RouteHandler
participant loadUsersByCountry
participant Prisma
participant GlobalPrisma
Client->>RouteHandler: GET /api/latest/internal/metrics
RouteHandler->>loadUsersByCountry: Request country data (cached)
loadUsersByCountry->>Prisma: Count total users
Prisma-->>loadUsersByCountry: totalUsers count
loadUsersByCountry->>Prisma: Find sample users (max 10K, ordered)
Prisma-->>loadUsersByCountry: Sample user IDs
loadUsersByCountry->>loadUsersByCountry: Calculate scaling factor
loadUsersByCountry->>GlobalPrisma: Query country data for sample
GlobalPrisma-->>loadUsersByCountry: Raw country counts
loadUsersByCountry->>loadUsersByCountry: Apply scaling factor to estimates
loadUsersByCountry-->>RouteHandler: Scaled country statistics
RouteHandler-->>Client: Complete metrics response
1 file reviewed, no comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
apps/backend/src/app/api/latest/internal/metrics/route.tsx (2)
36-41: Extra query added for scaling calculation.The
totalUserscount query is necessary to calculate the scaling factor. While this adds an extra database round-trip, the performance impact should be minimal, and the approach is clear and maintainable.If optimization becomes necessary, consider combining the count with the fetch using a window function or CTE, though the current approach prioritizes clarity.
48-49: Document the deterministic sampling trade-off.The
orderBy: { projectUserId: "asc" }approach ensures consistent sampling (beneficial for caching), but introduces bias by always selecting the earliest users. If geographic distribution has shifted over time, this could underestimate or miss countries where users joined more recently.Consider one of these approaches:
Add documentation explaining the trade-off between cache consistency and sampling accuracy.
Use hash-based sampling for deterministic yet more representative results:
// Example: Sample ~10% deterministically across all user IDs const samplingPercentage = Math.min(100, Math.ceil((MAX_USERS_FOR_COUNTRY_SAMPLE / totalUsers) * 100)); const users = await prisma.$queryRaw` SELECT "projectUserId" FROM "ProjectUser" WHERE "tenancyId" = ${tenancy.id}::UUID AND ${includeAnonymous} OR "isAnonymous" = false AND MOD(ABS(HASHTEXT("projectUserId"::text)), 100) < ${samplingPercentage} LIMIT ${MAX_USERS_FOR_COUNTRY_SAMPLE} `;This maintains determinism while avoiding chronological bias.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/backend/src/app/api/latest/internal/metrics/route.tsx(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: check_prisma_migrations (22.x)
- GitHub Check: lint_and_build (latest)
- GitHub Check: all-good
- GitHub Check: build (22.x)
- GitHub Check: build (22.x)
- GitHub Check: build (22.x)
- GitHub Check: docker
- GitHub Check: restart-dev-and-test
- GitHub Check: setup-tests
- GitHub Check: restart-dev-and-test-with-custom-base-port
🔇 Additional comments (3)
apps/backend/src/app/api/latest/internal/metrics/route.tsx (3)
17-17: LGTM: Sampling cap defined.The constant is well-named and the 10,000 sample size is reasonable for preventing performance issues while maintaining adequate statistical representation.
58-58: LGTM: Scaling factor calculation is correct.The logic correctly scales up country counts when sampling is used, and defaults to 1 (no scaling) when the full dataset is processed.
83-91: LGTM: Result construction correctly handles null values and scaling.The code properly:
- Guards against null
countryCode(defensive programming despite SQL WHERE clause)- Applies the scaling factor with appropriate rounding
- Uses a type guard to filter null entries
Note on sampling limitations:
Countries with very few users may not appear in the sample at all and will be completely absent from results. This is an inherent limitation of the sampling approach. Consider adding a comment or documentation noting that small countries may be underrepresented or missing when sampling is active.
Summary by CodeRabbit