
feat: session recording DB schema, S3 utils, batch endpoint #1184

Closed
BilalG1 wants to merge 5 commits into dev from analytics-replays-1

Conversation


BilalG1 commented Feb 11, 2026

Summary

  • Add SessionRecording and SessionRecordingChunk Prisma models with migration
  • Add S3 uploadBytes, downloadBytes, and getS3PublicUrl utilities
  • Add POST /session-recordings/batch endpoint for client recording uploads
  • Add session recording seed data

Stack (PR 1/4): dev ← analytics-replays-1 ← analytics-replays-2 ← analytics-replays-3 ← analytics-replays-4

Test plan

  • Verify Prisma migration applies cleanly
  • Test batch upload endpoint with sample recording data
  • Verify S3 upload/download round-trip

Summary by CodeRabbit

  • New Features

    • Session Recordings: users can upload session event batches with automatic deduplication; uploads are stored securely and include metadata (user, timestamps, duration). Query performance improved for filtering by user and time.
    • Demo data: additional dummy session recordings seeded for local/dev environments.
  • Chores

    • Added support for a private cloud storage bucket and server-side upload/download utilities; local/dev containers updated to include a private bucket.

…ad endpoint

- Add SessionRecording and SessionRecordingChunk Prisma models
- Add migration for session_recordings_mvp
- Add seed data for session recordings
- Add S3 uploadBytes, downloadBytes, and getS3PublicUrl utilities
- Add POST /session-recordings/batch endpoint for client uploads

vercel bot commented Feb 11, 2026

The latest updates on your projects.

Project          Deployment   Actions            Updated (UTC)
stack-backend    Ready        Preview, Comment   Feb 12, 2026 3:19am
stack-dashboard  Ready        Preview, Comment   Feb 12, 2026 3:19am
stack-demo       Ready        Preview, Comment   Feb 12, 2026 3:19am
stack-docs       Ready        Preview, Comment   Feb 12, 2026 3:19am


coderabbitai bot commented Feb 11, 2026

📝 Walkthrough

Adds an MVP for session recordings: database migrations and Prisma models, a batch upload API that validates/deduplicates and stores gzipped payloads to S3, S3 upload/download helpers, seed data generation, and local/dev docker/env updates for a private S3 bucket.

Changes

  • Database migration & Prisma schema (apps/backend/prisma/migrations/20260210120000_session_recordings_mvp/migration.sql, apps/backend/prisma/schema.prisma): Adds SessionRecording and SessionRecordingChunk tables/models, composite keys and FK relations to Tenancy/ProjectUser, and indexes/unique constraints for chunk deduplication and timestamp queries.
  • API: batch upload handler (apps/backend/src/app/api/latest/session-recordings/batch/route.tsx): New POST route that validates auth and payload limits, computes event times, upserts session metadata, deduplicates batches, gzips and uploads payloads to S3, creates chunk records, and returns s3_key/deduped metadata.
  • S3 utilities (apps/backend/src/s3.tsx): Adds uploadBytes and downloadBytes with bucket selection (public/private), a body-reading helper, and error handling for missing config.
  • Seeding (apps/backend/prisma/seed.ts): Adds SessionRecordingSeedOptions and seedDummySessionRecordings, invoked from the dummy project seed to create randomized session recordings (targeted count).
  • Env & Docker, local/dev (apps/backend/.env, apps/backend/.env.development, docker/dependencies/docker.compose.yaml, docker/emulator/docker.compose.yaml, docker/server/.env): Adds STACK_S3_PRIVATE_BUCKET and initializes a private S3 mock bucket (stack-storage-private) for private object storage.
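The S3 helpers choose between a public and a private bucket and raise an error when the configuration is missing. A minimal sketch of that selection logic, assuming the STACK_S3_PRIVATE_BUCKET variable from this PR; the public-bucket variable name (STACK_S3_BUCKET) is a hypothetical placeholder, not confirmed by the diff:

```typescript
// Hypothetical sketch of bucket selection. STACK_S3_PRIVATE_BUCKET matches the
// env var added in this PR; STACK_S3_BUCKET is an assumed name for the public bucket.
function selectBucket(
  visibility: "public" | "private",
  env: Record<string, string | undefined>,
): string {
  const name = visibility === "private" ? env.STACK_S3_PRIVATE_BUCKET : env.STACK_S3_BUCKET;
  if (name === undefined || name === "") {
    // Fail loudly on missing config rather than writing to an unintended bucket.
    throw new Error(`S3 bucket for ${visibility} storage is not configured`);
  }
  return name;
}
```

Keeping the selection in one function means every upload/download path shares the same missing-config error instead of each caller checking env vars ad hoc.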

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as POST /session-recordings/batch
    participant Prisma as Database
    participant S3 as AWS S3

    Client->>API: POST {session_id, batch_id, tab_id, events, started_at_ms, sent_at_ms}
    API->>API: Validate auth, refreshTokenId, size limits, and events
    API->>API: Compute firstMs/lastMs from events (fallback to sent_at_ms)
    API->>Prisma: Read existing SessionRecording (by tenancyId + session_id)
    alt existing session
        Prisma-->>API: session record
        API->>Prisma: Upsert updated startedAt/lastEventAt
    else new session
        API->>Prisma: Create SessionRecording
    end
    API->>Prisma: Check SessionRecordingChunk unique (tenancyId, sessionRecordingId, batchId)
    alt chunk exists
        Prisma-->>API: existing chunk with s3Key
        API-->>Client: {session_id, batch_id, s3_key, deduped: true}
    else new chunk
        API->>API: Create gzipped payload
        API->>S3: uploadBytes(gzipped payload) -> s3Key
        S3-->>API: s3Key
        API->>Prisma: Insert SessionRecordingChunk (metadata)
        alt unique constraint violation
            Prisma-->>API: duplicate error
            API-->>Client: {session_id, batch_id, s3_key, deduped: true}
        else success
            Prisma-->>API: chunk created
            API-->>Client: {session_id, batch_id, s3_key, deduped: false}
        end
    end
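In the new-chunk branch above, the event payload is gzipped before the S3 upload. A minimal round-trip sketch of that step using Node's zlib; the payload field names are illustrative and may not match the endpoint's actual wire format:

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Illustrative batch payload; the real endpoint's schema may differ.
const payload = {
  session_id: "sess-1",
  batch_id: "batch-1",
  events: [{ timestamp: 1700000000000, type: 2 }],
};

// Gzip the serialized batch before uploading the bytes to S3...
const gzipped: Buffer = gzipSync(JSON.stringify(payload));

// ...and on playback, download, gunzip, and parse to recover the events.
const restored = JSON.parse(gunzipSync(gzipped).toString("utf8"));
```

rrweb-style event batches are highly repetitive JSON, so gzip typically shrinks them substantially before they hit object storage.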

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Upgrade Prisma to v7 #1064: Edits Prisma generator/config and import paths in apps/backend/prisma, which may intersect with the new models and schema changes here.
  • seed script dummy project #1018: Previously extended dummy-project seeding flow in apps/backend/prisma/seed.ts; directly related to the new session recording seeding wiring.
  • Add S3 bucket #816: Modified S3 integration and dev/docker S3 configuration; relevant to apps/backend/src/s3.tsx and added STACK_S3_PRIVATE_BUCKET.

Suggested reviewers

  • N2D4

Poem

🐰
I hopped through bytes and gzipped song,
Batches bundled, never wrong,
Chunks tucked safe in S3's night,
Deduped trails in moonlight bright,
Hoppy logs — recordings hum along.

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: the title accurately and concisely summarizes the main changes: adding session recording DB schema, S3 utilities, and a batch endpoint for the MVP feature.
  • Description check ✅ Passed: the description provides a clear summary of changes with specific files/features affected, includes a test plan with checkboxes, and contextualizes this as PR 1/4 in a stack.



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
apps/backend/src/s3.tsx (1)

73-98: Add comments explaining the as any casts.

The function uses several as any casts to duck-type the S3 response body. Per coding guidelines, each any usage should have a comment explaining why it's needed. Here the body is typed as unknown and needs runtime type-narrowing through duck-typing, which the type system can't express cleanly.

♻️ Suggested comments
   // Web ReadableStream (some runtimes)
-  if (typeof body === "object" && body !== null && "transformToByteArray" in body && typeof (body as any).transformToByteArray === "function") {
-    return (body as any).transformToByteArray();
+  // `as any` needed: body is `unknown` and we're duck-typing the AWS SDK's SdkStreamMixin which exposes transformToByteArray
+  if (typeof body === "object" && body !== null && "transformToByteArray" in body && typeof (body as any).transformToByteArray === "function") {
+    return (body as any).transformToByteArray();
   }

   // Node.js Readable or any AsyncIterable<Uint8Array>
-  if (typeof body === "object" && body !== null && Symbol.asyncIterator in (body as any)) {
+  // `as any` needed: TypeScript cannot narrow `unknown` to AsyncIterable via `in` check alone
+  if (typeof body === "object" && body !== null && Symbol.asyncIterator in (body as any)) {
     const chunks: Buffer[] = [];
-    for await (const chunk of body as any) {
+    for await (const chunk of body as AsyncIterable<Uint8Array>) {

As per coding guidelines: "Try to avoid the any type. When using any, leave a comment explaining why it's being used."
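Beyond commenting each cast, the duck-typing above can also be isolated behind named type guards so the `as any` appears exactly once per shape. A sketch with hypothetical helper names, not the PR's actual code:

```typescript
// Hypothetical guards isolating the duck-typing the review comments on.
// `as any` needed: `transformToByteArray` is not visible on `unknown`, so the
// runtime check cannot be expressed without widening the type once.
function hasTransformToByteArray(
  body: unknown,
): body is { transformToByteArray(): Promise<Uint8Array> } {
  return typeof body === "object" && body !== null
    && "transformToByteArray" in body
    && typeof (body as any).transformToByteArray === "function";
}

// No cast needed here: after narrowing to `object`, the `in` check is type-safe.
function isAsyncIterable(body: unknown): body is AsyncIterable<Uint8Array> {
  return typeof body === "object" && body !== null && Symbol.asyncIterator in body;
}
```

Callers can then branch on the guards and work with properly typed values, satisfying the "explain every any" guideline in a single place.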



coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@apps/backend/src/app/api/latest/session-recordings/batch/route.tsx`:
- Around lines 104-127: The flow reads with prisma.sessionRecording.findUnique, computes newStartedAtMs/newLastEventAtMs, and then calls prisma.sessionRecording.upsert. This two-step read-then-write opens a TOCTOU race that can overwrite concurrent updates. Replace it with a single atomic upsert in raw SQL (e.g., INSERT ... ON CONFLICT ... DO UPDATE) that sets startedAt = LEAST(existing.startedAt, EXCLUDED.startedAt) and lastEventAt = GREATEST(existing.lastEventAt, EXCLUDED.lastEventAt), so the startedAt/lastEventAt bounds are merged atomically. Remove the findUnique and the newStartedAtMs/newLastEventAtMs computation in favor of the SQL LEAST/GREATEST approach.
- Around lines 79-81: The MAX_BODY_BYTES check runs after the body has already been buffered (fullReq.bodyBuffer), so req.arrayBuffer() in createSmartRouteHandler still consumes arbitrarily large bodies. Validate size before any buffering: check the Content-Length header (or enforce an upstream middleware limit) and return StatusError.PayloadTooLarge when it exceeds MAX_BODY_BYTES, or move the MAX_BODY_BYTES check into createSmartRouteHandler so it verifies Content-Length and rejects early, before req.arrayBuffer() is called.
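The LEAST/GREATEST merge suggested for the atomic upsert can be mirrored in TypeScript to make the intended semantics concrete. A sketch with a hypothetical helper, not the PR's code:

```typescript
// Hypothetical illustration of the LEAST/GREATEST semantics the atomic SQL
// upsert should enforce: startedAt only ever moves earlier, lastEventAt only
// ever moves later, regardless of the order concurrent batches arrive in.
type SessionBounds = { startedAtMs: number, lastEventAtMs: number };

function mergeBounds(existing: SessionBounds, incoming: SessionBounds): SessionBounds {
  return {
    startedAtMs: Math.min(existing.startedAtMs, incoming.startedAtMs),      // LEAST
    lastEventAtMs: Math.max(existing.lastEventAtMs, incoming.lastEventAtMs), // GREATEST
  };
}
```

Because min/max merging is commutative and associative, the final bounds are the same no matter which concurrent request commits first, which is exactly why pushing it into one ON CONFLICT ... DO UPDATE statement removes the race.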
🧹 Nitpick comments (5)
apps/backend/prisma/migrations/20260210120000_session_recordings_mvp/migration.sql (1)

52-53: Unique index name is misleading — it omits tenancyId from the name despite including it in the columns.

The index SessionRecordingChunk_sessionRecordingId_batchId_key actually covers (tenancyId, sessionRecordingId, batchId). Consider renaming it to SessionRecordingChunk_tenancyId_sessionRecordingId_batchId_key to match the Prisma-generated naming convention and avoid confusion during debugging.

Suggested rename
-CREATE UNIQUE INDEX "SessionRecordingChunk_sessionRecordingId_batchId_key"
+CREATE UNIQUE INDEX "SessionRecordingChunk_tenancyId_sessionRecordingId_batchId_key"
   ON "SessionRecordingChunk"("tenancyId", "sessionRecordingId", "batchId");
apps/backend/prisma/schema.prisma (1)

283-305: refreshTokenId has no FK constraint to ProjectUserRefreshToken.

refreshTokenId is stored as a plain UUID without a foreign key relation. If a refresh token is revoked or deleted, this becomes a dangling reference. If this is intentional (to preserve recording metadata after token cleanup), a brief comment explaining the design choice would be helpful for future readers.

apps/backend/src/s3.tsx (1)

63-88: Add comments explaining the as any casts per coding guidelines.

The duck-typing checks here require as any because the AWS SDK v3 Body type is StreamingBlobPayloadOutputTypes | undefined, a complex union that doesn't expose transformToByteArray or Symbol.asyncIterator in its static type. A short comment on each cast would satisfy the project's guideline. As per coding guidelines: "Try to avoid the any type. When using any, leave a comment explaining why it's being used."

Suggested comments
   // Web ReadableStream (some runtimes)
+  // `any` cast: AWS SDK v3 Body is a complex union; `transformToByteArray` is a SdkStreamMixin method not visible in all type overlaps.
   if (typeof body === "object" && body !== null && "transformToByteArray" in body && typeof (body as any).transformToByteArray === "function") {
     return (body as any).transformToByteArray();
   }
 
   // Node.js Readable or any AsyncIterable<Uint8Array>
+  // `any` cast: Symbol.asyncIterator is not in the static Body type union from AWS SDK.
   if (typeof body === "object" && body !== null && Symbol.asyncIterator in (body as any)) {
     const chunks: Buffer[] = [];
     for await (const chunk of body as any) {
apps/backend/prisma/seed.ts (1)

1819-1819: Prefer ?? throwErr(...) over the non-null assertion.

Even though userIds is verified non-empty on line 1804, the coding guidelines prefer defensive coding over !.

Suggested fix
-    const projectUserId = userIds[Math.floor(Math.random() * userIds.length)]!;
+    const projectUserId = userIds[Math.floor(Math.random() * userIds.length)] ?? throwErr('userIds index out of bounds');

As per coding guidelines: "Code defensively. Prefer ?? throwErr(...) over non-null assertions with good error messages explicitly stating the assumption that must've been violated."

apps/backend/src/app/api/latest/session-recordings/batch/route.tsx (1)

22-29: Add a comment explaining the as any cast.

e is typed as unknown from the unknown[] parameter, so the cast is unavoidable, but the coding guidelines ask for a comment. As per coding guidelines: "Try to avoid the any type. When using any, leave a comment explaining why it's being used."

Suggested fix
     if (!("timestamp" in e)) continue;
+    // `any` cast: events are untyped rrweb payloads; we duck-type check for `timestamp` above.
     const ts = (e as any).timestamp;


greptile-apps bot commented Feb 11, 2026

Greptile Summary

This PR introduces initial “session recording” support by adding SessionRecording / SessionRecordingChunk Prisma models + migration, a new S3 helper layer (uploadBytes, downloadBytes, getS3PublicUrl), and a SmartRouteHandler endpoint (POST /session-recordings/batch) that accepts rrweb event batches, gzips them, uploads to S3, and stores per-batch metadata in Postgres. The seed script is also extended to generate dummy session recording rows for the dummy project.

The endpoint’s flow is: validate auth/body → upsert SessionRecording timestamps → dedupe by (tenancyId, sessionRecordingId, batchId) → gzip+upload to S3 → create SessionRecordingChunk row.

Confidence Score: 3/5

  • This PR is mostly safe to merge, but the batch upload endpoint has a real storage-leak failure mode that should be fixed first.
  • Core schema and S3 utilities are straightforward, but POST /session-recordings/batch uploads to S3 before writing the chunk row; any DB failure after upload leaves orphaned objects. That’s a correctness/operational issue under realistic transient DB errors.
  • apps/backend/src/app/api/latest/session-recordings/batch/route.tsx

Important Files Changed

  • apps/backend/prisma/migrations/20260210120000_session_recordings_mvp/migration.sql: Adds SessionRecording/SessionRecordingChunk tables plus FKs and indexes for session recording metadata.
  • apps/backend/prisma/schema.prisma: Introduces Prisma models SessionRecording and SessionRecordingChunk and wires relations from Tenancy/ProjectUser.
  • apps/backend/prisma/seed.ts: Adds a dummy session recording seed generator and calls it during dummy project seeding.
  • apps/backend/src/app/api/latest/session-recordings/batch/route.tsx: Adds a SmartRouteHandler POST endpoint to gzip and upload rrweb batches to S3 and record chunk metadata; currently can orphan S3 objects if the DB insert fails after upload.
  • apps/backend/src/s3.tsx: Adds uploadBytes/downloadBytes helpers and an internal body-to-bytes reader on top of the existing S3 client config.

Sequence Diagram

sequenceDiagram
  participant Client as SDK Client
  participant API as POST /api/latest/session-recordings/batch
  participant Prisma as Prisma (tenant DB)
  participant S3 as S3 Bucket

  Client->>API: batch upload {session_id,batch_id,events...}
  API->>API: validate auth + size + events
  API->>Prisma: upsert SessionRecording (tenancyId, sessionId)
  API->>Prisma: findUnique SessionRecordingChunk by (tenancyId, sessionId, batchId)
  alt chunk exists
    Prisma-->>API: existing s3Key
    API-->>Client: 200 {deduped:true, s3_key}
  else new chunk
    API->>API: gzip(JSON payload)
    API->>S3: PutObject(s3Key, gzipped bytes)
    API->>Prisma: create SessionRecordingChunk
    Prisma-->>API: created
    API-->>Client: 200 {deduped:false, s3_key}
  end

greptile-apps bot left a comment


5 files reviewed, 1 comment


@BilalG1 BilalG1 closed this Feb 12, 2026
@BilalG1 BilalG1 deleted the analytics-replays-1 branch February 13, 2026 19:55
