Add JsonStreamResponse for memory-efficient JSON streaming #19367

Open
dereuromark wants to merge 15 commits into 5.next from 5.next-json-stream-response

@dereuromark (Member) commented Mar 27, 2026

Summary

Implements #19356 - adds a new JsonStreamResponse class for memory-efficient streaming of large JSON datasets using generators.

Key Features

  • True streaming: Uses echo + flush() to send data to the client as each item is encoded, keeping memory usage constant regardless of dataset size
  • Dual format support: Standard JSON arrays and NDJSON (newline-delimited JSON)
  • Envelope/wrapper structures: Support for root key wrapping and full envelope with static metadata
  • Transform callbacks: Custom transformation of items before encoding
  • Graceful error handling: Three-layer strategy with pre-validation, mid-stream error markers, and server-side logging
  • Debug mode: Pretty-printed output when Configure::read('debug') is true
  • PSR-7 immutability: withStreamOptions() returns new instance

Memory Profile

With true streaming, memory usage stays constant regardless of dataset size:

| Scenario | Traditional JSON | JsonStreamResponse |
| --- | --- | --- |
| 10,000 rows @ 1KB each | ~10MB in memory | ~1KB at a time |
| 100,000 rows @ 1KB each | ~100MB in memory | ~1KB at a time |
| Time to first byte | After ALL rows processed | After FIRST row |

The implementation uses echo + flush() to send data to the client immediately as each item is encoded, rather than building a complete string in memory first.
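
As a rough illustration of the echo + flush() idea, the sketch below yields a JSON array piece by piece from any iterable and emits each piece as soon as it is encoded. The function names `jsonArrayChunks()` and `emitChunks()` are hypothetical and are not taken from the pull request; the actual class may structure this differently.

```php
<?php
declare(strict_types=1);

/**
 * Yield a JSON array piece by piece so only one encoded item is held at a time.
 * Hypothetical helper for illustration; not part of the pull request.
 *
 * @param iterable<mixed> $items
 * @return \Generator<int, string>
 */
function jsonArrayChunks(iterable $items, int $flags = 0): Generator
{
    yield '[';
    $first = true;
    foreach ($items as $item) {
        // The separator goes before every item except the first.
        $prefix = $first ? '' : ',';
        $first = false;
        yield $prefix . json_encode($item, $flags | JSON_THROW_ON_ERROR);
    }
    yield ']';
}

/**
 * Echo each chunk and push it towards the client immediately.
 *
 * @param iterable<string> $chunks
 */
function emitChunks(iterable $chunks): void
{
    foreach ($chunks as $chunk) {
        echo $chunk;
        // Flush PHP's own output buffer first (if one is active), then the SAPI buffer.
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
    }
}
```

Calling `emitChunks(jsonArrayChunks($rows))` with a generator as `$rows` pulls one row, encodes it, and sends it before the next row is read, which is where the constant memory profile comes from.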

Architecture

Class Location

Cake\Http\Response\JsonStreamResponse extending Cake\Http\Response

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| root | string\|null | null | Wrap data in {"root": [...]} |
| envelope | array | [] | Static metadata merged with streaming data |
| dataKey | string | 'data' | Key for streaming data when envelope is used |
| format | string | 'json' | 'json' or 'ndjson' |
| transform | callable\|null | null | Transform each item before encoding |
| flags | int | DEFAULT_JSON_FLAGS | JSON encode flags |

Usage Examples

// Simple array streaming
return new JsonStreamResponse($query);
// Output: [{...}, {...}, ...]

// With root wrapper
return new JsonStreamResponse($query, ['root' => 'articles']);
// Output: {"articles": [{...}, {...}]}

// With envelope
return new JsonStreamResponse($query, [
    'envelope' => ['meta' => ['total' => 5000, 'page' => 1]],
    'dataKey' => 'articles',
]);
// Output: {"meta": {"total": 5000, "page": 1}, "articles": [{...}, {...}]}

// NDJSON format
return new JsonStreamResponse($query, ['format' => 'ndjson']);
// Output: {"id": 1, ...}\n{"id": 2, ...}\n...

// With transform
return new JsonStreamResponse($query, [
    'transform' => fn($article) => ['id' => $article->id, 'title' => $article->title],
]);
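
To make the output shapes above concrete, here is a minimal sketch of how the `root`, `envelope`, `dataKey`, `format`, and `transform` options could frame the streamed items. The option names come from the options table; the function `framedChunks()` and its internals are assumptions for illustration, not the code in this pull request.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: frame streamed items according to the documented options.
 *
 * @param iterable<mixed> $items
 * @param array<string, mixed> $options
 * @return \Generator<int, string>
 */
function framedChunks(iterable $items, array $options = []): Generator
{
    $options += [
        'root' => null,
        'envelope' => [],
        'dataKey' => 'data',
        'format' => 'json',
        'transform' => null,
    ];

    // Apply the optional transform, then encode a single item.
    $encode = static function (mixed $item) use ($options): string {
        if ($options['transform'] !== null) {
            $item = ($options['transform'])($item);
        }

        return json_encode($item, JSON_THROW_ON_ERROR);
    };

    // NDJSON: one JSON document per line, no surrounding array.
    if ($options['format'] === 'ndjson') {
        foreach ($items as $item) {
            yield $encode($item) . "\n";
        }

        return;
    }

    // Work out the static prefix and suffix around the streamed array.
    $prefix = '[';
    $suffix = ']';
    if ($options['envelope'] !== []) {
        $meta = json_encode((object)$options['envelope'], JSON_THROW_ON_ERROR);
        // Drop the closing brace of the metadata object and append the data key.
        $prefix = substr($meta, 0, -1) . ',' . json_encode($options['dataKey']) . ':[';
        $suffix = ']}';
    } elseif ($options['root'] !== null) {
        $prefix = '{' . json_encode($options['root']) . ':[';
        $suffix = ']}';
    }

    yield $prefix;
    $first = true;
    foreach ($items as $item) {
        yield ($first ? '' : ',') . $encode($item);
        $first = false;
    }
    yield $suffix;
}
```

Only the prefix and suffix are built up front; they depend on static metadata, so the items themselves can still be encoded one at a time.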

Error Handling Strategy

  1. Pre-validation: The first item is encoded before any output is sent. If that fails, a proper 500 response can still be returned.
  2. Mid-stream error marker: Later failures output {"__streamError": {"message": "...", "index": N}} in place of the failed item, so the response remains valid JSON.
  3. Server-side logging: All encoding failures are logged via Log::error().
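
The three layers can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `safeJsonArrayChunks()` is a hypothetical name, `$log` is a plain callable standing in for Log::error(), and the sketch continues streaming after a failed item, which may or may not match what the pull request does.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of pre-validation, mid-stream error markers and logging.
 *
 * @param iterable<mixed> $items
 * @param callable(string): void $log Stand-in for the server-side logger.
 * @return \Generator<int, string>
 */
function safeJsonArrayChunks(iterable $items, callable $log): Generator
{
    // Normalise arrays and iterators to a generator we can step through by hand.
    $source = (static function () use ($items): Generator {
        yield from $items;
    })();

    // Layer 1: encode the first item before anything is sent, so a failure
    // here surfaces as an exception while a normal error response is still possible.
    $firstChunk = null;
    if ($source->valid()) {
        try {
            $firstChunk = json_encode($source->current(), JSON_THROW_ON_ERROR);
        } catch (JsonException $e) {
            $log('Item 0: ' . $e->getMessage());
            throw $e;
        }
        $source->next();
    }

    return (static function () use ($source, $firstChunk, $log): Generator {
        yield '[';
        if ($firstChunk !== null) {
            yield $firstChunk;
        }
        $index = 1;
        while ($source->valid()) {
            try {
                $chunk = json_encode($source->current(), JSON_THROW_ON_ERROR);
            } catch (JsonException $e) {
                // Layer 3: log server-side.
                $log("Item {$index}: " . $e->getMessage());
                // Layer 2: emit a marker object so the array stays valid JSON.
                $chunk = json_encode([
                    '__streamError' => ['message' => $e->getMessage(), 'index' => $index],
                ]);
            }
            yield ',' . $chunk;
            $source->next();
            $index++;
        }
        yield ']';
    })();
}
```

Because the outer function is not itself a generator, the first item is validated at call time, before the caller has had a chance to send headers or body.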

Content-Type Headers

| Format | Content-Type |
| --- | --- |
| json | application/json; charset=UTF-8 |
| ndjson | application/x-ndjson; charset=UTF-8 |

Additionally, X-Accel-Buffering: no is set to prevent nginx proxy buffering.
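
For reference, the header selection described above amounts to a small lookup. The helper below is a hypothetical sketch; the function name and the choice of `InvalidArgumentException` for unknown formats are assumptions, not details confirmed by the pull request.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper mapping a format name to the response headers listed above.
 *
 * @return array<string, string>
 */
function streamHeaders(string $format): array
{
    $types = [
        'json' => 'application/json; charset=UTF-8',
        'ndjson' => 'application/x-ndjson; charset=UTF-8',
    ];
    if (!isset($types[$format])) {
        // Assumed behaviour: reject anything other than the two documented formats.
        throw new InvalidArgumentException("Unsupported format: {$format}");
    }

    return [
        'Content-Type' => $types[$format],
        // Asks nginx not to buffer the proxied response.
        'X-Accel-Buffering' => 'no',
    ];
}
```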

ORM Integration

For true streaming benefits, use unbuffered queries and avoid result formatters:

// Good - streams one row at a time
$query = $this->Articles->find()->bufferResults(false);
return new JsonStreamResponse($query);

// Avoid - formatters like map(), combine() buffer results internally
$query = $this->Articles->find()->map(fn($row) => $row); // Breaks streaming

Note: Result formatters (map(), combine(), etc.) buffer results internally, which defeats the memory-efficient streaming purpose.
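
The difference between a lazy and a buffering transform can be shown in plain PHP. This is only an analogy using hypothetical helpers (`fakeRows()`, `lazyMap()`, `bufferedMap()`); it does not reproduce how the ORM's formatters are implemented.

```php
<?php
declare(strict_types=1);

/**
 * Stand-in for an unbuffered result set: rows are produced on demand.
 * $produced counts how many rows have been generated so far.
 */
function fakeRows(int $count, int &$produced): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        $produced++;
        yield ['id' => $i];
    }
}

/** Lazy transform: wraps the source without reading ahead. */
function lazyMap(iterable $rows, callable $fn): Generator
{
    foreach ($rows as $row) {
        yield $fn($row);
    }
}

/** Buffering transform: materializes every row before returning any. */
function bufferedMap(iterable $rows, callable $fn): array
{
    return array_map($fn, iterator_to_array($rows, false));
}
```

Reading the first element from `lazyMap()` produces a single source row, whereas `bufferedMap()` has to produce all of them before the first one can be encoded. A per-item `transform` option behaves like the lazy variant.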

@dereuromark added the enhancement and needs squashing labels Mar 27, 2026
@dereuromark added this to the 5.4.0 milestone Mar 27, 2026
@dereuromark marked this pull request as ready for review March 27, 2026 03:13
- Changed from string concatenation to echo-based output
- Added outputAndFlush() for per-item flushing to client
- Added X-Accel-Buffering: no header for nginx compatibility
- Smart flush detection to work correctly in test environments
- Memory usage now O(1) instead of O(n) for output
- Add FORMAT_JSON and FORMAT_NDJSON constants with validation
- Change createStreamCallback return type from callable to Closure
- Make Log dependency optional with class_exists check
- Add tests for invalid format and constants
Comment on lines +460 to +465
$level = ob_get_level();
if ($level <= 1) {
    if ($level === 1) {
        ob_flush();
    }
    flush();

@josbeir (Contributor) commented Mar 27, 2026

Should we consider adding a threshold so that buffers are only flushed every x lines? Output buffering is pretty CPU-intensive and can become a bit of a bottleneck for large feeds...

But maybe that is a micro-optimization :-)

protected int $flushThreshold = 50;

protected int $rowsSinceLastFlush = 0;

// Only flush after every $flushThreshold rows, or when forced.
protected function outputAndFlush(string $data, bool $force = false): void
{
    echo $data;
    $this->rowsSinceLastFlush++;

    if ($force || $this->rowsSinceLastFlush >= $this->flushThreshold) {
        $this->flushBuffers();
        $this->rowsSinceLastFlush = 0;
    }
}

// Called once the threshold is reached.
protected function flushBuffers(): void
{
    // ... flush here
}
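
The suggestion above can be tried out in isolation with a small, self-contained sketch. The class name `ThresholdFlusher` is hypothetical, and the flush action is injected as a closure (an assumption made here so the batching can be observed without real output buffers); the default of 50 follows the reviewer's example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of threshold-based flushing.
 * The flush action is injected so the batching logic can be checked on its own.
 */
final class ThresholdFlusher
{
    private int $rowsSinceLastFlush = 0;

    public function __construct(
        private Closure $flush,
        private int $flushThreshold = 50,
    ) {
    }

    /**
     * Echo one row and flush only when the threshold is reached or $force is set.
     */
    public function output(string $data, bool $force = false): void
    {
        echo $data;
        $this->rowsSinceLastFlush++;

        if ($force || $this->rowsSinceLastFlush >= $this->flushThreshold) {
            ($this->flush)();
            $this->rowsSinceLastFlush = 0;
        }
    }
}
```

With a threshold of 50, writing 120 rows followed by one forced call triggers three flushes instead of 121. The trade-off is a later first byte unless the first row is written with `$force` set to true.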

@dereuromark requested a review from markstory March 28, 2026 04:32