[Python SDK] use iter_lines to read HTTP response.#1798

Merged
abhizer merged 1 commit into foreach_chunk from iter_lines on May 28, 2024
Conversation

@ryzhyk
Contributor

@ryzhyk ryzhyk commented May 28, 2024

Use iter_lines to read HTTP response line-by-line. This way we don't need to worry about incomplete chunks. The previous implementation also ran out of memory on large outputs. I did not figure out what was going on exactly, but it used up 20GB of RAM while parsing a few thousand records. This implementation does not seem to have that problem.

Is this a user-visible change (yes/no): ___

@ryzhyk ryzhyk requested a review from abhizer May 28, 2024 00:44
@abhizer
Contributor

abhizer commented May 28, 2024

I wanted to switch to iter_lines, but I saw people discussing an issue related to it and wanted to test it first.
I think if we check that the line returned is not empty before we call json.loads(), we should be good.
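A minimal sketch of that empty-line guard, assuming newline-delimited JSON output (the function and variable names here are hypothetical, not the SDK's actual code). `requests`' `iter_lines` can yield empty lines for keep-alive newlines in the stream, so they are skipped before `json.loads`:

```python
import json

def parse_ndjson_lines(lines):
    """Parse an iterable of newline-delimited JSON lines (bytes),
    skipping empty keep-alive lines that iter_lines may yield."""
    records = []
    for line in lines:
        if not line:  # skip empty lines before calling json.loads
            continue
        records.append(json.loads(line))
    return records

# Simulated iter_lines output, including an empty keep-alive line:
chunks = [b'{"id": 1}', b'', b'{"id": 2}']
print(parse_ndjson_lines(chunks))  # [{'id': 1}, {'id': 2}]
```

With a real response the same function would consume `response.iter_lines()` directly, since each yielded item is one complete line and partial chunks never reach the JSON parser.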

Use `iter_lines` to read HTTP response line-by-line.  This way we don't
need to worry about incomplete chunks.  The previous implementation also
ran out of memory on large outputs.  I did not figure out what was
going on exactly, but it used up 20GB of RAM while parsing a few
thousand records.  This implementation does not seem to have that
problem.

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
@ryzhyk
Contributor Author

ryzhyk commented May 28, 2024

What kind of self-respecting software uses \r\n for a newline? I hope Feldera will never fall so low :)
There was, however, a performance issue in iter_lines, which should be fixed in the latest pushed version.

@abhizer abhizer merged commit c500be5 into foreach_chunk May 28, 2024
@abhizer abhizer deleted the iter_lines branch May 28, 2024 07:05
gz pushed a commit that referenced this pull request May 28, 2024
gz pushed a commit that referenced this pull request May 28, 2024