[Python SDK] use iter_lines to read HTTP response.#1798

Merged
abhizer merged 1 commit into foreach_chunk from iter_lines on May 28, 2024
Conversation

@ryzhyk
Contributor

@ryzhyk ryzhyk commented May 28, 2024

Use iter_lines to read HTTP response line-by-line. This way we don't need to worry about incomplete chunks. The previous implementation also ran out of memory on large outputs. I did not figure out what was going on exactly, but it used up 20GB of RAM while parsing a few thousand records. This implementation does not seem to have that problem.

Is this a user-visible change (yes/no): ___

@ryzhyk ryzhyk requested a review from abhizer May 28, 2024 00:44
@abhizer
Contributor

abhizer commented May 28, 2024

I wanted to switch to iter_lines, but I saw people discussing an issue related to it and wanted to test it first.
I think if we check that the line returned is not empty before we call json.loads(), we should be good.
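A minimal sketch of that empty-line guard, assuming newline-delimited JSON output (the function and variable names here are hypothetical, not the SDK's actual code). `requests`' `iter_lines` can yield empty lines for keep-alive newlines in the stream, so they are skipped before `json.loads`:

```python
import json

def parse_ndjson_lines(lines):
    """Parse an iterable of newline-delimited JSON lines (bytes),
    skipping empty keep-alive lines that iter_lines may yield."""
    records = []
    for line in lines:
        if not line:  # skip empty lines before calling json.loads
            continue
        records.append(json.loads(line))
    return records

# Simulated iter_lines output, including an empty keep-alive line:
chunks = [b'{"id": 1}', b'', b'{"id": 2}']
print(parse_ndjson_lines(chunks))  # [{'id': 1}, {'id': 2}]
```

With a real response the same function would consume `response.iter_lines()` directly, since each yielded item is one complete line and partial chunks never reach the JSON parser.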

Use `iter_lines` to read HTTP response line-by-line.  This way we don't
need to worry about incomplete chunks.  The previous implementation also
ran out of memory on large outputs.  I did not figure out what was
going on exactly, but it used up 20GB of RAM while parsing a few
thousand records.  This implementation does not seem to have that
problem.

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
@ryzhyk
Contributor Author

ryzhyk commented May 28, 2024

What kind of self-respecting software uses \r\n for a newline? I hope Feldera will never fall so low :)
There was, however, a performance issue in iter_lines, which should be fixed in the latest pushed version.

@abhizer abhizer merged commit c500be5 into foreach_chunk May 28, 2024
@abhizer abhizer deleted the iter_lines branch May 28, 2024 07:05
gz pushed a commit that referenced this pull request May 28, 2024
gz pushed a commit that referenced this pull request May 28, 2024