httpclient: support googlesource #5536

ethomson · 2020-06-01T23:28:18Z

After the great smart http refactoring of 2020, we no longer support googlesource. It turns out that googlesource, by default, sends very large data packets (65520 bytes), which is very near the upper limit of a packet size for a git data packet over the smart protocols. This exposes some problems in our handling of very large data packets.
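For reference, the 65520 figure lines up with the pkt-line framing limit in the git protocol documentation: each packet starts with a four-hex-digit length that counts itself, and the total may not exceed 65520 bytes (`fff0`). The following is a minimal sketch of parsing that prefix. The helper name `pkt_line_length` and the `PKT_LINE_MAX` macro are assumptions made for illustration; this is not libgit2's parser.

```c
#include <assert.h>
#include <stddef.h>

/* Largest pkt-line allowed: 4 length bytes + 65516 payload bytes. */
#define PKT_LINE_MAX 65520

/*
 * Parse the 4-hex-digit length prefix of a pkt-line.
 * Returns the total line length (prefix included), or -1 if the
 * prefix is malformed or exceeds the protocol maximum.
 * Hypothetical helper, for illustration only.
 */
static long pkt_line_length(const char *prefix)
{
    long len = 0;
    size_t i;

    assert(prefix != NULL);

    for (i = 0; i < 4; i++) {
        char c = prefix[i];
        int digit;

        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else
            return -1; /* also catches a string shorter than 4 chars */

        len = (len << 4) | digit;
    }

    return (len > PKT_LINE_MAX) ? -1 : len;
}
```

A maximum-size packet therefore leaves only 16 bytes of headroom in a 65536-byte output buffer, which is the situation the rest of this description deals with.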

In particular, we read a block of data from the client and try to put it in the output buffer. When the output buffer (for data packets, this is a buffer of 65536) is nearly full, we do not take that into account and do a read of a block, despite the fact that we cannot fit it into the caller's output buffer. We then discard what cannot be sent back.
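To make the failure mode concrete, here is a small sketch of what happens when an already-read block is copied into a nearly full output buffer without checking the remaining space first. It illustrates the bug only, not the actual libgit2 code; the function `copy_block_lossy` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy one block that has already been read from the stream into the
 * caller's buffer, dropping whatever does not fit.
 * Returns the number of bytes that were read but lost.
 * Hypothetical function, for illustration only.
 */
static size_t copy_block_lossy(char *out, size_t out_size, size_t *out_len,
                               const char *block, size_t block_len)
{
    size_t space, copied;

    assert(*out_len <= out_size);

    space = out_size - *out_len;
    copied = block_len < space ? block_len : space;

    memcpy(out + *out_len, block, copied);
    *out_len += copied;

    /* These bytes were consumed from the socket and cannot be re-read. */
    return block_len - copied;
}
```

With 4 bytes of space left and an 8-byte block, the function reports 4 lost bytes. In a git pack stream, missing bytes like these would corrupt everything that follows.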

While debugging this, I found two other problems as well. This PR fixes the root cause and these other two issues.

First, this adds an integration test that clones a small repository from googlesource, which illustrates our deficiencies.
Next, we ensure that `git_http_client_read_body` will always return 0 at the end of an http response. If the size of the buffer being read into is very near the number of bytes sent by the remote server, then we may have read all the content in the http response while the only bytes remaining are http metadata, like the zero-length chunk that signifies the end of the stream. In that case, we do not return any content bytes, but we should still signal that we have finished reading the stream. We identify this with the `on_message_complete` callback and return 0 to the caller.
Next, we only read at most what the client has requested. `git_http_client_read_body` takes a buffer size that should be respected when reading from the stream. Without this, we would read data from the server but not return it to the client, which would mean that it was lost.
Finally, we clear the interim read buffer when starting a new request. The read buffer is used when we read data from the remote server that contains both headers and body content. The header data will be returned to the caller (in `read_response`), but the body data should be saved for a subsequent call to `read_body`. If that call never comes, we should clear the saved data; it should not be returned to callers for future requests.

Fixes #5525

Google Git (googlesource.com) behaves differently than git proper. Test that we can communicate with it.

When users call `git_http_client_read_body`, it should return 0 at the end of a message. When the `on_message_complete` callback is called, this will set `client->state` to `DONE`. In our read loop, we look for this condition and exit. Without this, when there is no data left except the end of message chunk (`0\r\n`) in the http stream, we would block by reading the three bytes off the stream but not making progress in any `on_body` callbacks. Listening to the `on_message_complete` callback allows us to stop trying to read from the socket when we've read the end of message chunk.
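The loop condition described above can be sketched with a scripted stand-in for the socket and parser. Everything here (`sketch_client`, `read_and_parse`, `read_body`, the scripted reads) is a hypothetical model rather than libgit2's implementation. The last scripted read stands for the end-of-message chunk: it yields no body bytes but marks the message complete.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_state { READING_BODY, DONE };

struct sketch_client {
    enum sketch_state state;
    const size_t *script; /* body bytes produced by each socket read */
    size_t script_len;
    size_t pos;           /* number of socket reads performed so far */
};

/* Simulate one socket read followed by parsing; returns body bytes produced. */
static size_t read_and_parse(struct sketch_client *c)
{
    size_t produced;

    assert(c->pos < c->script_len);
    produced = c->script[c->pos++];

    /* What an on_message_complete callback would record. */
    if (c->pos == c->script_len)
        c->state = DONE;

    return produced;
}

/* Returns the number of body bytes read, or 0 at the end of the message. */
static size_t read_body(struct sketch_client *c)
{
    size_t produced = 0;

    /*
     * Stop on progress or on completion. Without the DONE check, the
     * loop would return to the socket after the final chunk and block.
     */
    while (produced == 0 && c->state != DONE)
        produced = read_and_parse(c);

    return produced;
}
```

With a script of `{5, 0}`, the first call returns 5, the second returns 0 after consuming the terminating chunk, and any later call returns 0 without touching the simulated socket.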

When `git_http_client_read_body` is invoked, it provides the size of the buffer that can be read into. This will be set as the parser context's `output_size` member. Use this as an upper limit on our reads, and ensure that we do not read more than the client requests.
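A minimal sketch of that upper limit follows, assuming a parser context that tracks how much of the caller's buffer is already filled. Only `output_size` is named in the text above; the `output_written` member, the `READ_BLOCK_SIZE` value, and the function `next_read_size` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define READ_BLOCK_SIZE 16384 /* assumed preferred read size */

struct sketch_parser_context {
    size_t output_size;    /* capacity of the caller's buffer */
    size_t output_written; /* bytes already placed in it (hypothetical) */
};

/*
 * Decide how many bytes to request from the stream next: never more
 * than the space left in the caller's buffer, so nothing that is read
 * has to be thrown away.
 */
static size_t next_read_size(const struct sketch_parser_context *ctx)
{
    size_t remaining;

    assert(ctx->output_written <= ctx->output_size);

    remaining = ctx->output_size - ctx->output_written;
    return remaining < READ_BLOCK_SIZE ? remaining : READ_BLOCK_SIZE;
}
```

With a 65536-byte buffer that already holds 65520 bytes, the next read is limited to 16 bytes instead of a full block.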

The httpclient implementation keeps a `read_buf` that holds the data in the body of the response after the headers have been written. We store that data for subsequent calls to `git_http_client_read_body`. If we want to stop reading body data and send another request, we need to clear that cached data. Clear the cached body data on new requests, just like we read any outstanding data from the socket.
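The caching behavior can be modeled roughly as follows. This is a sketch under stated assumptions: the struct `cached_client`, the fixed-size array, and the three functions are hypothetical stand-ins, and only the name `read_buf` comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cached_client {
    char read_buf[256];  /* body bytes that arrived along with the headers */
    size_t read_buf_len;
};

/* Save body bytes that were read while parsing the response headers. */
static void save_leftover_body(struct cached_client *c,
                               const char *data, size_t len)
{
    assert(len <= sizeof(c->read_buf));
    memcpy(c->read_buf, data, len);
    c->read_buf_len = len;
}

/* Starting a new request drops any body bytes cached from the last one. */
static void begin_request(struct cached_client *c)
{
    c->read_buf_len = 0;
}

/* Hand cached body bytes to the caller, at most out_size of them. */
static size_t read_cached_body(struct cached_client *c,
                               char *out, size_t out_size)
{
    size_t n = c->read_buf_len < out_size ? c->read_buf_len : out_size;

    memcpy(out, c->read_buf, n);
    memmove(c->read_buf, c->read_buf + n, c->read_buf_len - n);
    c->read_buf_len -= n;

    return n;
}
```

If `begin_request` did not reset the length, the first body read of the next response would start with bytes that belong to the previous one.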

alexcrichton · 2020-06-02T18:15:09Z

Oh wow nice find! This looks to fix the Rust side of things that I was testing, thanks for this!

paolobarbolini · 2020-06-02T18:24:38Z

This seems to fix the issue I tested. Thanks!

pks-t

All of the fixes look sensible to me, thanks a lot for fixing these!

ethomson added 4 commits June 1, 2020 22:15

online::clone: test a googlesource URL

b7bdb07

Google Git (googlesource.com) behaves differently than git proper. Test that we can communicate with it.

ethomson changed the title ~~httpclient: support google code~~ httpclient: support googlesource Jun 2, 2020

ethomson added the release-1.0.1 label Jun 2, 2020

ethomson mentioned this pull request Jun 2, 2020

Failure to clone repository when git CLI succeeds #5525

Closed

pks-t approved these changes Jun 3, 2020

pks-t merged commit 53a8f46 into master Jun 3, 2020

pks-t deleted the ethomson/http branch June 3, 2020 05:41

alexcrichton added a commit to alexcrichton/git2-rs that referenced this pull request Jun 3, 2020

Update libgit2 submodule to master branch

49dd9f0

Pulls in a libgit2/libgit2#5536 to fix rust-lang/cargo#8258

alexcrichton mentioned this pull request Jun 3, 2020

Update libgit2 submodule to master branch rust-lang/git2-rs#567

Merged

alexcrichton added a commit to rust-lang/git2-rs that referenced this pull request Jun 3, 2020

Update libgit2 submodule to master branch (#567)

622b6ee

Pulls in a libgit2/libgit2#5536 to fix rust-lang/cargo#8258

alexcrichton mentioned this pull request Jul 20, 2020

network zlib stream error confusion rust-lang/cargo#8517

Closed

SaschaMann mentioned this pull request Oct 28, 2020

GitError Class:Zlib on cloning General julia-actions/julia-buildpkg#7

Closed
