Skip to content

Conversation

@ethomson
Copy link
Member

@ethomson ethomson commented Jun 1, 2020

After the great smart http refactoring of 2020, we no longer support googlesource. It turns out that google code - by default - sends very large data packets (65520) which is very neat to the upper limit of a packet size for a git data packet over the smart protocols. This illustrates some problems in our handling of very large data packets.

In particular, we read a block of data from the client and try to put it in the output buffer. When the output buffer (for data packets, this is a buffer of 65536) is nearly full, we do not take that into account and do a read of a block, despite the fact that we cannot fit it into the caller's output buffer. We then discard what cannot be sent back.

While debugging this, I found two other problems as well. This PR fixes the root cause and these other two issues.

  • First, this adds an integration test that clones (a small repository) from googlesource that illustrates our deficiencies.
  • Next, we ensure that git_http_client_read_body will always return 0 at the end of an http response. If a buffer size that is being read into is very near the number of bytes sent by the remote server, then we may be in a position where we have read all the content in the http response, and the only bytes remaining are part of the http metadata, like the zero-length chunk signifier of the end of the stream. In that case, we do not actually return content bytes, but should signify that we have finished reading the stream. Identify this by the on_message_complete callback and return 0 to the caller.
  • Next, we only read at most what the client has requested. git_http_client_read_body takes a buffer size that should be respected when reading from the stream. Without this, we would read data from the server, but not return it to the client, which would mean that it was lost.
  • Finally, we should clear the interim read buffer when starting a new request. The read buffer is used when we read data from the remote server that contains both headers and body content. The header data will be returned to the caller (in read_response) but the body data should be saved for a subsequent call to read_body. If such call never comes, we should clear the saved data, it should not be returned to callers for future requests.

Fixes #5525

ethomson added 4 commits June 1, 2020 22:15
Google Git (googlesource.com) behaves differently than git proper.
Test that we can communicate with it.
When users call `git_http_client_read_body`, it should return 0 at the
end of a message.  When the `on_message_complete` callback is called,
this will set `client->state` to `DONE`.  In our read loop, we look for
this condition and exit.

Without this, when there is no data left except the end of message chunk
(`0\r\n`) in the http stream, we would block by reading the three bytes
off the stream but not making progress in any `on_body` callbacks.
Listening to the `on_message_complete` callback allows us to stop trying
to read from the socket when we've read the end of message chunk.
When `git_http_client_read_body` is invoked, it provides the size of the
buffer that can be read into.  This will be set as the parser context's
`output_size` member.  Use this as an upper limit on our reads, and
ensure that we do not read more than the client requests.
The httpclient implementation keeps a `read_buf` that holds the data
in the body of the response after the headers have been written.  We
store that data for subsequent calls to `git_http_client_read_body`.  If
we want to stop reading body data and send another request, we need to
clear that cached data.

Clear the cached body data on new requests, just like we read any
outstanding data from the socket.
@ethomson ethomson changed the title httpclient: support google code httpclient: support googlesource Jun 2, 2020
@alexcrichton
Copy link
Contributor

Oh wow nice find! This looks to fix the Rust side of things that I was testing, thanks for this!

@paolobarbolini
Copy link

This seems to fix the issue I tested. Thanks!

Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the fixes look sensible to me, thanks a lot for fixing these!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Failure to clone repository when git CLI succeeds

4 participants