docs: add code sample for using BigQueryWriteClient with a compiled proto2 module
#269
Conversation
Here is the summary of changes. You are about to add 2 region tags.
This comment is generated by snippet-bot.
shollyman left a comment:
This sample mostly reinforces my perspective that no one should use the raw API directly; there are so many sharp edges. It's valuable info, though. Should we give it a more severe snippet tag name? Alternatively, maybe add more detail in the Python comments noting that this demonstrates using the raw API directly?
```python
row.timestamp_col = int(delta.total_seconds()) * 1000000 + int(delta.microseconds)
proto_rows.serialized_rows.append(row.SerializeToString())
...
# Since this is the second request, you only need to include the row data
```
You should only need the row data; the stream ID, schema, and trace ID all get set on the first request in the stream.
You're right. Omitting the stream name on subsequent requests worked fine.
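For illustration, here's a minimal sketch of that pattern. Plain Python dicts stand in for the real AppendRowsRequest protos, and the function and field names are hypothetical, not taken from the library:

```python
def generate_requests(stream_name, schema, batches):
    """Yield one append request per batch of serialized rows.

    Only the first request carries the stream name and writer schema;
    the server remembers them for the rest of the stream, so later
    requests need only the row data.
    """
    first = True
    for rows in batches:
        request = {"rows": list(rows)}
        if first:
            request["write_stream"] = stream_name
            request["writer_schema"] = schema
            first = False
        yield request
```

With this shape, the generator can be handed directly to a streaming send call, and the stream-level fields are never repeated.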
```python
proto_data.rows = proto_rows
request.proto_rows = proto_data
...
# Offset is optional, but can help detect when the server received a
```
If we want a sample where the offset is optional, I'd consider either a committed or a default stream. Pending implies you care about completeness, where the offset is what you send and receive to confirm.
Hmmm... The write API didn't seem to care when I omitted offset, but perhaps that's a server bug?
If you're omitting the offset, then you're just trusting that all rows land. The offset ensures that rows are present where you expect them and that you have no gaps. You can do it, but commit semantics plus maybe-successful appends seems like a weird match.
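A small sketch of the offset bookkeeping being described: track the expected offset client-side and compare it against each response, so gaps or duplicates surface before the final commit. The helper is hypothetical, and responses are modeled as plain (offset, row_count) pairs rather than real AppendRowsResponse protos:

```python
def confirm_offsets(responses):
    """Check that responses report contiguous offsets.

    Returns the total number of confirmed rows, or raises if a
    response's offset does not line up with what was sent before it.
    """
    expected = 0
    for offset, row_count in responses:
        if offset != expected:
            raise ValueError(
                f"gap detected: expected offset {expected}, got {offset}"
            )
        expected = offset + row_count
    return expected
```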
```proto
syntax = "proto2";
...
message SampleData {
```
Consider commenting this. It also raises the question of how we can surface the proto content in the sample itself?
Added some comments.
Seeing as we'd need to show folks how to run protoc and such, I've kind of given up hope that this could be useful as a standalone sample. To do it right I think we'd need a proper tutorial, but perhaps that effort would be better spent on a manual client layer?
In the meantime this'll at least function as a system test. Plus it should be pretty easy to adapt without any changes to the sample code for my next step of seeing what happens when I try to write to a non-US table.
```proto
optional int64 sub_int_col = 1;
}
...
optional bool bool_col = 1;
```
Should we make one of the fields required? It mucks up your sample code a bit, but it gets the coverage across nullable/required/repeated.
A required "row number" field or something could be useful for testing. I forget if the wire format actually changes for required / optional. I thought it's just a client-side validation check.
There are two variations: if the proto field is required and the schema is optional, then you can get client-side errors. If the schema is required and the proto field is optional and unset, then you'll get an error from the backend.
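Those two failure modes can be sketched with a hypothetical validator. Plain dicts model a row's set fields; nothing here is taken from the real library, it just mirrors the decision described above:

```python
def classify_missing_field(row, proto_required, schema_required):
    """Report where a missing-field error would surface, if anywhere.

    row: dict of the fields actually set on the row.
    proto_required: field names the proto2 message marks `required`
                    (serialization fails locally if unset).
    schema_required: field names the table schema marks REQUIRED
                     (the request serializes, but the server rejects it).
    """
    for field in proto_required:
        if field not in row:
            return "client"   # proto2 serialization error, before any RPC
    for field in schema_required:
        if field not in row:
            return "backend"  # error comes back from the write API
    return "ok"
```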
```python
# Some stream types support an unbounded number of requests. Pass a
# generator or other iterable to the append_rows method to continuously
# write rows to the stream as requests are generated.
```
Yeah, the caveat here is that if you're not interleaving reads and writes, you can end up in a state where no more requests are accepted until you receive responses. I believe the limit is currently 1000, but that's an implementation detail at the moment and not a formal part of the docs.
Good point. I wonder if I should add a warning that it's highly recommended to read the responses and/or make the writes in a background thread? Plus I'll have to tell people not to use multiprocessing, since gRPC hasn't really been playing nice with that.
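The interleaving idea can be sketched like this, with hypothetical `send`/`recv` callables standing in for the gRPC request and response streams. Since the in-flight limit is an implementation detail, it's a parameter here rather than a hard-coded 1000:

```python
from collections import deque

def send_with_flow_control(requests, send, recv, max_in_flight=100):
    """Interleave sends and receives so in-flight requests stay bounded.

    Before the in-flight window fills up, requests are sent freely; once
    it is full, the oldest response is drained before sending more, so
    the stream keeps flowing instead of stalling on unread responses.
    """
    in_flight = deque()
    results = []
    for req in requests:
        if len(in_flight) >= max_in_flight:
            # Block on the oldest outstanding response before sending more.
            results.append(recv(in_flight.popleft()))
        send(req)
        in_flight.append(req)
    # Drain whatever is still outstanding.
    while in_flight:
        results.append(recv(in_flight.popleft()))
    return results
```

In the real client a background thread reading the response iterator achieves the same effect; this sketch just makes the bounded-window behavior explicit.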
I'll add "raw" to the tag. This is not fine sushi, though.

@shollyman I think this is ready for another review pass
```python
# write rows to the stream as requests are generated. Make sure to read
# from the response iterator as well so that the stream continues to flow.
requests = generate_sample_data(write_stream.name)
responses = write_client.append_rows(iter(requests))
```
I figured out how to make this work in non-default regions. Turns out just one more line to this... I'll update this PR rather than make a separate one for non-US.
Added!
This PR was accidentally closed when
Will also require client changes, but I figure I'll start with a code sample with all the types I can think of so I better understand how the current (low-level) client works.
Just creating a draft PR so I don't lose track of this.