This repository was archived by the owner on Nov 12, 2025. It is now read-only.

Conversation

@tswast (Contributor) commented Aug 9, 2021

This will also require client changes, but I figure I'll start with a code sample covering all the types I can think of, so that I better understand how the current (low-level) client works.

Just creating a draft PR so I don't lose track of this.

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. samples Issues that are directly related to samples. labels Aug 9, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 9, 2021
@snippet-bot (bot) commented Aug 16, 2021

Here is the summary of changes.

You are about to add 2 region tags.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.

@tswast tswast changed the title from "WIP: write client sample" to "docs: add code sample for using BigQueryWriteClient with a compiled proto2 module" Aug 16, 2021
@tswast tswast marked this pull request as ready for review August 16, 2021 21:43
@tswast tswast requested a review from a team August 16, 2021 21:43
@tswast tswast requested review from a team as code owners August 16, 2021 21:43
@tswast tswast requested review from leahecole and shollyman August 16, 2021 21:43
@shollyman (Contributor) left a comment

This sample mostly reinforces my perspective that no one should use the raw API directly; there are so many sharp edges. It's valuable info, though. Should we give it a more severe snippet tag name? Alternatively, maybe add more detail in the Python comments noting that this demonstrates using the raw API directly?

row.timestamp_col = int(delta.total_seconds()) * 1000000 + int(delta.microseconds)
proto_rows.serialized_rows.append(row.SerializeToString())
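As a stdlib-only illustration of the conversion above (the date here is made up, and `value` stands in for whatever datetime the sample is encoding): BigQuery TIMESTAMP columns written through the Storage Write API take an int64 count of microseconds since the Unix epoch.

```python
import datetime

# Illustrative values, not from the sample itself.
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
value = datetime.datetime(2021, 8, 9, 12, 0, 0, tzinfo=datetime.timezone.utc)

# Same arithmetic as the sample line above: whole seconds scaled to
# microseconds, plus the sub-second microsecond component.
delta = value - epoch
micros = int(delta.total_seconds()) * 1000000 + int(delta.microseconds)
```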

# Since this is the second request, you only need to include the row data
Contributor:

You should only need the row data; the stream ID, schema, and trace ID all get set on the first request in the stream.

Contributor Author:

You're right. Omitting the stream name on subsequent requests worked fine.
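A hypothetical stdlib-only sketch of that pattern, using plain dicts as stand-ins for the AppendRowsRequest protos (the helper name and keys are illustrative, not the client's API):

```python
# Only the first request in the stream carries the stream name and
# writer schema; subsequent requests send just the serialized rows.
def request_stream(stream_name, writer_schema, row_batches):
    first = True
    for rows in row_batches:
        request = {"rows": rows}
        if first:
            request["write_stream"] = stream_name
            request["writer_schema"] = writer_schema
            first = False
        yield request

requests = list(
    request_stream(
        "projects/p/datasets/d/tables/t/streams/s",
        "<schema>",
        [[b"row1"], [b"row2"]],
    )
)
```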

proto_data.rows = proto_rows
request.proto_rows = proto_data

# Offset is optional, but can help detect when the server received a
Contributor:

If we want a sample where offset is optional, I'd consider either a committed or a default stream. Pending implies you care about completeness, in which case the offset is what you send and receive to confirm it.

Contributor Author:

Hmmm... The write API didn't seem to care when I omitted offset, but perhaps that's a server bug?

Contributor:

If you're omitting the offset, then you're just trusting that all rows land. The offset ensures that rows are present where you expect them, with no gaps. You can do it, but pending-commit semantics plus maybe-successful appends seems like a weird match.
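A hypothetical sketch of the offset bookkeeping being described (pure Python; the function is illustrative, not part of the client): the writer sends the offset it expects with each batch and compares it to the offset the server confirms.

```python
# Illustrative offset tracking for a pending stream: a mismatch between
# the expected and confirmed offsets signals a gap or duplicate append.
def confirm_append(expected_offset, confirmed_offset, batch_size):
    if confirmed_offset != expected_offset:
        raise RuntimeError(
            f"append landed at {confirmed_offset}, expected {expected_offset}"
        )
    return expected_offset + batch_size

# Two batches of 2 and 3 rows, each confirmed at the expected offset.
offset = 0
for batch_size, confirmed in [(2, 0), (3, 2)]:
    offset = confirm_append(offset, confirmed, batch_size)
```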


syntax = "proto2";

message SampleData {
Contributor:

Consider commenting this. It also raises the question of how we can surface the proto content in the sample itself?

Contributor Author:

Added some comments.

Seeing as we'd need to show folks how to run protoc and such, I've kind of given up hope that this could be useful as a standalone sample. To do it right, I think we'd need a proper tutorial, but perhaps that effort would be better spent on a manual client layer?

In the meantime this'll at least function as a system test. Plus it should be pretty easy to adapt without any changes to the sample code for my next step of seeing what happens when I try to write to a non-US table.

optional int64 sub_int_col = 1;
}

optional bool bool_col = 1;
Contributor:

Should we make one of the fields required? It mucks up your sample code a bit, but it gets the coverage across nullable/required/repeated.

Contributor Author:

A required "row number" field or something could be useful for testing. I forget if the wire format actually changes for required vs. optional; I thought it was just a client-side validation check.

Contributor:

There are two variations: if the proto field is required and the schema column is optional, then you can get client-side errors. If the schema column is required and the proto field is optional and unset, then you'll get an error from the backend.
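For example, a proto2 schema along the lines of the sample's (field names illustrative) with one field marked required:

```proto
syntax = "proto2";

message SampleData {
  // proto2 validates `required` on the client at serialization time;
  // whether the backend also rejects a missing value depends on
  // whether the table schema marks the column REQUIRED.
  required int64 row_num = 1;
  optional bool bool_col = 2;
}
```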


# Some stream types support an unbounded number of requests. Pass a
# generator or other iterable to the append_rows method to continuously
# write rows to the stream as requests are generated.
Contributor:

Yeah, the caveat here is that if you're not interleaving reads and writes, you can end up in a state where no more requests are accepted until you receive responses. I believe the limit is currently 1000 in-flight requests, but that's an implementation detail and not a formal part of the docs.

Contributor Author:

Good point. I wonder if I should add a warning that it's highly recommended to read the responses and/or make the writes in a background thread? Plus, I'll have to tell people not to use multiprocessing, since gRPC hasn't really been playing nice with that.
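A stdlib-only sketch of that recommendation, with a queue standing in for the bidirectional gRPC stream (everything here is illustrative, not the client's API): responses are drained on a background thread so the writer never stalls against the in-flight request limit.

```python
import queue
import threading

def run_stream(requests):
    responses = queue.Queue()
    received = []

    # Drain responses on a background thread so the writer is never
    # blocked waiting for unread responses.
    def drain():
        while True:
            item = responses.get()
            if item is None:  # sentinel: stream closed
                break
            received.append(item)

    reader = threading.Thread(target=drain)
    reader.start()

    # Stand-in for append_rows(iter(requests)): pretend each request
    # produces one response.
    for request in requests:
        responses.put(request)
    responses.put(None)

    reader.join()
    return received

results = run_stream(["req1", "req2", "req3"])
```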

@tswast (Contributor Author) commented Aug 17, 2021

Should we give it a more severe snippet tag name?

I'll add "raw" to the tag. This is not fine sushi, though.

@tswast (Contributor Author) commented Aug 17, 2021

@shollyman I think this is ready for another review pass

@tswast tswast requested a review from shollyman August 17, 2021 21:25
# write rows to the stream as requests are generated. Make sure to read
# from the response iterator as well so that the stream continues to flow.
requests = generate_sample_data(write_stream.name)
responses = write_client.append_rows(iter(requests))
Contributor Author:

I figured out how to make this work in non-default regions. Turns out it just takes one more line here... I'll update this PR rather than make a separate one for non-US.

Contributor Author:

Added!

@plamut plamut changed the base branch from master to main August 25, 2021 10:09
@plamut plamut deleted the branch googleapis:main August 26, 2021 11:29
@plamut plamut closed this Aug 26, 2021
@plamut (Contributor) commented Aug 26, 2021

This PR was accidentally closed when main was deleted, and now I cannot re-open it even if I temporarily re-create main. Please open it again against master, thanks!

