Datastore: Add new public API methods for working with raw Entity Protobuf objects #8636
Conversation
Add new public API methods for working with raw Entity Protobuf objects:
- get_entity_pb() -> entity_pb
- get_multi_entity_pb() -> entity_pbs
- put_entity_pb(entity_pb, ...) -> None
- put_multi_entity_pb(entity_pbs, ...) -> None
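To make the shape of the proposed API concrete, here is a minimal usage sketch in Python. The put_entity_pb / get_entity_pb calls use the names proposed in this PR, their exact argument types are assumed, and the entity_pb2 import path may differ between library releases:

```python
from google.cloud import datastore
# Generated protobuf types; the import path has moved between releases,
# so adjust it to match your installed version.
from google.cloud.datastore_v1.proto import entity_pb2

client = datastore.Client()

# Build a raw Entity protobuf directly, without the datastore.Entity wrapper.
entity_pb = entity_pb2.Entity()
entity_pb.key.CopyFrom(client.key("Book", "moby-dick").to_protobuf())
entity_pb.properties["title"].string_value = "Moby Dick"

# Hypothetical usage of the methods proposed in this PR; the argument
# types (Key wrapper vs. key protobuf) are assumptions for this sketch.
client.put_entity_pb(entity_pb)
fetched_pb = client.get_entity_pb(client.key("Book", "moby-dick"))
print(fetched_pb.properties["title"].string_value)
```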
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here.
What to do if you already signed the CLA: Individual signers / Corporate signers
ℹ️ Googlers: Go here for more info.
I've signed an ICLA. P.S. The CLA app doesn't directly support selecting a Google Account to use; I needed to manually add …
CLAs look good, thanks! ℹ️ Googlers: Go here for more info.
Anything else I can do to help move this forward? We've been using and running this code in our project for a while now without any issues. As mentioned above, I will add tests and clean up the docstrings, etc. as soon as I get a "+1" on the actual API, i.e. the new method names.
Protobuf objects.
NOTE: Documentation references "system_tests/data/index.yaml" file which doesn't exist.
I added docstrings in 1547a9f and system / integration tests in e56a275. I ran the tests against the datastore emulator and verified they pass:
(google-cloud-python) kami ~/w/google-cloud-python (git:datastore_entity_pb_public_api)$ ./datastore/.nox/system-2-7/bin/py.test -s -vvv datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods
==================================================================================== test session starts ====================================================================================
platform linux2 -- Python 2.7.16, pytest-4.6.4, py-1.8.0, pluggy-0.12.0 -- /home/kami/w/google-cloud-python/datastore/.nox/system-2-7/bin/python2.7
cachedir: .pytest_cache
rootdir: /home/kami/w/google-cloud-python/datastore
collected 3 items
datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_fetch_entity_pbs PASSED
datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_put_and_get_entity_pb PASSED
datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_put_multi_get_multi_pb PASSED
===================================================================================== warnings summary ======================================================================================
tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_fetch_entity_pbs
/home/kami/w/google-cloud-python/datastore/google/cloud/datastore/_gapic.py:42: PendingDeprecationWarning: The `channel` argument is deprecated; use `transport` instead.
channel=channel, client_info=client._client_info
-- Docs: https://docs.pytest.org/en/latest/warnings.html
=========================================================================== 3 passed, 1 warnings in 0.43 seconds ============================================================================
I also verified the tests pass against an actual Datastore instance, using the index file from 3902b02. (On a related note, the contributing documentation references a "system_tests/data/index.yaml" file which doesn't exist.)
(google-cloud-python) kami ~/w/google-cloud-python (git:datastore_entity_pb_public_api)$ ./datastore/.nox/system-2-7/bin/py.test -s -vvv datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods
==================================================================================== test session starts ====================================================================================
platform linux2 -- Python 2.7.16, pytest-4.6.4, py-1.8.0, pluggy-0.12.0 -- /home/kami/w/google-cloud-python/datastore/.nox/system-2-7/bin/python2.7
cachedir: .pytest_cache
rootdir: /home/kami/w/google-cloud-python/datastore
collected 3 items
datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_fetch_entity_pbs PASSED
datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_put_and_get_entity_pb st2-saas-prototype-dev
PASSED
datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_put_multi_get_multi_pb PASSED
===================================================================================== warnings summary ======================================================================================
tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_fetch_entity_pbs
/home/kami/w/google-cloud-python/datastore/google/cloud/datastore/_gapic.py:42: PendingDeprecationWarning: The `channel` argument is deprecated; use `transport` instead.
channel=channel, client_info=client._client_info
-- Docs: https://docs.pytest.org/en/latest/warnings.html
=========================================================================== 3 passed, 1 warnings in 10.22 seconds ===========================================================================
It's also worth noting that other existing tests already exercise the functionality provided by those new API methods, since Entity Protobuf objects are used underneath and the new API methods just expose them to the user. To be able to get the tests to work with the datastore emulator, I needed to make some changes / fixes - 4439177, c334a2b. I assume the tests haven't been run against the datastore emulator for a while and, as such, they were broken. After making those changes / fixes, all tests but one now pass against the datastore emulator:
____________________________________________________________________ TestDatastoreQuery.test_query_offset_timestamp_keys ____________________________________________________________________
self = <tests.system.test_system.TestDatastoreQuery testMethod=test_query_offset_timestamp_keys>
def test_query_offset_timestamp_keys(self):
# See issue #4675
max_all = 10000
offset = 1
max_offset = max_all - offset
query = self.CLIENT.query(kind="timestamp_key")
all_w_limit = list(query.fetch(limit=max_all))
> self.assertEqual(len(all_w_limit), max_all)
E AssertionError: 1000 != 10000
datastore/tests/system/test_system.py:406: AssertionError
===================================================================================== warnings summary ======================================================================================
tests/system/test_system.py::TestDatastoreAllocateIDs::test_allocate_ids
tests/system/test_system.py::TestDatastoreQuery::test_ancestor_query
tests/system/test_system.py::TestDatastoreTransaction::test_empty_array_put
tests/system/test_system.py::TestDatastoreTransaction::test_failure_with_contention
/home/kami/w/google-cloud-python/datastore/google/cloud/datastore/_gapic.py:42: PendingDeprecationWarning: The `channel` argument is deprecated; use `transport` instead.
channel=channel, client_info=client._client_info
-- Docs: https://docs.pytest.org/en/latest/warnings.html
====================================================================== 1 failed, 27 passed, 4 warnings in 3.14 seconds ======================================================================
I assume that's probably because of some kind of datastore emulator limitation? But I didn't really have time to dig in.
additional asserts. Datastore doesn't support such a large number of items, so we use a smaller number.
I pushed a fix / workaround for the one test failing with the datastore emulator (mentioned in the comment above) - abc5373.
(google-cloud-python) kami ~/w/google-cloud-python (git:datastore_entity_pb_public_api)$ ./datastore/.nox/system-2-7/bin/py.test -vvv datastore/tests/system/test_system.py
==================================================================================== test session starts ====================================================================================
platform linux2 -- Python 2.7.16, pytest-4.6.4, py-1.8.0, pluggy-0.12.0 -- /home/kami/w/google-cloud-python/datastore/.nox/system-2-7/bin/python2.7
cachedir: .pytest_cache
rootdir: /home/kami/w/google-cloud-python/datastore
collected 28 items
datastore/tests/system/test_system.py::TestDatastoreAllocateIDs::test_allocate_ids PASSED [ 3%]
datastore/tests/system/test_system.py::TestDatastoreSave::test_all_value_types PASSED [ 7%]
datastore/tests/system/test_system.py::TestDatastoreSave::test_empty_kind PASSED [ 10%]
datastore/tests/system/test_system.py::TestDatastoreSave::test_post_with_generated_id PASSED [ 14%]
datastore/tests/system/test_system.py::TestDatastoreSave::test_post_with_id PASSED [ 17%]
datastore/tests/system/test_system.py::TestDatastoreSave::test_post_with_name PASSED [ 21%]
datastore/tests/system/test_system.py::TestDatastoreSave::test_save_multiple PASSED [ 25%]
datastore/tests/system/test_system.py::TestDatastoreSaveKeys::test_save_key_self_reference PASSED [ 28%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_ancestor_query PASSED [ 32%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_limit_queries PASSED [ 35%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_ordered_query PASSED [ 39%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_projection_query PASSED [ 42%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_query_distinct_on PASSED [ 46%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_query_key_filter PASSED [ 50%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_query_multiple_filters PASSED [ 53%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_query_offset_timestamp_keys PASSED [ 57%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_query_paginate_simple_timestamp_keys PASSED [ 60%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_query_paginate_simple_uuid_keys PASSED [ 64%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_query_paginate_with_offset PASSED [ 67%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_query_paginate_with_start_cursor PASSED [ 71%]
datastore/tests/system/test_system.py::TestDatastoreQuery::test_query_simple_filter PASSED [ 75%]
datastore/tests/system/test_system.py::TestDatastoreTransaction::test_empty_array_put PASSED [ 78%]
datastore/tests/system/test_system.py::TestDatastoreTransaction::test_failure_with_contention PASSED [ 82%]
datastore/tests/system/test_system.py::TestDatastoreTransaction::test_transaction_via_explicit_begin_get_commit PASSED [ 85%]
datastore/tests/system/test_system.py::TestDatastoreTransaction::test_transaction_via_with_statement PASSED [ 89%]
datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_fetch_entity_pbs PASSED [ 92%]
datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_put_and_get_entity_pb PASSED [ 96%]
datastore/tests/system/test_system.py::TestDatastoreRawEntityPBMethods::test_put_multi_get_multi_pb PASSED [100%]
===================================================================================== warnings summary ======================================================================================
tests/system/test_system.py::TestDatastoreAllocateIDs::test_allocate_ids
tests/system/test_system.py::TestDatastoreQuery::test_ancestor_query
tests/system/test_system.py::TestDatastoreTransaction::test_empty_array_put
tests/system/test_system.py::TestDatastoreTransaction::test_failure_with_contention
/home/kami/w/google-cloud-python/datastore/google/cloud/datastore/_gapic.py:42: PendingDeprecationWarning: The `channel` argument is deprecated; use `transport` instead.
channel=channel, client_info=client._client_info
-- Docs: https://docs.pytest.org/en/latest/warnings.html
=========================================================================== 28 passed, 4 warnings in 5.57 seconds ===========================================================================
Protobuf objects.
exception message and not just the type.
All right, I added more unit tests so the code coverage checks / targets are now passing. As mentioned above, that code is already exercised as part of the system / integration tests, but at the moment code coverage is only calculated for unit tests. In addition to those new tests, I also tried to "improve" some of the existing tests - I updated some of them to also assert on the exception error message and not just the class (asserting just on the class is not really robust and doesn't actually test for the things we are interested in), etc. I tried to keep those commits separate. An illustration of that assertion style is shown below.
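As a generic illustration (not the PR's actual test code), asserting on the message as well as the exception type can look like this; the lookup helper here is hypothetical:

```python
import unittest


def lookup(kind):
    # Hypothetical helper, used only to illustrate the assertion style.
    if not kind:
        raise ValueError("kind cannot be empty")


class TestLookup(unittest.TestCase):
    def test_empty_kind_rejected(self):
        # Assert on the message, not just the exception class, so the test
        # still fails if a different ValueError starts being raised.
        with self.assertRaises(ValueError) as ctx:
            lookup("")
        self.assertIn("kind cannot be empty", str(ctx.exception))
```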
@tseaver and others - I added the corresponding docstrings, test cases, etc. All the checks (lint, unit tests, system tests, etc.) also pass locally. If there is anything else I can do to help move this forward, please let me know. Thanks!
Sorry to bother y'all again, but is there anything else I can do to help move this along? I see that a new test I added failed on Kokoro but passed locally. I will dig in and try to fix it.
…to datastore_entity_pb_public_api
@tseaver I pushed a change (a3d21b6) which fixes the failing test (the order is not guaranteed, so I updated the test not to rely on a specific order - keep in mind that the existing tests don't even assert on the actual response, they just assert on the length), fixed a typo, and added some additional asserts so code coverage is now 100% (bfd59ec). Can you please re-run the Kokoro build with the latest changes?
…le-cloud-python into datastore_entity_pb_public_api
And just to give some more context on how this can be used, here is one of the examples which relies on this functionality: https://github.com/Kami/python-protobuf-cloud-datastore-entity-translator
Just a friendly reminder - is there anything else I can do to help move this along? :) Thanks.
if is_key_partial:
    insert_entity_pb = self._add_partial_key_entity_pb()
    insert_entity_pb.CopyFrom(entity_pb)
Will need to adjust _commit somehow to deal with the fact that it copies the keys from the response into the non-pb entity list.
To make sure I understand you correctly - you mean making sure self._partial_key_entities is also updated when put_entity_pb is used, or handling that similarly here:
for new_key_pb, entity in zip(updated_keys, self._partial_key_entities):
One option would be to have two lists - one which stores Python Entity objects and the other which stores EntityPB objects.
This way, inside _commit we would iterate over both and assign keys on both types of objects. I think this would be the most reasonable approach (other approaches would involve unnecessary conversions back and forth between Entity PB and Entity, and that kinda defeats the purpose of the whole change).
Are you fine with that approach?
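For illustration, a rough sketch of what that two-list bookkeeping could look like; the names (partial_key_entities, partial_key_entity_pbs, updated_keys) and the mutation-ordering assumption are made up for this sketch, not taken from the PR's actual code:

```python
def assign_completed_keys(updated_keys, partial_key_entities, partial_key_entity_pbs):
    """Sketch: copy server-assigned keys back onto both kinds of objects.

    ``updated_keys`` are key protobufs from the commit response,
    ``partial_key_entities`` are datastore.Entity objects, and
    ``partial_key_entity_pbs`` are raw entity_pb2.Entity objects that were
    put with partial keys, in mutation order.
    """
    # Regular datastore.Entity objects, as the existing _commit code path does.
    for new_key_pb, entity in zip(updated_keys, partial_key_entities):
        entity.key = entity.key.completed_key(new_key_pb.path[-1].id)

    # Raw entity_pb2.Entity objects: assumes their mutations were queued
    # after the non-pb ones, so the remaining keys belong to them.
    offset = len(partial_key_entities)
    for new_key_pb, entity_pb in zip(updated_keys[offset:], partial_key_entity_pbs):
        entity_pb.key.CopyFrom(new_key_pb)
```

The ordering assumption in the second loop is exactly the kind of bookkeeping that, later in the thread, ends up being the argument for keeping a single combined list.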
EDIT: It shows that it's been a while since I last worked on this change. I actually already implemented that in the past :)
Having said that, I can still push a change which makes it use two lists instead of one to make it a bit cleaner. And I can also have a look at whether I can remove the unnecessary conversion between Entity and Entity PB in the commit function.
I will also go ahead and sync my branch with the latest master.
Please let me know if there is anything else I'm missing with regard to _commit or anything else I need to do.
I synced my branch with the latest master and pushed an improvement to avoid an unnecessary conversion between Entity and Entity PB objects - c298ac5.
While working on this change, I also noticed that the code for determining whether an Entity Protobuf key is partial wasn't fully correct for hierarchical keys, so I pushed a fix for that as well and added multiple test cases which cover scenarios with hierarchical keys.
In the end, I also decided to stick with the single-list approach. The two-list approach would be more complicated and error prone, since it means I would need to track mutation indexes / order for each list and handle them correctly inside the commit function.
If I'm missing something else or there is something else I need to do, please let me know.
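A sketch of the idea behind the hierarchical-key fix, assuming the standard Datastore Key protobuf layout (an illustration, not the code from c298ac5): only the last path element determines whether a key is partial.

```python
def is_partial_key_pb(key_pb):
    """Return True if an entity_pb2.Key still needs a server-assigned id or name.

    For a hierarchical key such as Parent:1 -> Child:<unset>, every ancestor
    path element is always complete, so only the *last* element matters;
    checking any other element would misclassify the key.
    """
    last_element = key_pb.path[-1]
    # In proto3, an unset int64 reads as 0 and an unset string as "".
    return last_element.id == 0 and last_element.name == ""
```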
function and make sure we handle partial keys correctly for hierarchical / nested keys. Also add test cases which cover hierarchical keys.
@tseaver Can you give this another look and see if it's ready to have some CI run on it?
@tseaver @BenWhitehead Please let me know if there is anything else I can do to help move this along. (All the CI checks have been passing for months now; the only one which is failing is "…".) Thanks.
Sorry I am a bit late to this. Is there a reason …
@crwilcox I believe that should be possible (I need to double check the code to confirm), but it would likely also require refactoring a lot of tests since many of them use mocking.
Ah. I feel like doing it that way is going to be better to maintain long term, and the core path would be the same. I looked at some of the methods, and there is a bit of refactoring needed for sure, as the entity conversion is mingled with other code.
@crwilcox I'm fine with that change. I also thought we would indeed be able to reduce some code duplication by using the underlying "entity_pb" versions of the methods, but once I actually dug in, I noticed that's, sadly, only partially true.

The reason for that is that the method also needs to complete the key of the entity which is passed into the original method (either the Python Entity class or the actual raw Entity Protobuf instance). The most efficient way to achieve that is to support working with both object types internally inside that method.

I could also update that method to only work with EntityPb objects (that's the approach I attempted first), but it means I still need code elsewhere to complete the keys on the original objects.

So after digging in and trying to complete those changes, I'm not totally sure it's worth it. We may perhaps save a few lines of code, but we still need to keep track of and manipulate the original objects which are passed to the top-level "public" methods, so I think refactoring those two methods would simply move the "complexity" elsewhere, but wouldn't really eliminate or reduce it. I think this should be fine as long as both versions of the methods are covered well with tests (which should be the case with this PR).

@crwilcox Please let me know what you think. I'll also shortly push one of the approaches I attempted - as described above, after actually digging in and trying multiple approaches, I don't think they save us a lot or reduce the code complexity in any substantial manner. After attempting it, I would perhaps even argue that while it may save a couple of lines of code, it adds more indirection which makes everything harder to follow / understand.

EDIT: And here is the main part of the code I'm referring to - https://github.com/googleapis/google-cloud-python/pull/8636/files#diff-175120d59417e8e7fda1d3301adf03f6R284

No matter which approach we go with, this logic will need to live somewhere in some form. And if we go with the approach which updates the "put" methods to use the "_pb" versions internally, it means there will actually be more complexity in the end - since now we need to track which "entity_pb" object maps to which "entity" object, so we can complete the keys on the original "entity" objects.
This pull request implements the following methods for working with raw Entity Protobuf objects:
- get_entity_pb() -> entity_pb2.Entity
- get_multi_entity_pb() -> list of entity_pb2.Entity
- put_entity_pb(entity_pb, ...) -> None
- put_multi_entity_pb(entity_pbs, ...) -> None
- Query.fetch_entity_pb() -> Iterator returning entity_pb2.Entity instances

Those methods come in handy when you want to work directly with the raw Entity Protobuf objects and not with the Entity convenience Python class wrapper (e.g. you care about performance, or you have a cross-programming-language ORM which works with the raw Entity Protobuf objects, etc.).
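For example, iterating over raw protobufs from a query with the proposed Query.fetch_entity_pb() might look roughly like the sketch below (assuming it mirrors Query.fetch()'s signature; this is an illustration of the proposed API, not final code):

```python
from google.cloud import datastore

client = datastore.Client()
query = client.query(kind="Book")

# Each item is an entity_pb2.Entity rather than a datastore.Entity wrapper,
# so properties are read through the protobuf Value messages directly.
for entity_pb in query.fetch_entity_pb(limit=10):
    title = entity_pb.properties["title"].string_value
    print(entity_pb.key.path[-1].name, title)
```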
For more context and background, please see https://github.com/googleapis/google-cloud-python/issues/8634.
Please keep in mind that the pull request is not finished yet (see the TODO section below). I just opened it so we can continue the discussion with some code in place.

TODO