Skip to content

Conversation

@lukesneeringer
Copy link
Contributor

Fixes #3152

@lukesneeringer lukesneeringer self-assigned this Aug 8, 2017
@lukesneeringer lukesneeringer requested a review from dhermes August 8, 2017 21:51
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 8, 2017
@dhermes
Copy link
Contributor

dhermes commented Aug 8, 2017

@lukesneeringer So, system tests?

@tseaver tseaver added the api: datastore Issues related to the Datastore API. label Aug 10, 2017
@lukesneeringer
Copy link
Contributor Author

Going to take a pass on that for this one. It is a semi-edge case that I do not really want to own a system test for, and two separate customers have already confirmed that it works for them.

@dhermes
Copy link
Contributor

dhermes commented Aug 11, 2017

Can you provide a snippet that you've personally used to confirm it works with the backend? I'll check out this branch, run it myself, then we can merge?

@lukesneeringer
Copy link
Contributor Author

Not sure if I have the one I used a few months ago when I wrote this, but I should be able to write one. I will get back to you.

@dhermes
Copy link
Contributor

dhermes commented Aug 11, 2017

Thanks

@lukesneeringer
Copy link
Contributor Author

>>> from google.cloud import datastore
>>> client = datastore.Client()
>>> key = client.key('EntityKind', 1234)
>>> entity = datastore.Entity(key, exclude_from_indexes=('foo',))
>>> entity.foo = []
>>> client.put(entity)
>>>

@dhermes
Copy link
Contributor

dhermes commented Aug 11, 2017

On it.

@dhermes
Copy link
Contributor

dhermes commented Aug 11, 2017

@lukesneeringer entity.foo = [] is not what you want. You want entity['foo'] = []. (i.e. you're holding it wrong).

I'm not sure this fixes the issue? Notice after storing, the property isn't actually added:

>>> from google.cloud import datastore
>>> client = datastore.Client()
>>> key = client.key('EntityKind', 1234)
>>> client.delete(key)
>>> client.get(key) is None
True
>>> entity = datastore.Entity(key, exclude_from_indexes=('foo',))
>>> entity['foo'] = []
>>> entity
<Entity('EntityKind', 1234) {'foo': []}>
>>> entity.exclude_from_indexes
{'foo'}
>>> client.put(entity)
>>> 
>>> retrieved = client.get(key)
>>> retrieved
<Entity('EntityKind', 1234) {}>
>>> retrieved.exclude_from_indexes
set()

@lukesneeringer
Copy link
Contributor Author

Odd that two customers said it worked for them.

It almost looks like the current behavior just ignores the empty list entirely rather than actually store an empty list in Datastore.

@dhermes
Copy link
Contributor

dhermes commented Aug 11, 2017

@lukesneeringer I'm fairly certain they are using a different client to actually store the data / use our low-level gRPC client to send the empty list. I'll try to do that right now.

@dhermes
Copy link
Contributor

dhermes commented Aug 11, 2017

@lukesneeringer It's unclear to me what exclude_from_indexes even means for an empty list. The exclude property is set on VALUES within an ArrayValue, e.g. if entity['foo'] = ['a']:

>>> mutation
upsert {
  key {
    partition_id {
      project_id: "PROJECT"
    }
    path {
      kind: "EntityKind"
      id: 1234
    }
  }
  properties {
    key: "foo"
    value {
      array_value {
        values {
          string_value: "a"
          exclude_from_indexes: true
        }
      }
    }
  }
}

@lukesneeringer
Copy link
Contributor Author

@dhermes That might be why the fix is correct then? The problem is that it was raising an exception on an empty list (because programmatically the users were unsure whether the list was empty or not).

@lukesneeringer
Copy link
Contributor Author

@alercunha @MeteKem Can you elucidate this for us?

@dhermes
Copy link
Contributor

dhermes commented Aug 11, 2017

@lukesneeringer Gotcha. Have you run your snippet against master? It fails?

@alercunha
Copy link

alercunha commented Aug 12, 2017

@lukesneeringer @dhermes In my case the issue manifested when retrieving empty array values from DS which was set by another client (Java in my case). The actual value stored was an empty array.

@metekemertas
Copy link

@lukesneeringer @dhermes @alercunha Same here. In my case the other client was a node client.

array_pb = array_val_pb.array_value.values

unindexed_value_pb1 = array_pb.add()
unindexed_value_pb1.integer_value = 10

This comment was marked as spam.

This comment was marked as spam.

@alercunha
Copy link

alercunha commented Aug 12, 2017

If I create an entity using this library and set a certain property as an empty array there is no value stored (it's like the property was never added). However in other client libraries that's not always the case.

So for example if I create data doing this:

    ds = datastore.Client()

    key = ds.key('Test')
    entity = datastore.Entity(key=key)
    entity['children'] = []
    ds.put(entity)

    key = ds.key('Test')
    entity = datastore.Entity(key=key)
    entity['children'] = ['A']
    ds.put(entity)

From the console it will look like this:

image

The first entity has no property children defined.
Querying the data will work just fine.

However if I edit the entity manually and set a property with an empty array like this:

image

Then back to the list of entities now the first entity does have a children property with an empty value in it.

image

Now if I query the data I get the error mentioned earlier.
I tried saving this empty array using just python but couldn't do it.

This bug seems to be caused by other libraries saving data in a way that this library wouldn't do. So I'm not sure within this library context it will be possible to reproduce the problem in a unit test.

However it should still be considered a bug since the Datastore accepts it when coming from other libraries as a valid value.

@alercunha
Copy link

alercunha commented Aug 12, 2017

@lukesneeringer you're right, the current code ignores a key-pair value if the value is an empty array: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/datastore/google/cloud/datastore/helpers.py#L214

I tried removing this behavior but then I get a 400 error from the google API call.

@dhermes
Copy link
Contributor

dhermes commented Aug 12, 2017

This bug seems to be caused by other libraries saving data in a way that this library wouldn't do. So I'm not sure within this library context it will be possible to reproduce the problem in a unit test.

If it's possible to do at all, we can still do it with raw protocol buffers.

@metekemertas
Copy link

Any updates? I'm still struggling with my development cycle because of this.

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2017

@MeteKem Can you help us with the reproduction of the issue? For an offending entity, can you retrieve it with the low-level API? (I'll produce a snippet shortly.)

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2017

>>> from google.cloud.proto.datastore.v1 import datastore_pb2
>>> from google.cloud import datastore
>>>
>>> client = datastore.Client()
>>> key = client.key('Foo', 'Bar')
>>>
>>> key_pb = key.to_protobuf()
>>> read_options = datastore_pb2.ReadOptions()
>>> gapic_client = client._datastore_api
>>> lookup_response = gapic_client.lookup(
...     client.project, read_options, [key_pb])
>>>
>>> lookup_response
missing {
  entity {
    key {
      partition_id {
        project_id: "${PROJECT}"
      }
      path {
        kind: "Foo"
        name: "Bar"
      }
    }
  }
  version: 1
}

>>>

@metekemertas
Copy link

@dhermes To reproduce the issue:

  1. Create an entity with an empty array value for a given field using the node client: https://github.com/GoogleCloudPlatform/google-cloud-node
  2. Try to retrieve that entity using this Python client.

I did confirm that the low-level API can retrieve the offending entity. It was clear to me that the entity can be retrieved with the low-level API anyways. It's just the conversion of entity_pb2.Entity to google.cloud.datastore.entity.Entity that's causing the issue.

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2017

Create an entity with an empty array value for a given field using the node client

Can you provide a JS snippet?

I did confirm that the low-level API can retrieve the offending entity....It's just the conversion of entity_pb2.Entity to google.cloud.datastore.entity.Entity that's causing the issue.

Sorry I didn't communicate clearly. I was interested in you doing that so we could use the returned protobuf to reproduce this error, not as validation that the API could store / send the value.

In that vein, can you share the returned protobuf from the low-level call?

@metekemertas
Copy link

metekemertas commented Aug 17, 2017

@dhermes It looks something like this. I had to strip it down as the original entity includes proprietary data that I cannot publicly share.

{
  entity {
    key {
      partition_id {
        project_id: "<your_project_id_here>"
      }
      path {
        kind: "Foo"
        id: "Bar"
      }
    }
    properties {
      key: "some_key"
      value {
        entity_value {
          properties {
            key: "some_attributes"
            value {
              array_value {
              }
            }
          }
          properties {
            key: "emails"
            value {
              array_value {
              }
            }
          }
        }
      }
    }
  }
  version: 1502065220849000
}

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2017

Great, so the offenders seem to be:

value {
  array_value {
  }
}

@jgrabenstein
Copy link

Just hit this as well, hoping for fix soon. In my case it's also due to them being created by a node.js client, and me trying to query in python.

@lukesneeringer
Copy link
Contributor Author

@jgrabenstein Definitely hoping to get a fix soon. As you can see, this one has been quite a struggle. :-)

@dhermes
Copy link
Contributor

dhermes commented Aug 23, 2017

@lukesneeringer I'm working a repro right now.

@dhermes
Copy link
Contributor

dhermes commented Aug 23, 2017

Setting up an environment:

$ virtualenv venv
$ venv/bin/pip install ipython google-cloud-datastore
$ venv/bin/pip freeze | grep google-cloud
gapic-google-cloud-datastore-v1==0.15.3
google-cloud-core==0.26.0
google-cloud-datastore==1.2.0
proto-google-cloud-datastore-v1==0.90.4

and running

$ venv/bin/ipython
Python 3.6.2 (default, Aug  4 2017, 20:24:57)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

we have

In [1]: from google.cloud import datastore

In [2]: from google.cloud.proto.datastore.v1 import datastore_pb2

In [3]: from google.cloud.proto.datastore.v1 import entity_pb2

In [4]: client = datastore.Client()

In [5]: key = client.key('Foo', 'Bar')

In [6]: entity = datastore.Entity(key=key)

In [7]: batch = client.batch()

In [8]: batch.begin()

In [9]: batch.put(entity)

In [10]: mutations = batch.mutations

In [11]: len(mutations)
Out[11]: 1

In [12]: mutation = mutations[0]

In [13]: mutation
Out[13]:
upsert {
  key {
    partition_id {
      project_id: "prahj-ekt"
    }
    path {
      kind: "Foo"
      name: "Bar"
    }
  }
}

In [14]: value_pb = mutation.upsert.properties.get_or_create('name-name')

In [15]: value_pb.array_value.CopyFrom(entity_pb2.ArrayValue(values=[]))

In [16]: mutation
Out[16]:
upsert {
  key {
    partition_id {
      project_id: "prahj-ekt"
    }
    path {
      kind: "Foo"
      name: "Bar"
    }
  }
  properties {
    key: "name-name"
    value {
      array_value {
      }
    }
  }
}

In [17]: gapic_api = client._datastore_api

In [18]: gapic_api
Out[18]: <google.cloud.datastore._gax.GAPICDatastoreAPI at 0x7f56310b9588>

In [19]: mode = datastore_pb2.CommitRequest.NON_TRANSACTIONAL

In [20]: commit_response_pb = gapic_api.commit(
    ...:     batch.project, mode, mutations, transaction=None)
    ...:

In [21]: commit_response_pb
Out[21]:
mutation_results {
  version: 1503505990610000
}
index_updates: 1

In [22]: key_pbs = [key.to_protobuf()]

In [23]: read_options = datastore_pb2.ReadOptions()

In [24]: lookup_response = gapic_api.lookup(
    ...:     client.project, read_options, key_pbs)
    ...:

In [25]: lookup_response
Out[25]:
found {
  entity {
    key {
      partition_id {
        project_id: "prahj-ekt"
      }
      path {
        kind: "Foo"
        name: "Bar"
      }
    }
    properties {
      key: "name-name"
      value {
        array_value {
        }
      }
    }
  }
  version: 1503505990610000
}

In [26]: %xmode plain
Exception reporting mode: Plain

In [27]: client.get(key)
Traceback (most recent call last):
  File "<ipython-input-27-0924c86e3e51>", line 1, in <module>
    client.get(key)
  File ".../venv/lib/python3.6/site-packages/google/cloud/datastore/client.py", line 309, in get
    deferred=deferred, transaction=transaction)
  File ".../venv/lib/python3.6/site-packages/google/cloud/datastore/client.py", line 370, in get_multi
    for entity_pb in entity_pbs]
  File ".../venv/lib/python3.6/site-packages/google/cloud/datastore/client.py", line 370, in <listcomp>
    for entity_pb in entity_pbs]
  File ".../venv/lib/python3.6/site-packages/google/cloud/datastore/helpers.py", line 140, in entity_from_protobuf
    raise ValueError('For an array_value, subvalues must either '
ValueError: For an array_value, subvalues must either all be indexed or all excluded from indexes.


In [28]:

@dhermes
Copy link
Contributor

dhermes commented Aug 23, 2017

@lukesneeringer You can add the above as a system test for the explicit purpose of guarding against regression? Running it on this branch:

In [27]: client.get(key)
Out[27]: <Entity('Foo', 'Bar') {'name-name': []}>

In [28]: client.delete(key)

In [29]: 

@jgrabenstein
Copy link

bump.

@metekemertas
Copy link

@dhermes Any updates?

@jgrabenstein
Copy link

Would be nice to see some movement on this

@kirubakaran
Copy link

kirubakaran commented Oct 9, 2017

@dhermes @lukesneeringer Any updates?

PS: This change fixed the issue for me as well. Thanks @lukesneeringer :-)

array_pb = array_val_pb.array_value.values

unindexed_value_pb1 = array_pb.add()
unindexed_value_pb1.integer_value = 10

This comment was marked as spam.

@david-gang
Copy link
Contributor

Hi,
I want to use this library and this PR is important for me.
What is the expected time when this issue gets merged?
I have an automatic build which downloads the pip package and deploys the docker image, so it is hard for me to change the code for every version.
Is there something i can contribute here?

Thanks

@zero-master
Copy link

I ran into this. In my case, Ruby and Java clients set array empty.

@chemelnucfin chemelnucfin added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Jan 16, 2018
@tseaver tseaver removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Feb 20, 2018
@tseaver
Copy link
Contributor

tseaver commented Feb 22, 2018

Superseded by #4915.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: datastore Issues related to the Datastore API. cla: yes This human has signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.