initial elasticsearch-py instrumentation support, alternative implementation#191
initial elasticsearch-py instrumentation support, alternative implementation#191beniwohli wants to merge 5 commits intoelastic:masterfrom
Conversation
| params = kwargs.pop('params', {}) | ||
| cls_name, method_name = method.split('.', 1) | ||
| body_pos = (self.body_positions['all'].get(method_name) or | ||
| self.body_positions[self.version].get(method_name) or None) |
There was a problem hiding this comment.
continuation line over-indented for visual indent
|
|
||
| import elasticapm | ||
| from elasticapm.instrumentation.packages.base import AbstractInstrumentedModule | ||
| from elasticapm.utils import compat |
There was a problem hiding this comment.
'elasticapm.utils.compat' imported but unused
d663b91 to
f094f4f
Compare
|
One possible solution for the " But since the transport has a pool of connections, we'd need to be reasonably sure that if one connection is instrumented, all are, and I'm not sure we can make that assumption. |
eb5b6c9 to
fa0bfd2
Compare
fa0bfd2 to
f8d6f34
Compare
|
Just as an update to the above stated problem: since every |
| # user can see it. | ||
| if 'q' in params: | ||
| # 'q' is already encoded to a byte string at this point | ||
| query.append('q=' + params['q'].decode('utf-8')) |
There was a problem hiding this comment.
probably very edge case but do query params have to be utf-8 encoded?
There was a problem hiding this comment.
Yes, they are encoded to utf8 here:
https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/client/utils.py#L34-L38
There was a problem hiding this comment.
Only if it's not already encoded though - https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/client/utils.py#L29-L31
>>> elasticsearch.client.utils._escape(u"\U0001f3d6")
b'\xf0\x9f\x8f\x96'
>>> elasticsearch.client.utils._escape(u"\U0001f3d6".encode('utf-16'))
b'\xff\xfe<\xd8\xd6\xdf'
>>> elasticsearch.client.utils._escape(u"\U0001f3d6".encode('utf-16')).decode('utf-8')
UnicodeDecodeError
Not sure if this is worth a lot of effort to handle.
There was a problem hiding this comment.
ugh, right. I guess we at least can catch the UnicodeDecodeError if it happens
| context['db']['statement'] = '\n\n'.join(query) | ||
| if api_method == 'Elasticsearch.update': | ||
| if isinstance(body, dict) and 'script' in body: | ||
| context['db']['statement'] = json.dumps(body) |
There was a problem hiding this comment.
Why doesn't this grab just the script here? Seems care is taken to avoid capturing doc in the partial updated, but as is this would capture scripted upsert docs?
| if isinstance(body, dict) and 'query' in body: | ||
| query.append(json.dumps(body['query'])) | ||
| context['db']['statement'] = '\n\n'.join(query) | ||
| if api_method == 'Elasticsearch.update': |
| elif api_method == 'Elasticsearch.update': | ||
| if isinstance(body, dict) and 'script' in body: | ||
| # only get the `script` field from the body | ||
| context['db']['statement'] = json.dumps({'script': body['script']}) |
This adds instrumentation for an initial set of methods of the elasticsearch client library, as well as matrix tests for version 2, 5 and 6. This is an alternative implementation to elastic#169. It moves most of the work onto a wrapper of `Connection.perform_request`. This has several benefits: * we instrument everything by default, and can do more specific instrumentations for other some API methods (e.g. capture the query in the body for `search`) * we know which specific ES instance we connect to * if connections get retried, they appear as separate spans (more smartness, like storing retry information on the span will come later) There is also one draw back: if we manage to instrument the Elasticsearch object, but not the pooled Connection object, we could cause errors, because the former instrumentation adds stuff into the `params` dict that the latter instrumentation removes again. If the removal doesn't happen, things get messy. I haven't found a workaround for this yet.
This adds instrumentation for an initial set of methods of the elasticsearch client library, as well as matrix tests for version 2, 5 and 6. This is an alternative implementation to #169. It moves most of the work onto a wrapper of `Connection.perform_request`. This has several benefits: * we instrument everything by default, and can do more specific instrumentations for other some API methods (e.g. capture the query in the body for `search`) * we know which specific ES instance we connect to * if connections get retried, they appear as separate spans (more smartness, like storing retry information on the span will come later) closes #191
This adds instrumentation for an initial set of methods of the
elasticsearch client library, as well as matrix tests for version
2, 5 and 6.
This is an alternative implementation to #169. It moves most of the
work onto a wrapper of
Connection.perform_request. This has severalbenefits:
for other some API methods (e.g. capture the query in the body for
search)storing retry information on the span will come later)
There is also one draw back: if we manage to instrument the Elasticsearch object,
but not the pooled Connection object, we could cause errors, because the former
instrumentation adds stuff into the
paramsdict that the latter instrumentationremoves again. If the removal doesn't happen, things get messy. I haven't found
a workaround for this yet.