Add paramstyle transformer#276
Conversation
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> squash with two commits back Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
This commit updates the test fixtures so that we automatically run the NATIVE tests with both paramstyles. For parameter features that are new in version 3.0.0 (DbsqlParameter) we don't run with both parameter styles because there is no need for syntax backwards-compatibility when using the new DbsqlParameter class. i.e. Nobody should ever use DbsqlParameter with a ParamStyle.PYFORMAT query Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
We no longer need a separate 2 tests for doubles, as they are handled in the new _eq() function. Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
…ormer Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
|
Question: why %(var)s? Will it always be |
src/databricks/sql/utils.py
Outdated
|
|
||
| _output_operation = operation | ||
|
|
||
| PYFORMAT_PARAMSTYLE_REGEX = r"%\((\w+)\)s" |
There was a problem hiding this comment.
is this too restrictive? Can you link to pep or whatever that describes this? In particular I'm wondering if %( thing )s is allowed.
There was a problem hiding this comment.
This is the specification: https://peps.python.org/pep-0249/#paramstyle
It's ambiguous as to what format codes are expected to work. But in reality, since we're doing string interpolation under the covers, it doesn't matter so much what PEP-249 requires. Because if customers have queries that depend on this behaviour their queries will break if we don't do something to cover the use case.
|
Oh good question. I picked that final query = "SELECT * FROM users WHERE user_id = %(user_id)d"
parameters = {"user_id": 1234}
cursor.execute(query, parameters)I can add an e2e test for this and revise the regex to capture-and-replace any valid format specifier. I wonder just how complex could be the format specifiers customers are using...depending on the edge-cases it could make sense to use a different approach for this. What I'm thinking is that rather than regexing for the parameter marker, we could simply render it inline and then do |
src/databricks/sql/utils.py
Outdated
|
|
||
| _output_operation = operation | ||
|
|
||
| PYFORMAT_PARAMSTYLE_REGEX = r"%\((\w+)\)s" |
There was a problem hiding this comment.
do we have an existing regex for inline replacement today, can we reuse that logic?
There was a problem hiding this comment.
Nope. We're just doing string interpolation so Python handles it for us, in C.
Have a look at inject_parameters to see exactly what it does:
databricks-sql-python/src/databricks/sql/utils.py
Lines 386 to 387 in 0e7ce57
There was a problem hiding this comment.
can we call to inject_parameters which replace parameters with the native format of parameter
There was a problem hiding this comment.
That's what transform_paramstyle() already does...
|
This is the complete format Mini-Language specification.. For simplicity, I like the approach of simply rendering the |
So, the only examples I could find all use %()s. Even for things like dates. Now technically, because you were just doing python string interpolation before, people might have used crazier things than that, but I don't think we have any obligation to support things outside of the norm. If they are doing crazy things, the rendered value could be unexpected (consider if they passed you date formatting for example). I think the simple regex approach is fine. |
|
after this change if user use :param style query, it will work right? |
Yes, but only when the native parameter approach is used. It will fail at DBR when |
Works for me! |
I'm skeptical that this is possible without having exactly as complicated a regex. Consider that a date format could lead to the word August being rendered. In order to know that August is what you're looking for, you would need to extract their formatting code. But if you have a regex that can extract their formatting code, you may as well just use that for finding and replacing without rendering the pyformat first. Willing to be proven wrong :P. |
Well you can't ask me to prove you wrong and not expect me to try 😛 def transform_paramstyle(operation: str, parameters: Dict[str, Any]) -> str:
COMPLETE_PYFORMAT_REGEX = r'%\((\w+)\)([diouxXeEfFgGcrsa%])'
INNER_NAME_REGEX = r'\((.*?)\)'
NAMED_PARAMSTYLE_FMT = ":{}"
pat = re.compile(COMPLETE_PYFORMAT_REGEX)
inner_pat = re.compile(INNER_NAME_REGEX)
pyformat_markers = pat.findall(operation)
for marker in pyformat_markers:
_name = re.findall(inner_pat, marker)[0]
rendered_value = inject_parameters(_name, parameters.get(_name))
operation = operation.replace(rendered_value, NAMED_PARAMSTYLE_FMT.format(_name))
return operation |
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
| "Consider using native parameters." in caplog.text | ||
| ), "Log message should not be supressed" | ||
|
|
||
| @both_approaches |
There was a problem hiding this comment.
Why are we removing all of these?
There was a problem hiding this comment.
It's just replaced with an extra dimension to the matrix, as described in the PR description.
databricks-sql-python/tests/e2e/test_parameterized_queries.py
Lines 52 to 56 in a86abd3
- Tests that use DbsqlParam will not run under both paramstyles If a user is writing code with DbsqlParam then they *must* use the named param style - Likewise, if parameters are passed as a sequence then transform_paramstyle will not be called. - Adds tests for the inline ordinal parameter approach Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
|
I pushed some new e2e tests of the inline ordinal parameter approach. And updated the logic that calls As a consequence, One side-effect of this is that users cannot pass a |
transform_paramstyle now needs a dictionary of parameters since it doesn't use regex anymore Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
… not recognised by DBR. Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Background
The goal of this PR is to make it so that queries that include parameter markers in the older
pyformatparamstyle as defined in PEP-249 can still run with connector >= 3.0.0 whenuse_inline_params=False. Without this change, any existing parameterised queries would not execute at all because they use a parameter marker syntax that DBR doesn't understand.Thefore it's desirable that query like this:
can be executed with the native parameter approach. We do this by automatically converting
pyformatparameter markers intonamedformat markers. So the above query would be executed as:Description
There are a few significant changes in this PR:
transform_paramstylewhich performs a regex looking for namedpyformatparameter markers in a query string and replaces them withnamedmarkers. If there are nopyformatmarkers, the query text is returned as-is. There are unit tests for this behaviour.transform_paramstyleis called on the query text whenever parameters are passed tocursor.execute()anduse_inline_params=False.Most of the changes are to the e2e tests. Those changes are in short:
Define a 3-dimension test matrix for the roundtrip parameterisation tests
The components of the matrix are:
When running with the inline approach, only
pyformatparamstyle is used.Refactor roundtrip tests to use
pytest.mark.parametrizeThis makes the test-code easier to read. And also protects against unintended differences between assertions for different types.
This means that these tests:
test_dbsqlparam_inferred_nonetest_dbsqlparam_inferred_booltest_dbsqlparam_inferred_integertest_dbsqlparam_inferred_doubletest_dbsqlparam_inferred_datetest_dbsqlparam_inferred_timestamptest_dbsqlparam_inferred_stringtest_dbsqlparam_inferred_decimalAre replaced with a single, parameterised test:
test_dbsql_param_with_inferrenceDitto for the primitive and custom decimal type tests.
Up next
We'll switch the default parameter approach from inline to native and document all of this in README.parameters.md.