fix(sdk/python): make static annotation analysis alias-aware#13162
Open
grouville wants to merge 12 commits into
Open
fix(sdk/python): make static annotation analysis alias-aware#13162grouville wants to merge 12 commits into
grouville wants to merge 12 commits into
Conversation
The property-based module generator had grown by appending more cases directly inside `_type_expr`. That made the recursive grammar harder to review: leaf selection, wrapper rendering, and metadata propagation all lived in one long dispatch function, so a reviewer had to re-read the whole function to tell whether a change was only a refactor or whether it expanded the generated language. Refactor the type-expression grammar without adding new shapes. The generated surface remains the existing set of base types, quoted forward references, `Optional[T]`, `T | None`, `list[T]`, and `Annotated[T, ...]`. The implementation extracts small helpers for the source-rendering operations: base leaves, forward-reference leaves, optional wrappers, pipe-union wrappers, list wrappers, and annotated wrappers. `_type_leaf` now owns leaf selection and `_type_rules` declares the recursive wrapper choices as data. `_type_expr` continues to drive the same bounded recursive grammar, but it reads as a composition of grammar rules instead of repeatedly constructing `GeneratedType` objects inline. This prepares the generator for additional edge cases in later commits while keeping this commit reviewable on its own: it should be behavior-preserving, and the property test should still pass before any analyzer fix is added. Verified with: * `uv run ruff check tests/mod/_strategies.py` * `uv run pytest tests/mod/test_property.py` Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
The property grammar already emitted module-level aliases, but those aliases
were mostly declarations in the generated source. Function annotations were
still generated from the underlying type grammar directly, so many alias
resolution paths were not exercised by the differential test.
Make aliases real leaves in the generated type-expression grammar. Alias
definitions now keep the generated target metadata alongside the rendered
source, and `_type_expr` can choose those aliases anywhere it can choose a base
leaf. That means Hypothesis can now produce compound annotations such as
`Optional[Alias]`, `list[Alias]`, `Annotated[Alias, ...]`, and return types that
refer to aliases.
Generate both implicit assignments (`Alias = T`) and explicit PEP 613 aliases
(`Alias: TypeAlias = T`) because the analyzer's collection path supports both
and the runtime treats both as valid module patterns. PEP 695 aliases keep the
same generated-target metadata and are also exposed as leaves on Python
versions that can parse that syntax.
For example, the generator can now emit source like:
```python
from typing import Annotated, Optional, TypeAlias
AliasA: TypeAlias = str
@dagger.object_type
class Foo:
@dagger.function
def greet(self, name: Optional[AliasA]) -> AliasA:
...
```
Top-level alias targets continue to avoid `Annotated[...]` for now. Nested
`Annotated` shapes still appear in generated function annotations, but an alias
whose right-hand side is `Annotated[...]` has separate metadata-flattening
semantics that are not part of this alias-expansion fix.
Verified with:
* `uv run ruff check tests/mod/_strategies.py`
Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
Add `Self` to the generated annotation grammar as a leaf that resolves to the
enclosing `Foo` object. The runtime introspection side already understands
`typing_extensions.Self`, and the AST analyzer has a dedicated resolver path
for it, but the property generator could previously miss regressions there
because it only generated the literal forward reference `"Foo"`.
The generator now imports `Self` and may place it anywhere a nested leaf is
valid. For example:
```python
from typing_extensions import Self
@dagger.object_type
class Foo:
@dagger.function
def chain(self, next: Optional[Self] = None) -> Self:
...
```
`Self` is only selected below the top level, matching the existing forward
reference guard that avoids producing invalid top-level string annotations in
contexts where Python would evaluate them too early.
Verified with:
* `uv run ruff check tests/mod/_strategies.py`
Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
Broaden the property grammar to cover the method shapes that users can write on
Dagger objects. Until now every generated function was an instance method with
`self` as the first parameter, so the differential test did not exercise the
analyzer's handling of decorator stacks and receiver stripping for
`@staticmethod` and `@classmethod`.
Generated functions now carry a `method_kind` and render one of these shapes:
```python
@dagger.function
def instance(self, value: str) -> str:
...
@staticmethod
@dagger.function
def static(value: str) -> str:
...
@classmethod
@dagger.function
def from_cls(cls, value: str) -> str:
...
```
The generator biases toward instance methods so the common case remains common,
but static and class methods appear often enough for Hypothesis to catch
regressions in first-parameter handling and decorator recognition.
Verified with:
* `uv run ruff check tests/mod/_strategies.py`
Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
Broaden the property grammar to cover modules that import the `dagger` package
under another name. The analyzer has alias-aware decorator and field detection,
but the generated modules previously always used the plain `dagger` name, so
the differential test did not exercise that import path.
The generator now sometimes renders both `import dagger` and
`import dagger as dgr` or `import dagger as dgmod`. The plain import remains
available because type annotations are still generated with the literal
`dagger.Directory` and `dagger.Container` spelling. The alias is used for the
Dagger decorators and helper field calls.
For example, Hypothesis can now emit source like:
```python
import dagger
import dagger as dgr
@dgr.object_type
class Foo:
@staticmethod
@dgr.function
def build(src: dagger.Directory) -> dagger.Container:
...
```
This is separate from the alias-expansion parser fix. It only increases the
runtime/analyzer conformance surface for import alias handling.
Verified with:
* `uv run ruff check tests/mod/_strategies.py`
Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
Broaden the property grammar to cover class-level `dagger.field(default=...)`
declarations on the generated `Foo` object. Fields use a different analyzer
path from function parameters: they are class-body annotated assignments, can
carry defaults through `dagger.field`, and are discovered before methods are
parsed. The property test should exercise that surface instead of only
generating functions.
The generated fields use primitive annotations with JSON-serializable literal
defaults. For example:
```python
@dagger.object_type
class Foo:
enabled: bool = dagger.field(default=True)
retries: int = dagger.field(default=3)
@dagger.function
def run(self) -> str:
...
```
Field names are kept distinct from generated method names and from the package
name used for decorators (`dagger`, `dgr`, or `dgmod`). That last guard matters
because Python evaluates class bodies sequentially: a class attribute named
`dgr` would shadow the imported `dgr` package alias for later
`@dgr.function` decorators or `dgr.field(...)` calls in the same class.
Verified with:
* `uv run ruff check tests/mod/_strategies.py`
Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
Broaden the property grammar to cover parameter defaults that reference
module-level constants. The analyzer has a static constant evaluator for
defaults, while runtime introspection sees the evaluated Python value directly.
The differential test should compare those paths for named constants, not only
inline literals.
The generator now emits primitive constants before aliases and may use a
matching constant as a parameter default. For example:
```python
CONST_RETRIES = 3
CONST_NAME = "demo"
@dagger.object_type
class Foo:
@dagger.function
def run(self, retries: int = CONST_RETRIES, name: str = CONST_NAME) -> str:
...
```
Constants are matched by their underlying primitive base so the generated
default remains type-compatible with the parameter annotation. List defaults are
still avoided for function parameters because `default_factory` only applies to
field declarations, not function signatures.
Verified with:
* `uv run ruff check tests/mod/_strategies.py`
Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
Broaden the property grammar to cover the quoted forms involved in alias
resolution. These are valid Python module patterns and runtime introspection
resolves them through `typing.get_type_hints`, so the AST analyzer needs to
match them.
There are two distinct quoted surfaces:
* A quoted alias leaf inside a larger annotation, such as
`Optional["AliasA"]`.
* A whole annotation rendered as one string, such as `"Optional[AliasA]"`.
For example, Hypothesis can now emit source like:
```python
from typing import Optional, TypeAlias
AliasA: TypeAlias = str
@dagger.object_type
class Foo:
@dagger.function
def greet(self, name: "Optional[AliasA]" = None) -> AliasA:
...
```
The generator also lets `Annotated` metadata strings reuse generated alias
names. That intentionally creates examples like
`Annotated[AliasA, Name("AliasA")]`, where the first `AliasA` is a type
position but the string `"AliasA"` is value metadata. This distinction is the
core parser boundary that the implementation fix must respect.
Verified with:
* `uv run ruff check tests/mod/_strategies.py`
Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
Add focused AST analyzer tests for the two alias expansion bugs that motivated
the parser fix. The property grammar can now generate these shapes, but these
fixtures make the exact contract obvious and keep the failure mode easy to
understand during review.
The first fixture uses:
```python
Alias: TypeAlias = str
def greet(self, name: Annotated[Alias, Name("Alias")]) -> str:
...
```
`Alias` in the first `Annotated` argument is a type position and should resolve
to `str`. The string `"Alias"` inside `Name(...)` is metadata and should remain
the exported API name.
The second fixture uses:
```python
Alias: TypeAlias = str
def greet(self, name: "Optional[Alias]" = None) -> str:
...
```
The analyzer should parse the whole-string annotation, expand `Alias` inside
the parsed `Optional[...]`, and resolve the parameter as optional `str`.
Before the parser fix, both fixtures fail by resolving as fallback object
types: one as unresolved `Alias`, the other as unresolved `Optional[Alias]`.
Verified before the parser fix:
* `uv run ruff check tests/mod/test_ast_analyzer.py`
Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
Print Hypothesis statistics in pytest output so property-test runs show how many generated module sources were exercised. This is useful for this stack because several commits expand the grammar, and `1 passed` only means one `@given` test function passed. For example, a thorough run reports: ```text 500 passing examples, 0 failing examples Stopped because settings.max_examples=500 ``` This changes test output only; it does not change the generated grammar or the analyzer. Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
The AST analyzer expands module-level type aliases before resolving function
arguments, fields, and return types. The previous implementation only handled
bare aliases and a small recursive case for pipe unions. That left valid
compound annotations such as `Optional[Alias]`, `list[Alias]`, quoted
annotations, and aliases nested under `Annotated` dependent on ad hoc cases.
Move alias expansion to a dedicated AST transformer that walks annotation type
positions and replaces each alias name with a deep copy of the alias target.
The transformer keeps the existing cross-file behavior by switching file
context when expanding an alias imported from another parsed file, and it keeps
cycle protection with a `(file, name)` seen set.
The important boundary is `Annotated`. In `Annotated[T, ...]`, only the first
argument is a type expression. Later arguments are metadata values. For example:
```python
Alias: TypeAlias = str
def greet(self, name: Annotated[Alias, Name("Alias")]) -> str:
...
```
The first `Alias` must expand to `str`, but the string `"Alias"` inside
`Name(...)` must stay unchanged because runtime introspection uses it as the
API name. The transformer now explicitly visits only the first `Annotated`
argument and stops at metadata calls.
The other reviewed failure involved whole-string annotations:
```python
Alias: TypeAlias = str
def greet(self, name: "Optional[Alias]" = None) -> str:
...
```
The analyzer parses the string into an AST and visits it. If a child is
expanded in place but the parsed root node remains the same object, returning
the original string constant discards the child replacement. The transformer now
returns the parsed AST after visiting string type positions, preserving
in-place child expansions for quoted compound annotations.
Alias targets are deep copied before recursive expansion so the cached
module-level alias map cannot be mutated across calls. Replacement nodes also
inherit the original source location to keep diagnostics anchored to the user's
annotation.
Verified with:
* `uv run ruff check src/dagger/mod/_analyzer/parser.py tests/mod/_strategies.py tests/mod/test_ast_analyzer.py`
* `uv run pytest tests/mod/test_ast_analyzer.py -k "type_alias"`
* `uv run pytest tests/mod/test_property.py --hypothesis-profile=thorough`
Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
The Hypothesis module-source grammar started generating aliases as quoted leaves inside larger annotations, for example Optional[list["AliasA"]]. That is a valid static-analysis case, and the AST analyzer should resolve AliasA to its module-level alias target. The differential property test uses typing.get_type_hints as its runtime oracle, though, and this shape is version-dependent. Python 3.10 leaves nested strings inside PEP 585 generics unresolved, so get_type_hints reports list["AliasA"]. Newer Python versions, including the default Python 3.14 runtime image, resolve the nested alias to list[str]. The analyzer was doing the desired static-analysis behavior, but the generated test case made the oracle disagree on Python 3.10. Keep alias leaves bare when they are inserted into larger generated annotations. Quoted alias coverage still comes from whole-string annotations such as "Optional[Alias]", which are stable across supported Python versions and still exercise the alias-expansion bug this stack fixes. This narrows the generated differential corpus to runtime-oracle-stable examples. It does not change production analyzer behavior: nested quoted aliases remain analyzable and should still resolve statically. Verified with: - uv run --python 3.10 pytest tests/mod/test_property.py::test_property_ast_matches_runtime --hypothesis-profile=thorough -q - uv run --python 3.14 pytest tests/mod/test_property.py::test_property_ast_matches_runtime --hypothesis-profile=thorough -q - uv run ruff check tests/mod/_strategies.py Signed-off-by: Guillaume de Rouville <guillaume@dagger.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR tightens the Python SDK AST module analyzer around type annotations, especially type aliases, quoted annotations, and
Annotated[...]metadata.This code runs during module type registration. When the engine asks the Python SDK runtime for a module's type definitions, the runtime analyzes the module's Python source files without importing the user module. The analyzer builds
ModuleMetadata, the SDK converts that into DaggerTypeDef/Function/ enum / field definitions throughdag.module(), and the engine loads the resultingModuleID.That source-analysis boundary matters because
typing.get_type_hintsworks from runtime objects. It requires importing the module, executing decorators and class bodies, and evaluating annotations. That is useful as a test oracle, but not as the production source of truth for static module analysis.Why Extra Coverage
Type checker conformance tests are useful, but they mostly test whether annotations are accepted, what type is inferred, or whether diagnostics are emitted. Dagger needs to answer a different question: what API schema should this source produce?
That means preserving
Annotated[...]metadata, resolving quoted forward references, and handling decorators, fields, defaults, imports, aliases, and constants correctly.I also kept a separate typing-conformance exploration branch that runs the analyzer against vendored type-checker corpora. It checked 827 annotations and found no real in-scope bugs:
typing_conformance: 517 checked, 214 OK, 300 oracle errors, 3 noisy divergencespyright_samples: 289 checked, 49 OK, 239 oracle errors, 1 out-of-scope divergencejedi_tests: 21 checked, 15 OK, 0 oracle errors, 6 harness-limit divergencesThat is useful confidence, but it also shows the limit of those suites: many cases cannot produce a runtime oracle, and the remaining divergences were harness or out-of-scope cases.
Alias Expansion Fix
Alias expansion now only rewrites the actual type annotation, not metadata attached to it.
Here, the annotation
Aliasshould resolve tostr, but the metadata string"Alias"is the Dagger API name and must stay unchanged. Previously, alias expansion could rewrite both.This PR also fixes quoted compound annotations:
That should resolve as
Optional[str]. Previously, the analyzer could expand the parsed annotation and then discard the expanded AST.Hypothesis Coverage
The Hypothesis module-source grammar was refactored to be easier to review and extend. It now covers aliases in simple, nested, quoted, and
Annotated[...]annotations, plus surrounding module patterns likeSelf, static methods, class methods, fields, import aliases, and module constants used as defaults.The generated tests compare static analyzer output against runtime introspection output. This does not prove there are no unknown bugs, but it gives broad coverage over the combinations where these bugs appeared.
Regression Tests
This PR adds focused regressions for:
Annotated[...]metadata strings that match alias names"Optional[Alias]"Both fail without the analyzer fix.
Verification
uv run ruff check src/dagger/mod/_analyzer/parser.py tests/mod/_strategies.py tests/mod/test_ast_analyzer.py uv run pytest tests/mod/test_ast_analyzer.py -k "type_alias" uv run pytest tests/mod/test_property.py --hypothesis-profile=thoroughThe thorough Hypothesis run covered 500 generated examples successfully.