Skip to content

Conversation

@da-woods
Copy link
Contributor

Draft for the moment because I think the probably a bit of cleanup left to be done until this is nice.

This moves almost all cdef global variables into the module state, which I think is one of the last big barriers to being able to use module state effectively.

The exception is cdef public variables which I don't think can be in the module state because they have to appear in a header file.

da-woods added 11 commits June 7, 2025 17:25
commit 59d932719d5c5fbdfc3da73b75bce1b1e82e1225
Merge: a139956 164df91
Author: da-woods <dw-git@d-woods.co.uk>
Date:   Sat Jun 7 15:34:43 2025 +0100

    Merge remote-tracking branch 'real_origin/master' into globals-module-state

commit a139956
Author: da-woods <dw-git@d-woods.co.uk>
Date:   Sat May 31 18:08:21 2025 +0100

    More working altough not there yet

commit f1abd7d
Merge: 8e51f9f 5fe4ea8
Author: da-woods <dw-git@d-woods.co.uk>
Date:   Sat May 31 13:38:00 2025 +0100

    Merge remote-tracking branch 'real_origin/master' into globals-module-state

commit 8e51f9f
Author: da-woods <dw-git@d-woods.co.uk>
Date:   Mon May 26 21:00:04 2025 +0100

    Begin adding globals to the module state

commit 9cec077
Author: da-woods <dw-git@d-woods.co.uk>
Date:   Mon May 26 18:19:11 2025 +0100

    Only pass "deprecated" when it'll be understood

commit 491434f
Author: da-woods <dw-git@d-woods.co.uk>
Date:   Mon May 26 16:40:13 2025 +0100

    Add back in documentation of forrmat

commit 66e708a
Merge: 90c203a a2e5e39
Author: da-woods <dw-git@d-woods.co.uk>
Date:   Mon May 26 16:38:31 2025 +0100

    Merge remote-tracking branch 'real_origin/master' into pick-limited-api-version-to-test

commit 90c203a
Author: da-woods <dw-git@d-woods.co.uk>
Date:   Mon May 26 14:57:02 2025 +0100

    Add an option to runtests to run with a specific Limited API version

    I don't propose using this on the CI (there's just too many
    combinations to test) but I think it's a useful thing to be
    able to do locally.
Comment on lines 2739 to 2743
def put_var_gotref(self, entry):
self.put_gotref(entry.cname, entry.type)
cname = entry.cname
if entry.is_declared_in_module_state():
cname = self.name_in_module_state(cname)
self.put_gotref(cname, entry.type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably a bit of cleanup left

This code suggests to me that someone else needs to know how to build a "name in module state" than "self". (Not sure who, but at least "self" prevents the Entry from doing it itself.)

@da-woods da-woods marked this pull request as ready for review June 28, 2025 17:44
@da-woods da-woods mentioned this pull request Sep 16, 2025
Copy link
Contributor

@scoder scoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there is a way to get the module state qualified cname into the Entry? ISTM that that might resolve a lot of quirks in the implementation.

Comment on lines 2526 to 2528
not self.entry.in_closure and not self.entry.from_closure):
return False
# Most likely a global, or a class attribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider it risky to temporarily set object attributes in certain situations and rely on them to be there in other places. It can easily lead to incorrect C code being generated, even silently. Object state should be well defined when the object is created, not extended in an ad-hoc fashion. Static code analysers usually warn about this kind of state change, and for good reason.

if entry.is_declared_in_module_state():
storage_class = ""
dll_linkage = None
destination_code = globalstate['module_state']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit that code might not be the best name for the code writer object, but it's a convention by now and so ubiquitous that it's good that it's short. But destination_code sounds like it's code for/about a destination, not something that allows writing code to a specific destination. I think I used decl_code in other places, which, in retrospect, doesn't sound much better but at least suggests what it should be used for. So, maybe that, or you can come up with something clearer?


if storage_class:
code.put("%s " % storage_class)
destination_code.put("%s " % storage_class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
destination_code.put("%s " % storage_class)
destination_code.put(f"{storage_class} ")

if init is not None:
code.put_safe(" = %s" % init)
code.putln(";")
destination_code.put_safe(" = %s" % init)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
destination_code.put_safe(" = %s" % init)
destination_code.put_safe(f" = {init}")

Comment on lines 7899 to +7906
if isinstance(self.loopvar_node, ExprNodes.TempNode):
self.loopvar_node.allocate(code)
elif isinstance(self.loopvar_node, ExprNodes.NameNode):
self.loopvar_node.generate_evaluation_code(code)
if isinstance(self.py_loopvar_node, ExprNodes.TempNode):
self.py_loopvar_node.allocate(code)
elif isinstance(self.py_loopvar_node, ExprNodes.NameNode):
self.py_loopvar_node.generate_evaluation_code(code)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the generate_evaluation_code() only apply to NameNodes? Why do we need to special case them here?

Comment on lines -9740 to 9748

self.modified_entries.append((entry, entry.cname))
code.putln("%s = %s;" % (cname, entry.cname))
self.modified_entries.append((entry, entry.cname, entry.force_not_declared_in_module_state))
entry_cname = code.entry_cname_in_module_state(entry)
entry.force_not_declared_in_module_state = True
code.putln(f"{cname} = {entry_cname};")
entry.cname = cname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting. I wasn't aware of this 14 years old cname replacement hack. It's unfortunate, because as we see here, it now requires another hack to work around it.

What was the reason again why entries couldn't know their complete module state based cname themselves? That could avoid the need for a force_not_declared_in_module_state flag.

@da-woods
Copy link
Contributor Author

Do you think there is a way to get the module state qualified cname into the Entry? ISTM that that might resolve a lot of quirks in the implementation.

The problem with that is that the module state lookup should be different depending on where you're requesting it from (and not the scope that the entry lives in). For example, in the global scope it probably just has the module as a function argument while in a cdef class it might use PyType_GetModule.

Currently most of that isn't implemented and almost everything uses a generic fallback, but I think that should change in future.

One other option would be to have a single macro to access the module state and then redefine the macro at the start of every C function. I.e. move the work into the C compiler. I'm not convinced that's better, but it might simplify some of the code within Cython


(I'll look at the other comments later)

@scoder
Copy link
Contributor

scoder commented Sep 18, 2025

The problem with that is that the module state lookup should be different depending on where you're requesting it from (and not the scope that the entry lives in). For example, in the global scope it probably just has the module as a function argument while in a cdef class it might use PyType_GetModule.

So … can we give the Entry (or its scope?) all information that it needs in order to build the different access patterns, and then call different methods depending on which kind of access we need in a given situation?

@scoder
Copy link
Contributor

scoder commented Sep 18, 2025

Merging current master in order to update the CI config.

@da-woods
Copy link
Contributor Author

I think my current plans for this are:

  • I'll see if I can clean up the 14-year-old parallel node hack (as a separate PR).
  • I'll try to make a proof of concept version of the "generate the same C code every time and handle it with locally defined macros" - I think that might end up looking much nicer but it's hard to tell without trying it.

and it'll either be done or not before the rest of 3.2 and we just make a judgement based on that.

@da-woods
Copy link
Contributor Author

I think there's a GIL related issue I've missed: the module state lookup requires the GIL to be held (and we can't just use PyGILState_Ensure() for this, because that doesn't know which subinterpreter to ensure). That means that accessing most cdef globals needs the GIL in a module-state world.

I think that's probably a reasonable restriction (although it implies a lot of existing code won't run with module state).

But it probably suggests that this isn't really ready for 3.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants