-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-113190: Reenable non-debug interned string cleanup #113601
gh-113190: Reenable non-debug interned string cleanup #113601
Conversation
I added |
Thanks!! Will ping you once all the tests are passing to move this into safe-to-merge mode 🙂 |
@Eclips4 tests are passing! Could you help me by adding the "skip news" label (since it's probably not required) and removing the "DO-NOT-MERGE" label? Thanks!! |
cc @nascheme to take a look when you have a chance! |
It would be nice to do a better cleanup but I'm concerned about how safe this actually is. Since we don't have an accurate refcnt for these strings, we are assuming that no one is actually still using them. It seems possible that some extensions or programs embedding Python could hang on to interned strings after Maybe I'm being too paranoid? I don't know how we could actually fix this though. Keeping hold of and using memory allocated by the Python runtime after My gut feeling is we should proceed cautiously, even though we would like to fix the leak. Maybe we could enable this for alphas and then turn it off again for release? Maybe we need to wait until we get |
@nascheme I agree with what you mentioned above and I was paranoid of that scenario too, hence the disabling of the code. Purely going off from the docs, I don't think there's anywhere where Python provides an explicit expectation of what the behavior is of memory that survives a call to Py_Finalize. Furthermore, in the C-API it is specified that Perhaps we should be more explicit about this in the docs (maybe in the "Bugs and Caveats" section of Py_Finalize)? i.e python can't guarantee correctness for any memory allocated during a python initialization and used in any subsequent initializations. Thoughts? Regardless, I completely agree that we should still be careful about putting this out there. Stating with alphas sounds like a good idea to get some early signal. Let me know what's the best way to only enable this only for alphas. |
I think we are already there? Here's a quick test program that I ran:
|
It's mostly done for one year: https://mail.python.org/archives/list/[email protected]/thread/E4C6TDNVDPDNNP73HTGHN5W42LGAE22F/ "Mostly" means: some remaining stdlib C extensions still implement their own static types which are not cleared by
This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13:
In short, a program must not keep references to any Python object between In general, I suggest to reset all pointers to Python objects at each Py_Initialize() call. Or said differently: don't store any pointer to Python objects after Py_Finalize() :-) Examples of exceptions:
|
For sure, I can update the PR with the details with what you mentioned here!
Given this guarantee, I think it's safe to assume that if we have any Python object leak it's a user induced leak, right? i.e. the runtime itself will not cause incorrect behavior. I will include some wording around this so that users can isolate the surface are when looking for issues (i.e look at the third-party extensions not at the runtime).
That's my take too - though it's not specified in the docs! Any thoughts of also updating the Py_Finalize/Py_Initialize documentation as part of this PR to include stricter language like this? |
I looked again at the code. It has a complicated history:
If there are applications relying on interned strings to survive between multiple Py_Initialize()/Py_Finalize() cycles, they should already be affected by Python 3.10 and 3.11. By the way, Python 3.11 added _PyTypes_FiniTypes() to "clear" static types in Py_Finalize(). |
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Misc/NEWS.d/next/Core and Builtins/2024-01-15-18-11-48.gh-issue-113190.OwQX64.rst
Outdated
Show resolved
Hide resolved
@eduardo-elizondo Do you want to finish the docs here, or should I take over? |
@encukou updated and addressed the comments, this should be ready to go! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you! I'll run the refleak buildbots as there might have been regressions.
A style nit:
Thanks @eduardo-elizondo for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
Sorry, @eduardo-elizondo and @encukou, I could not cleanly backport this to
|
Sorry, @eduardo-elizondo and @encukou, I could not cleanly backport this to
|
Thanks @encukou !! |
Is there any chance this will be backported to 3.13.1? |
In GH-19474 we introduced immortalized interned strings. However, at the time, we didn't have a strict guarantee that a runtime shutdown (i.e
Py_Finalize()
) would completely cleanup all the allocated PyObjects (i.e a simple example of./python -X showrefcount -c 'import itertools'
would show leaks).Today, these leaks have been largely fixed and there have been stricter guarantees into how obmalloc and the interpreter state works after a runtime shutdown. Given this, it should be safe to re-enable the interned string cleanup and revert the structseq test that had issues with this leaks before. This test should show the correctness of this code now.