gh-113190: Reenable non-debug interned string cleanup #113601

eduardo-elizondo · 2023-12-31T18:26:37Z

In GH-19474 we introduced immortalized interned strings. However, at the time, we didn't have a strict guarantee that a runtime shutdown (i.e Py_Finalize()) would completely cleanup all the allocated PyObjects (i.e a simple example of ./python -X showrefcount -c 'import itertools' would show leaks).

Today, these leaks have been largely fixed and there have been stricter guarantees into how obmalloc and the interpreter state works after a runtime shutdown. Given this, it should be safe to re-enable the interned string cleanup and revert the structseq test that had issues with this leaks before. This test should show the correctness of this code now.

…terned

Eclips4 · 2023-12-31T18:51:46Z

I added DO-NOT-MERGE label for you :)

eduardo-elizondo · 2023-12-31T18:57:31Z

I added DO-NOT-MERGE label for you :)

Thanks!! Will ping you once all the tests are passing to move this into safe-to-merge mode 🙂

eduardo-elizondo · 2023-12-31T19:14:09Z

@Eclips4 tests are passing! Could you help me by adding the "skip news" label (since it's probably not required) and removing the "DO-NOT-MERGE" label?

Thanks!!

eduardo-elizondo · 2024-01-04T15:33:49Z

cc @nascheme to take a look when you have a chance!

nascheme · 2024-01-04T19:56:27Z

It would be nice to do a better cleanup but I'm concerned about how safe this actually is. Since we don't have an accurate refcnt for these strings, we are assuming that no one is actually still using them. It seems possible that some extensions or programs embedding Python could hang on to interned strings after Py_Finalize(). That would risk introducing "use-after-free" bugs. In the worst case, that could turn into an arbitrary code execution security bug.

Maybe I'm being too paranoid? I don't know how we could actually fix this though. Keeping hold of and using memory allocated by the Python runtime after Py_Finalize() surely puts you an very shaky ground. Previously, the obmalloc state was global to the process and you could keep using memory allocated from there after finalizing. With sub-interpeters potentially having their own obmalloc state, that's likely not okay.

My gut feeling is we should proceed cautiously, even though we would like to fix the leak. Maybe we could enable this for alphas and then turn it off again for release? Maybe we need to wait until we get python -X showrefcount showing zero for all or nearly all built-in and well known extensions? Seems like a tall order though.

eduardo-elizondo · 2024-01-05T15:03:06Z

@nascheme I agree with what you mentioned above and I was paranoid of that scenario too, hence the disabling of the code.

Purely going off from the docs, I don't think there's anywhere where Python provides an explicit expectation of what the behavior is of memory that survives a call to Py_Finalize. Furthermore, in the C-API it is specified that Py_Finalize will: "Undo all initializations made by Py_Initialize() and subsequent use of Python/C API functions". This gives an an implicit assumption that holding onto these interned strings after Py_Finalize is an undefined behavior.

Perhaps we should be more explicit about this in the docs (maybe in the "Bugs and Caveats" section of Py_Finalize)? i.e python can't guarantee correctness for any memory allocated during a python initialization and used in any subsequent initializations. Thoughts?

Regardless, I completely agree that we should still be careful about putting this out there. Stating with alphas sounds like a good idea to get some early signal. Let me know what's the best way to only enable this only for alphas.

eduardo-elizondo · 2024-01-05T15:04:16Z

Maybe we need to wait until we get python -X showrefcount showing zero for all or nearly all built-in and well known extensions?

I think we are already there? Here's a quick test program that I ran:

./python -X showrefcount -c "
import sys
import importlib
for mod in sys.modules.keys():
  importlib.import_module(mod)
"
[0 refs, 0 blocks]

vstinner · 2024-01-08T13:21:33Z

Maybe we need to wait until we get python -X showrefcount showing zero for all or nearly all built-in and well known extensions?

It's mostly done for one year: https://mail.python.org/archives/list/[email protected]/thread/E4C6TDNVDPDNNP73HTGHN5W42LGAE22F/ "Mostly" means: some remaining stdlib C extensions still implement their own static types which are not cleared by _PyTypes_FiniTypes(), and/or don't use the multi-phase initialization API yet.

test_embed checks that python -c pass does not leak memory at Python exit: see MiscTests.test_no_memleak().

Purely going off from the docs, I don't think there's anywhere where Python provides an explicit expectation of what the behavior is of memory that survives a call to Py_Finalize.

This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13:

Explain how to qualify if a crash is related to this issue,
Document how to update affected code.

In short, a program must not keep references to any Python object between Py_Finalize() and Py_Initialize(). IMO the general rule is calling Py_Finalize() converts all Python object pointers to dangling pointers. There are some very specific exceptions which should be listed with conditions, such as "only on CPython and it's an implementation detail which can change between two Python versions".

In general, I suggest to reset all pointers to Python objects at each Py_Initialize() call. Or said differently: don't store any pointer to Python objects after Py_Finalize() :-)

Examples of exceptions:

Immortal objects
Static types (built-in immortal objects are marked as immortal)
Singletons (built-in singleton objects are marked as immortal) such as: empty string, empty tuple, small integers (-5..256).

eduardo-elizondo · 2024-01-08T15:00:25Z

This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13

For sure, I can update the PR with the details with what you mentioned here!

It's mostly done for one year

Given this guarantee, I think it's safe to assume that if we have any Python object leak it's a user induced leak, right? i.e. the runtime itself will not cause incorrect behavior. I will include some wording around this so that users can isolate the surface are when looking for issues (i.e look at the third-party extensions not at the runtime).

In short, a program must not keep references to any Python object between Py_Finalize() and Py_Initialize()

That's my take too - though it's not specified in the docs! Any thoughts of also updating the Py_Finalize/Py_Initialize documentation as part of this PR to include stricter language like this?

vstinner · 2024-01-08T18:10:04Z

This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13

I looked again at the code. It has a complicated history:

In Python 3.9 and older, interned strings were never deleted. Only special build for Valgrind and Purity cleared them at exit.
In Python 3.10 and 3.11, interned strings are always deleted at exit.
In Python 3.12, interned strings are deleted only if Python is built in debug mode: they are not deleted in release mode.

If there are applications relying on interned strings to survive between multiple Py_Initialize()/Py_Finalize() cycles, they should already be affected by Python 3.10 and 3.11.

By the way, Python 3.11 added _PyTypes_FiniTypes() to "clear" static types in Py_Finalize().

…ear_interned

bedevere-app · 2024-06-19T12:36:05Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Doc/whatsnew/3.14.rst

Misc/NEWS.d/next/Core and Builtins/2024-01-15-18-11-48.gh-issue-113190.OwQX64.rst

The conflicting PR is merged.

encukou · 2024-08-02T10:05:50Z

@eduardo-elizondo Do you want to finish the docs here, or should I take over?

…ear_interned

cpython-cla-bot · 2024-08-11T15:32:31Z

All commit authors signed the Contributor License Agreement.

eduardo-elizondo · 2024-08-11T16:03:20Z

@encukou updated and addressed the comments, this should be ready to go!

encukou

Looks great, thank you! I'll run the refleak buildbots as there might have been regressions.

A style nit:

Doc/whatsnew/3.14.rst

bedevere-bot · 2024-08-12T10:06:22Z

🤖 New build scheduled with the buildbot fleet by @encukou for commit 6aee7f5 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

miss-islington-app · 2024-08-15T11:55:13Z

Thanks @eduardo-elizondo for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington-app · 2024-08-15T11:55:16Z

Sorry, @eduardo-elizondo and @encukou, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 3203a7412977b8da3aba2770308136a37f48c927 3.13

miss-islington-app · 2024-08-15T11:55:29Z

Sorry, @eduardo-elizondo and @encukou, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 3203a7412977b8da3aba2770308136a37f48c927 3.12

eduardo-elizondo · 2024-08-16T13:23:15Z

Thanks @encukou !!

…-113601)

neonene · 2024-10-14T22:51:23Z

Is there any chance this will be backported to 3.13.1?

Reenable non-debug interned string cleanup through _PyUnicode_ClearIn…

666ca33

…terned

bedevere-app bot added the awaiting review label Dec 31, 2023

bedevere-app bot mentioned this pull request Dec 31, 2023

python3.12 introduces numerous memory leaks (as reported by ASAN) #113190

Open

Revert test changes due to previously leftover leaks

23d1aff

Eclips4 changed the title ~~[DONOTMETGE] gh-113190: Reenable non-debug interned string cleanup~~ [DO NOT MERGE] gh-113190: Reenable non-debug interned string cleanup Dec 31, 2023

Eclips4 added the DO-NOT-MERGE label Dec 31, 2023

Eclips4 changed the title ~~[DO NOT MERGE] gh-113190: Reenable non-debug interned string cleanup~~ gh-113190: Reenable non-debug interned string cleanup Dec 31, 2023

eduardo-elizondo mentioned this pull request Dec 31, 2023

Memory leak on executables embedded with 3.13 #113055

Open

Eclips4 added skip news interpreter-core (Objects, Python, Grammar, and Parser dirs) needs backport to 3.12 bug and security fixes and removed DO-NOT-MERGE labels Dec 31, 2023

eduardo-elizondo added 2 commits January 15, 2024 17:37

Merge remote-tracking branch 'upstream/main' into reenable_unicode_cl…

fd9c7ff

…ear_interned

Add documentation

ed09466

erlend-aasland mentioned this pull request Jan 15, 2024

Interned strings are immortal, despite what the documentation says #113993

Closed

eduardo-elizondo added 5 commits January 15, 2024 18:35

Lint docs

2608119

Lint docs v2

17e8817

Lint docs v3

dd1e8b4

Lint docs v4

ce5536b

Lint docs v5

071438f

encukou added 2 commits July 19, 2024 13:39

Merge in the main branch

085b93e

Remove outdated comment

080a853

encukou reviewed Jul 19, 2024

View reviewed changes

Doc/whatsnew/3.14.rst Outdated Show resolved Hide resolved

Misc/NEWS.d/next/Core and Builtins/2024-01-15-18-11-48.gh-issue-113190.OwQX64.rst Outdated Show resolved Hide resolved

bedevere-app bot added awaiting review and removed awaiting changes labels Jul 19, 2024

Merge remote-tracking branch 'upstream/main' into reenable_unicode_cl…

8484c4b

…ear_interned

Addressed Comments

6aee7f5

encukou approved these changes Aug 12, 2024

View reviewed changes

Doc/whatsnew/3.14.rst Outdated Show resolved Hide resolved

bedevere-app bot added awaiting merge and removed awaiting review labels Aug 12, 2024

encukou added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Aug 12, 2024

bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Aug 12, 2024

Update whatsnew

dc3b54f

encukou enabled auto-merge (squash) August 15, 2024 11:28

encukou merged commit 3203a74 into python:main Aug 15, 2024
37 checks passed

bedevere-app bot removed the awaiting merge label Aug 15, 2024

miss-islington-app bot assigned encukou Aug 15, 2024

blhsing pushed a commit to blhsing/cpython that referenced this pull request Aug 22, 2024

pythongh-113190: Reenable non-debug interned string cleanup (pythonGH…

b32350a

…-113601)

sobolevn mentioned this pull request Sep 9, 2024

gh-108303: Remove the non-test Lib/test/reperf.py #114356

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-113190: Reenable non-debug interned string cleanup #113601

gh-113190: Reenable non-debug interned string cleanup #113601

eduardo-elizondo commented Dec 31, 2023 •

edited

Loading

Eclips4 commented Dec 31, 2023 •

edited

Loading

eduardo-elizondo commented Dec 31, 2023 •

edited

Loading

eduardo-elizondo commented Dec 31, 2023 •

edited

Loading

eduardo-elizondo commented Jan 4, 2024

nascheme commented Jan 4, 2024

eduardo-elizondo commented Jan 5, 2024 •

edited

Loading

eduardo-elizondo commented Jan 5, 2024 •

edited

Loading

vstinner commented Jan 8, 2024 •

edited

Loading

eduardo-elizondo commented Jan 8, 2024 •

edited by vstinner

Loading

vstinner commented Jan 8, 2024

bedevere-app bot commented Jun 19, 2024

encukou commented Aug 2, 2024

cpython-cla-bot bot commented Aug 11, 2024 •

edited

Loading

eduardo-elizondo commented Aug 11, 2024

encukou left a comment

bedevere-bot commented Aug 12, 2024

miss-islington-app bot commented Aug 15, 2024

miss-islington-app bot commented Aug 15, 2024

miss-islington-app bot commented Aug 15, 2024

eduardo-elizondo commented Aug 16, 2024

neonene commented Oct 14, 2024

gh-113190: Reenable non-debug interned string cleanup #113601

gh-113190: Reenable non-debug interned string cleanup #113601

Conversation

eduardo-elizondo commented Dec 31, 2023 • edited Loading

Eclips4 commented Dec 31, 2023 • edited Loading

eduardo-elizondo commented Dec 31, 2023 • edited Loading

eduardo-elizondo commented Dec 31, 2023 • edited Loading

eduardo-elizondo commented Jan 4, 2024

nascheme commented Jan 4, 2024

eduardo-elizondo commented Jan 5, 2024 • edited Loading

eduardo-elizondo commented Jan 5, 2024 • edited Loading

vstinner commented Jan 8, 2024 • edited Loading

eduardo-elizondo commented Jan 8, 2024 • edited by vstinner Loading

vstinner commented Jan 8, 2024

bedevere-app bot commented Jun 19, 2024

encukou commented Aug 2, 2024

cpython-cla-bot bot commented Aug 11, 2024 • edited Loading

eduardo-elizondo commented Aug 11, 2024

encukou left a comment

Choose a reason for hiding this comment

bedevere-bot commented Aug 12, 2024

miss-islington-app bot commented Aug 15, 2024

miss-islington-app bot commented Aug 15, 2024

miss-islington-app bot commented Aug 15, 2024

eduardo-elizondo commented Aug 16, 2024

neonene commented Oct 14, 2024

eduardo-elizondo commented Dec 31, 2023 •

edited

Loading

Eclips4 commented Dec 31, 2023 •

edited

Loading

eduardo-elizondo commented Dec 31, 2023 •

edited

Loading

eduardo-elizondo commented Dec 31, 2023 •

edited

Loading

eduardo-elizondo commented Jan 5, 2024 •

edited

Loading

eduardo-elizondo commented Jan 5, 2024 •

edited

Loading

vstinner commented Jan 8, 2024 •

edited

Loading

eduardo-elizondo commented Jan 8, 2024 •

edited by vstinner

Loading

cpython-cla-bot bot commented Aug 11, 2024 •

edited

Loading