gh-84436: Implement Immortal Objects #19474

eduardo-elizondo · 2020-04-11T16:51:06Z

This is the implementation of PEP 683.

Motivation

The PR introduces the ability to immortalize instances in CPython, which bypasses reference counting for them. Tagging objects as immortal allows us to skip certain operations when we know that the object will be around for the entire execution of the runtime.

Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
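A minimal sketch of the extra reference count check mentioned above, assuming immortality is marked by a reserved high bit in the count. The names `obj_t`, `IMMORTAL_BIT`, `obj_incref`, and `obj_decref` are hypothetical stand-ins for illustration, not the PR's actual macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PyObject: only the refcount matters here. */
typedef struct {
    int64_t refcnt;
} obj_t;

/* Assumption: one reserved high bit tags an object as immortal. */
#define IMMORTAL_BIT (INT64_C(1) << 62)

static bool obj_is_immortal(const obj_t *o) {
    return (o->refcnt & IMMORTAL_BIT) != 0;
}

static void obj_immortalize(obj_t *o) {
    o->refcnt |= IMMORTAL_BIT;
}

/* The extra test and branch: an immortal object is never written to. */
static void obj_incref(obj_t *o) {
    if (obj_is_immortal(o)) return;
    o->refcnt++;
}

/* Returns true when the last reference to a mortal object was dropped. */
static bool obj_decref(obj_t *o) {
    if (obj_is_immortal(o)) return false;
    return --o->refcnt == 0;
}
```

Because an immortal object's count is never modified, the memory page holding it is not dirtied by increfs and decrefs, which is the property the copy-on-write use case relies on.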

https://bugs.python.org/issue40255

Issue: Fixing Copy on Writes from reference counting and immortal objects #84436

eduardo-elizondo · 2020-04-11T20:22:32Z

This is ready to review, the CI is finally green. Really no idea why the newly added GC tests are failing on Windows and unfortunately I don't have a Windows machine to debug this.

eduardo-elizondo · 2020-04-11T20:23:49Z

Looping in @carljm and @DinoV, who have pointed out some of the issues with immortal instances in the permanent generation participating in a GC collection (i.e. dicts). Let me know if you have some other thoughts or ideas on this!

eduardo-elizondo · 2020-04-11T20:24:58Z

Also looping in @vstinner. I finally got around to upstreaming this patch, since you recently wrote about this in your C-API Improvement Docs.

nascheme · 2020-04-14T18:57:46Z

My first reaction is that this shouldn't become part of the default build because most Python users will not make use of it and then it becomes pure extra overhead. However, I know for some people that it is a useful feature (e.g. pre-fork server architecture that exploits copy-on-write OS memory management). I would use it myself since I write web applications with that style.

Would it be okay to make this a compile time option, disabled by default? I think in general it is a bad idea to have too many of those types of build options. It makes code maintenance and testing more difficult. Some example build variations from the past that caused issues: thread/no-threads, Unicode width, various debug options (@vstinner removed some of those). So, I'm not super excited about introducing a new build option.

Is it possible we can leverage this extra status bit on objects to recover the lost performance somehow? A couple years ago I did a "tagged pointer" experiment that used a similar bit. In that case, small integers became one machine word in size and also become immortal.

Another thought: when you did your testing, were any objects made immortal? I would imagine that, by default, you could make everything immortal after initial interpreter startup. You are paying for an extra test+branch in INCREF and DECREF but for many objects (e.g. None, True, False, types) you avoid dirtying the memory/cache with writes to the reference count.

eduardo-elizondo · 2020-04-14T19:28:34Z

@nascheme you should definitely join the conversation happening in the bug report of this PR https://bugs.python.org/issue40255

> However, I know for some people that it is a useful feature

Exactly, this change might be a feature for CPython power users

> Would it be okay to make this a compile time option, disabled by default?

Yeah, that's probably the best option. That's also the consensus in the bug report thread (if the change is approved)

> I think in general it is a bad idea to have too many of those types of build options.

Yeah that's one of the drawbacks. That being said, I can help with setting up the travis build to integrate this change if needed (cc @vstinner).

> Is it possible we can leverage this extra status bit on objects to recover the lost performance somehow?

We can indeed, I think somebody also mentioned that in the bug report. A potentially good place could be main.c:pymain_main right after pymain_main. Let me explore that and push that change if it looks like a performance improvement!

In theory we could optimize even further to reduce the perf cost. By leveraging saturated adds and conditional moves we could remove the branching instruction. I haven't explored this further since the current PR was good enough. Personally, I favor the current PR, but this could be changed to:

/* Branch-less incref saturated at PY_SSIZE_T_MAX
   (x86-64 only; relies on GCC/Clang statement expressions) */
#define _Py_INC_REF(op) ({                                          \
    __asm__ (                                                       \
        "addq $0x1, %[refcnt]\n\t"                                  \
        "cmovoq %[refcnt_max], %[refcnt]"                           \
        : [refcnt] "+r" (((PyObject *)(op))->ob_refcnt)             \
        : [refcnt_max] "r" (PY_SSIZE_T_MAX)                         \
        : "cc"                                                      \
    );})

/* Branch-less decref saturated at PY_SSIZE_T_MAX:
   tmp + 1 overflows only when the old count was already saturated */
#define _Py_DEC_REF(op) ({                                          \
    Py_ssize_t tmp = ((PyObject *)(op))->ob_refcnt;                 \
    __asm__ (                                                       \
        "subq $0x1, %[refcnt]\n\t"                                  \
        "addq $0x1, %[tmp]\n\t"                                     \
        "cmovoq %[refcnt_max], %[refcnt]"                           \
        : [refcnt] "+r" (((PyObject *)(op))->ob_refcnt),            \
          [tmp] "+r" (tmp)                                          \
        : [refcnt_max] "r" (PY_SSIZE_T_MAX)                         \
        : "cc"                                                      \
    );})
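For readers without an x86-64 toolchain at hand, the same saturating idea can be sketched in portable C. This is only an illustration: `sat_incref` and `sat_decref` are hypothetical names, `INT64_MAX` stands in for `PY_SSIZE_T_MAX` on a 64-bit build, and whether the compiler emits branch-free code for it is not guaranteed.

```c
#include <stdint.h>

/* Increment unless already at INT64_MAX; the comparison yields 0 or 1,
   so no explicit branch is written in the source. */
static int64_t sat_incref(int64_t refcnt) {
    return refcnt + (refcnt != INT64_MAX);
}

/* Decrement unless pinned at INT64_MAX, so a saturated count stays put. */
static int64_t sat_decref(int64_t refcnt) {
    return refcnt - (refcnt != INT64_MAX);
}
```

A count that has reached the maximum stays there under both operations, which is what makes the saturated value usable as an "immortal" marker.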

pablogsal · 2020-04-14T20:03:37Z

> Yeah that's one of the drawbacks. That being said, I can help with setting up the travis build to integrate this change if needed (cc @vstinner).

Not only that, we would need specialized buildbots to test the code base with this option activated on a bunch of supported platforms, and that raises the maintenance costs.

vstinner

This feature sounds controversial, so I block it until a consensus can be reached.

bedevere-bot · 2020-04-14T21:48:43Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

eduardo-elizondo · 2023-04-25T22:20:45Z

Three years in the making and it's finally there! Huge shoutouts to @ericsnowcurrently for working together with me throughout the years to get this all the way through, you rock!!

Also, thanks @markshannon @gvanrossum and @pablogsal for being a sounding board for ideas, reviews, and coaching on the messaging of the PR / PEP!

ydshieh · 2023-04-26T05:37:01Z

Congrats @eduardo-elizondo! 🔥

I followed this PR from the beginning, even needed to port the changes (in 2020) into python 3.9 due to the memory issue with multiprocessing for my previous company.

At some point, I felt this was not going to be in Python due to the inactivity here (especially the review).
It's incredible and amazing that you continued to finalize this work, and finally got it merged 🎉 !

eendebakpt · 2023-10-28T19:31:08Z

@ericsnowcurrently @eduardo-elizondo One of the acceptance conditions for pep 683 was to update the pep with final benchmark results. It appears that has not been done. Are the numbers available somewhere? I want to determine whether the performance regressions reported in #109049 are due to this PR (or perhaps other PRs as well).

In discourse: pep-683-immortal-objects-using-a-fixed-refcount-round-4-last-call there is some discussion on whether to add the performance numbers or not. After that message the pep has been updated, but not the performance numbers.

ericsnowcurrently · 2023-10-30T15:25:42Z

@eduardo-elizondo, do you have time to update the PEP with final benchmark results for ea2c001?

arigo · 2023-11-01T12:01:35Z

Note: https://speed.python.org/ shows an important performance regression in maybe 1/3rd of all benchmarks, dated Apr 22, which is the date this PR was merged. The most significant I've seen so far (which, if reproduced locally, could help figure out the root cause) is unpack_sequence.

gvanrossum · 2023-11-01T15:37:43Z

@ericsnowcurrently @mdboom ^^

eduardo-elizondo · 2023-11-03T03:32:07Z

@ericsnowcurrently we actually already have the benchmark numbers here: #19474 (comment) which I ran right before merging and there's only test/lint fixes on top of that. This shows roughly a ~1.02x geometric mean regression (~1.03x on MSVC). Let me know if this is what we are looking for!

To be clear - there will be both slower and faster benchmarks. However, we should focus on the geometric mean (rather than a single benchmark), which is our best proxy when benchmarks move in both directions. Separately, performance measurements can come up with wildly different results in different environments. For these experiments I used gcc-11.1 and MSVC v14.33 on 'lab-like' bare-metal machines, which resulted in consistently reproducible results.

ericsnowcurrently · 2023-11-03T18:20:29Z

python/peps#3519

arigo · 2023-11-03T19:43:54Z

There is a discrepancy between the text in PEP 683 and the actual implementation, sections Accidental Immortality and Accidental De-Immortalizing, on 64-bit machines. If it is already mentioned somewhere, sorry about that; but it might be useful to mention that inside the PEP itself. The problem is that, unless I'm missing something, the PEP says that 64-bit machines don't have any accidental immortality problems in practice because a 64-bit or close-to-64-bit refcount never overflows; but the implementation instead starts to consider objects immortal as soon as their refcount reaches 31 bits, a much more reachable value on 64-bit machines. For example, a 2**31-entries numpy object array can be initialized with copies of the same object---this only takes 16 GB of RAM.

The issues listed in Accidental De-Immortalizing are particularly problematic in this situation. I have not tested it, but looking at the source code it seems that this kind of code would crash CPython on 64-bit machines:

1. make a non-immortal object 'a'.
2. store at least 2**31 copies of 'a' inside a data structure maintained by an older stable-ABI extension module.
3. store 2**31 more copies of 'a' inside a list (e.g. by calling the list() function over that data structure; at that point the total memory usage is 32-33 GB of RAM, a very reachable amount). The refcount reaches 2**32-1 and there is at least one incref beyond that that is dropped.
4. deallocate the data structure; as this is done in the older extension module, this decrefs 'a' to 2**31-1 and the object loses its immortal status.
5. deallocate the list. This will cause the refcount to be decremented 2**31 times from its previous value of 2**31-1, and crash.
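The arithmetic in these steps can be checked with a small model. This is a simplified sketch, not CPython code: `model_obj` and the helper names are hypothetical, the "new" incref is assumed to skip the write once the low 32 bits equal 2**32-1, the "new" decref is assumed to skip when bit 31 of the low half is set, and the older stable-ABI module is assumed to do plain unchecked 64-bit arithmetic. Counts are applied in bulk instead of looping 2**31 times.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one object's refcount; not CPython's real layout. */
typedef struct {
    uint64_t refcnt;
    bool freed;      /* set once the count reaches zero */
} model_obj;

/* Older stable-ABI module: unchecked 64-bit increments and decrements. */
static void old_incref_n(model_obj *o, uint64_t n) { o->refcnt += n; }

static void old_decref_n(model_obj *o, uint64_t n) {
    if (n >= o->refcnt) { o->refcnt = 0; o->freed = true; }
    else { o->refcnt -= n; }
}

/* Assumed new incref: stops writing once the low 32 bits are saturated.
   Returns how many of the n increfs were dropped. */
static uint64_t new_incref_n(model_obj *o, uint64_t n) {
    uint64_t room = (uint64_t)UINT32_MAX - (uint32_t)o->refcnt;
    uint64_t applied = n < room ? n : room;
    o->refcnt += applied;   /* cannot carry out of the low 32 bits */
    return n - applied;
}

/* Assumed new decref: a no-op while bit 31 of the low half is set. */
static void new_decref_n(model_obj *o, uint64_t n) {
    uint32_t low = (uint32_t)o->refcnt;
    if (low >> 31) return;                      /* treated as immortal */
    if (n >= low) { o->refcnt -= low; o->freed = true; }
    else { o->refcnt -= n; }
}

/* Replays steps 1 to 5 and returns the final state of 'a'. */
static model_obj run_scenario(void) {
    const uint64_t n = UINT64_C(1) << 31;
    model_obj a = { 1, false };   /* step 1: one ordinary reference */
    old_incref_n(&a, n);          /* step 2: refcnt = 2**31 + 1 */
    (void)new_incref_n(&a, n);    /* step 3: saturates at 2**32 - 1 */
    old_decref_n(&a, n);          /* step 4: refcnt = 2**31 - 1 */
    new_decref_n(&a, n);          /* step 5: hits zero too early */
    return a;
}
```

In this model the count reaches zero during step 5 while the original reference from step 1 and at least one list slot are still outstanding, which corresponds to the crash described above.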

Again, this is not a real bug report, because it seems that this kind of issue was considered and is supposedly passed as an acceptable trade-off for using stable-ABI modules compiled with older versions of CPython. This is more a missing documentation issue.

arigo · 2023-11-06T12:22:23Z

Note: a potential fix for this issue would be to change Py_INCREF on 64-bit platforms: instead of checking if (uint)refcount == 2**32-1, it could check if (Py_ssize_t)refcount < 0. This would remove any risk of the crash described above because no INCREF would ever skip the refcount increment before bit 63 is set. Only the more acceptable very rare leak might occur, once bit 31 is set.

The value to initialize immortal objects with would be something like 0xC0000000C0000000, which is in the middle of both 32-bit and 64-bit ranges. It takes 0x40000000 increfs or decrefs by an old stable-ABI module to accidentally make the CPython core change the refcount again (but with no risk of accidentally freeing the object because the refcount is huge).
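A short sketch of how the suggested value behaves under the two views of the refcount. The helper names are hypothetical; the only facts taken from the comment above are the initial value `0xC0000000C0000000` and the idea of testing the sign bit of the full 64-bit count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Initial immortal value suggested in the comment above. */
#define IMMORTAL_INIT UINT64_C(0xC0000000C0000000)

/* Proposed 64-bit check: immortal if bit 63 is set
   (the same test as (Py_ssize_t)refcount < 0 on two's complement). */
static bool immortal_by_bit63(uint64_t refcnt) {
    return (refcnt >> 63) != 0;
}

/* 32-bit view: immortal if bit 31 of the low half is set. */
static bool immortal_by_bit31(uint64_t refcnt) {
    return ((uint32_t)refcnt >> 31) != 0;
}

/* Old stable-ABI increfs: plain 64-bit additions with no check. */
static uint64_t old_abi_incref_n(uint64_t refcnt, uint64_t n) {
    return refcnt + n;
}
```

Starting from `IMMORTAL_INIT`, both checks agree. After `0x40000000` unchecked increfs the low half wraps to zero, so the 32-bit view no longer sees an immortal object, while the 64-bit sign check still does and the count remains far from zero.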

ericsnowcurrently · 2023-11-06T16:36:26Z

CC @eduardo-elizondo

arigo · 2023-11-08T08:15:52Z

...or, change _Py_IsImmortal to (Py_ssize_t)refcount<0 on 64-bit, and then just use _Py_IsImmortal in both INCREF and DECREF on both 32- and 64-bit platforms, and be done with it? This should Always Just Work(tm) if we reasonably assume that it's completely impossible to repeat a loop 2**62 times, and initialize the immortal refcount to 2**63+2**62.

(Here I'm working with the implicit never-documented assumption that people first tried to use the 32-bit code directly on 64-bit, with the immortal value 0x3fffffffffffffff, but found that it has some performance impact on Intel. That would be because a constant value that doesn't fit 32 bits does indeed have a cost. That's why I'm suggesting here to use (Py_ssize_t)refcount < 0, which is simpler and might be even cheaper than the current 32- and 64-bit mix of arithmetic on the same refcount.)
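The unified variant can be sketched as follows, assuming a 64-bit two's-complement platform. The function names are hypothetical and operate on a bare counter, not on a real `PyObject`; the initial value 2**63+2**62 is the one suggested above.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2**63 + 2**62, the initial immortal refcount suggested above. */
#define IMMORTAL_REFCNT UINT64_C(0xC000000000000000)

/* Same test as (Py_ssize_t)refcount < 0: bit 63 is the sign bit. */
static bool is_immortal(uint64_t refcnt) {
    return (refcnt >> 63) != 0;
}

/* One shared check guards both operations. */
static void incref(uint64_t *refcnt) {
    if (is_immortal(*refcnt)) return;
    ++*refcnt;
}

/* Returns true when the last reference to a mortal object was dropped. */
static bool decref(uint64_t *refcnt) {
    if (is_immortal(*refcnt)) return false;
    return --*refcnt == 0;
}
```

With this starting value, code that bypasses the check would need more than 2**62 unbalanced decrefs (or 2**62 unbalanced increfs) before bit 63 flips, which is the margin the comment above relies on.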

ericsnowcurrently · 2023-11-08T14:54:55Z

@arigo, thanks for the feedback, both about speed.python.org and about accidental de-immortalization. @eduardo-elizondo has the insight we need (which I don't) in both cases, so I'll defer to what he has to say. @markshannon may have some thoughts as well, at least about the refcount corner case.

eduardo-elizondo · 2023-12-31T16:27:42Z

@ariago Thanks for the reply, just went through it in detail. First of all, pretty much all of what you are saying in the first message is correct, though there are some additional details that would help complement what you wrote above. Let me try to reply to your message by breaking it down into what I believe are the main questions:

> but the implementation instead starts to consider objects immortal as soon as their refcount reaches 31 bits, a much more reachable value on 64-bit machines

Overall, yes, the PEP talks about a very high value, which is what we wanted to reflect there, and left the exact value as an implementation detail. Over here, we ended up using a saturated 32 bits because it provides a number of benefits:

- Performance: By using the lower 32 bits we help the compiler generate better code by directly manipulating the 32-bit registers (from the 64-bit refcount register). This gave a considerable perf improvement (around 1-2% geometric mean).
- Pointer Tagging: By leaving the upper 32 bits free, we could potentially use them for tagging the pointers to improve performance in future iterations.
- NoGIL refcount: The reference implementation of PEP 703 also exploits this by using the unused bits to keep track of the biased reference counts.

This is why we decided to stick with 32-bit saturation for now, which we could always revise in later versions if needed!
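A rough sketch of what 32-bit saturation looks like for a single incref and decref, based only on the checks described in this thread (incref skips once the low half equals 2**32-1, decref skips when the low half is negative as a signed 32-bit value). `incref32` and `decref32` are hypothetical names operating on a bare 64-bit counter, not the actual `Py_INCREF`/`Py_DECREF` definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Increment only the low 32 bits; skip the write once they are saturated. */
static void incref32(uint64_t *refcnt) {
    uint32_t cur = (uint32_t)*refcnt;
    uint32_t next = cur + 1u;      /* wraps to 0 only when cur == UINT32_MAX */
    if (next == 0) return;
    *refcnt = (*refcnt & ~UINT64_C(0xFFFFFFFF)) | next;
}

/* Decrement unless bit 31 of the low half marks the object as immortal.
   Returns true when the low 32 bits reach zero. */
static bool decref32(uint64_t *refcnt) {
    uint32_t cur = (uint32_t)*refcnt;
    if (cur >> 31) return false;
    cur -= 1u;
    *refcnt = (*refcnt & ~UINT64_C(0xFFFFFFFF)) | cur;
    return cur == 0;
}
```

All comparisons and constants fit in 32 bits, and the upper half of the field is left untouched, which is what leaves those bits available for the tagging and biased-refcount uses listed above.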

> I have not tested it, but looking at the source code it seems that this kind of code would crash CPython on 64-bit machines

No need to test it, it will play out exactly as you mentioned! For this one, Eric and I talked about this exact scenario and even surfaced this during a language summit to the core-devs. However, we believe that it’s a very contrived example as it would require a large amount of asymmetric decrefs. In reality, what happens with very large objects such as these is that they either live throughout the entire execution of the application or there’s a combination of symmetric increfs and decrefs that prevent this from happening.

I did end up testing this in a very large machine with a large application where 16GB is relatively small with hundreds of older stable-ABI modules and didn’t see this issue materialize. Of course, this is just a single application but I was indeed on the lookout to make sure that this precise scenario was not prevalent. The good thing for this one though is that as we go into newer python versions and we keep updating our C-Extensions the risk here will become less and less prevalent.

> a potential fix for this issue would be to change Py_INCREF on 64-bit platforms: instead of checking if (uint)refcount == 2**32-1, it could check if (Py_ssize_t)refcount < 0

At some point I did indeed try a solution similar to this, however, this ended up being in conflict with some of the refcount manipulations that the GC does on the two most significant bits and would ignore the fact that an object is immortal causing incorrect behavior. Not only that but also, for the reasons mentioned above, we wanted to keep the entire refcount arithmetic with just 32-bits.

A considered (but never implemented) alternative would be to make the GC aware of immortal objects and then go for the 64-bit solution, but this would require a bit more work on the gcmodule, not to mention the added complexity in the module. There might be a simple solution there that I never figured out! However, this would still imply that we would use all the bits that we have now freed for future use cases. Given the acceptance of PEP 703, we might as well keep refcounts as 32 bits.

> ...or, change _Py_IsImmortal to (Py_ssize_t)refcount<0 on 64-bit, and then just use _Py_IsImmortal in both INCREF and DECREF on both 32- and 64-bit platforms, and be done with it?

Using all 64 bits causes the issue with the GC that I pointed out above. But we could indeed use this check for the lower 32 bits (which we already do in decref). This was actually the original implementation and, as you mentioned, this is what we tried first. The reason we did a more specialized check in incref is just improved perf.

eduardo-elizondo · 2023-12-31T16:35:53Z

@ariago Overall though these are great questions and I'll be happy to update and revise the wording of the PEP if you think it'll be useful! Let me know what you think! 🙂

cfbolz · 2024-01-01T18:24:28Z

pinging @arigo because there was a typo in the account name in the most recent messages

arigo · 2024-01-01T21:43:12Z

Adding a known, rare, hard-to-debug crash in CPython would be worth 1-2% of performance? I personally disagree with that but I haven't been a CPython contributor for many years now, so I have nothing to add.

markshannon · 2024-04-09T09:16:12Z

Include/object.h

+#define PyObject_HEAD_INIT(type) \
+    {                            \
+        _PyObject_EXTRA_INIT     \
+        { 1 },                   \


This seems incorrect. A statically allocated object is immortal, regardless of whether it is in the core or not.

eduardo-elizondo · 2024-04-21

Yes, you are right! I even thought about making this the default behavior.

However, there could be cases in extension code with asserts/tests on exact refcount values. This would break those builds/tests. You can even see it in this PR, where I had to change _testembed.c to modify the Py_REFCNT(str1) == 1 check to _Py_IsImmortal(str1).

There's an argument about whether having any exact refcount checks is correct, but I didn't want to challenge it at the time. Thus, to avoid any problems with extension code and reduce the surface area of impact, I localized it to just affect the CPython build.

Implement Immortal Instances

0c930b7

eduardo-elizondo requested a review from pablogsal as a code owner April 11, 2020 16:51

the-knights-who-say-ni added the CLA signed label Apr 11, 2020

bedevere-bot added the awaiting review label Apr 11, 2020

eduardo-elizondo added 9 commits April 11, 2020 09:52

Nits

7005944

Bypass immortality in NewReference

c6a1bfa

Add News and Fix MSVC Build

51e4879

Formatting Nits

cc2ece3

Typo

72d12fa

MSVC Test

f04776e

Skip test for MSVC

fa8d668

Skip test for MSVC 32 & 64

f066633

Skip all tests for Windows

36e0a9a

eduardo-elizondo changed the title ~~[WIP] bpo-40255: Implement Immortal Instances~~ bpo-40255: Implement Immortal Instances Apr 11, 2020

nascheme reviewed Apr 14, 2020

Include/object.h (outdated)

Include/object.h (outdated)

Modules/gcmodule.c (outdated)

pablogsal reviewed Apr 14, 2020

Modules/gcmodule.c (outdated)

pablogsal reviewed Apr 14, 2020

Include/object.h (outdated)

pablogsal reviewed Apr 14, 2020

Lib/test/test_gc.py (outdated)

vstinner previously requested changes Apr 14, 2020

bedevere-bot removed the awaiting review label Apr 14, 2020

bedevere-bot added the awaiting changes label Apr 14, 2020

Immortalize known immortals

2f9fa29

eduardo-elizondo requested a review from 1st1 as a code owner April 15, 2020 19:19

ericsnowcurrently mentioned this pull request May 6, 2023

gh-104252: Immortalize Py_EMPTY_KEYS #104253

Merged

ericsnowcurrently added a commit that referenced this pull request May 10, 2023

gh-104252: Immortalize Py_EMPTY_KEYS (gh-104253)

b8f7ab5

This was missed in gh-19474. It matters with a per-interpreter GIL since PyDictKeysObject.dk_refcnt breaks isolation and leads to races.

mdboom mentioned this pull request Jun 7, 2023

Compare a matrix of nogil to other upstreams faster-cpython/ideas#597

Open

sunmy2019 mentioned this pull request Jul 7, 2023

dict can have the same key twice #106507

Closed

Eclips4 mentioned this pull request Aug 14, 2023

Unnecessary comment about increasing the reference count in usage of Py_None #107955

Closed

eendebakpt mentioned this pull request Sep 19, 2023

Substantial Performance Regression of Dict operations in Python 3.12.0rc1 versus Python 3.11.4 #109049

Open

Mause mentioned this pull request Sep 27, 2023

[PythonDev] Don't dereference None when creating pandas dataframe duckdb/duckdb#9127

Merged

nascheme mentioned this pull request Dec 29, 2023

Memory leak on executables embedded with 3.13 #113055

Open

JukkaL mentioned this pull request Dec 30, 2023

Optionally avoid incref/decref immortality checks on 3.12+ mypyc/mypyc#1044

Open

eduardo-elizondo mentioned this pull request Dec 31, 2023

gh-113190: Reenable non-debug interned string cleanup #113601

Merged

encukou mentioned this pull request Jan 12, 2024

Interned strings are immortal, despite what the documentation says #113993

Closed

markshannon reviewed Apr 9, 2024

markshannon mentioned this pull request Apr 9, 2024

GH-115776: Embed the values array into the object, for "normal" Python objects. #116115

Merged
