Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flawed assumptions about tp_dictoffset in inheritance. #95589

Closed
markshannon opened this issue Aug 3, 2022 · 10 comments
Closed

Flawed assumptions about tp_dictoffset in inheritance. #95589

markshannon opened this issue Aug 3, 2022 · 10 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes type-bug An unexpected behavior, bug, or error type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@markshannon
Copy link
Member

In Python, the __dict__ and __weakref__ slots are treated specially (slots meaning __slots__, not tp_slots)
They are automatically insert by the VM when creating a class.

class C(list):
    pass

>>> C().__dict__
{}

In order to support inheritance, specifically multiple inheritance, the VM can lay out subclasses in ways that differ from the superclass.
This is OK, provided __dict__ and __weakref__ are only accessed though the tp_dictoffset and tp_weaklistoffset offsets.
But, if either field is accessed directly, then we access invalid memory and 💥

test.py:

from _testcapi import HeapCTypeWithDict
class I3(HeapCTypeWithDict, list): pass

i = I3()
i.append(0)
print(i.dictobj)
$ python3.10 ~/test/test.py 
Segmentation fault (core dumped)

We have (accidentally) fixed this for __dict__ in 3.11, although at the expense breaking backwards compatibility for some C extensions. However, the problem still remains for __weakref__.

Backwards incompatibility

from _testcapi import HeapCTypeWithDict
class I3(HeapCTypeWithDict, list): pass
print("OK")
$ python3.10 test.py 
OK
$ python3.12 test.py 
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    class I3(HeapCTypeWithDict, list): pass
TypeError: multiple bases have instance lay-out conflict
@markshannon markshannon added type-bug An unexpected behavior, bug, or error 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes type-crash A hard crash of the interpreter, possibly with a core dump 3.12 bugs and security fixes labels Aug 3, 2022
@markshannon
Copy link
Member Author

Here's the weakref version:

from _testcapi import HeapCTypeWithWeakref
class I3(HeapCTypeWithWeakref, list): pass

i = I3()
i.append(0)
i.weakreflist

print("OK")
$ python3.10 test.py 
Segmentation fault (core dumped)
$ python3.12 test.py 
Segmentation fault (core dumped)

@markshannon
Copy link
Member Author

Proposed fix

3.11

Revert the inheritance behavior from 3.10
Although this introduces possible crashes, it should help pybind11 and mypyc port to 3.11.

3.12

For 3.12 we should re-introduce the breaking, but not crashing, change and extend it to weakrefs.

We then need to provide a better, and properly documented, API for for C extension classes to reliably
use __dict__s and weakrefs managed by the VM.

@encukou
Copy link
Member

encukou commented Aug 3, 2022

+1. Though IMO it would be better to treat 3.10 behavior (only Py_TYPE(obj)->tp_dictoffset is valid) as the status quo, and the improvements as a change.

For the better API, my current thinking is adding API for clearing/visiting only a single type's data, so e.g. tp_visit would call each base's tp_visit_data in a loop, with “VM managed” things (dict, weaklist, type) handled only in CPython code. Users could still override all of tp_visit for backcompat (and/or speed?), but we'd discourage it.

@markshannon
Copy link
Member Author

As only one base class can have opaque data, there would be only be one valid tp_visit_data, which speeds things up.

There are only about 10 or so valid permutations of pre-headers, opaque data, and slots, so we can generate a complete set of efficient visit, clear and dealloc functions for use in subclasses.

@encukou
Copy link
Member

encukou commented Aug 3, 2022

Revert the inheritance behavior from 3.10

Can you do that today/tomorrow, for the RC?

As only one base class can have opaque data, there would be only be one valid tp_visit_data, which speeds things up.

Only one direct base, but its base can also have opaque data. Still it'd be only one chain to walk for data&slots.

@markshannon
Copy link
Member Author

Can you do that today/tomorrow, for the RC?

Sure.
@Fidget-Spinner already has #95242 open, but I'll need to add tests.

@markshannon
Copy link
Member Author

Only one direct base, but its base can also have opaque data.

That's the responsibility of the direct base, since both must be written in C to do that.

@encukou
Copy link
Member

encukou commented Aug 3, 2022

both must be written in C to do that.

True, but they might be from different libraries with different release schedules. The other library might be CPython, with its changing requirements on what needs to be visited – or it could be pybind11 or numpy or whatever, CPython shouldn't need special treatment here for its pre-headers.
An alternative would be that you're expected to delegate to the base, but it's surprisingly tricky to do the equivalent of super() correctly (as shown, in part, by subtype_visit with managed dicts).

@Fidget-Spinner
Copy link
Member

Can you do that today/tomorrow, for the RC?

Sure. @Fidget-Spinner already has #95242 open, but I'll need to add tests.

Superseded by #95596.

@markshannon
Copy link
Member Author

True, but they might be from different libraries with different release schedules.

That's not possible. If there were then the only way they could be combined is in Python, which would raise a TypeError due to the incompatible layout.

markshannon added a commit that referenced this issue Aug 17, 2022
… inheritance (GH-96028)

* Treat tp_weakref and tp_dictoffset like other opaque slots for multiple inheritance.

* Document Py_TPFLAGS_MANAGED_DICT and Py_TPFLAGS_MANAGED_WEAKREF in what's new.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes type-bug An unexpected behavior, bug, or error type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

3 participants