PEP: 442
Title: Safe object finalization
Version:
This PEP proposes to deal with the current limitations of object finalization. The goal is to be able to define and run finalizers for any object, regardless of its position in the object graph.
This PEP doesn't call for any change in Python code. Objects with existing finalizers will benefit automatically.
- Reference
- A directional link from an object to another. The target of the reference is kept alive by the reference, as long as the source is itself alive and the reference isn't cleared.
- Weak reference
- A directional link from an object to another, which doesn't keep alive its target. This PEP focuses on non-weak references.
- Reference cycle
- A cyclic subgraph of directional links between objects, which keeps those objects from being collected in a pure reference-counting scheme.
- Cyclic isolate (CI)
- A standalone subgraph of objects in which no object is referenced from the outside, containing one or several reference cycles, and whose objects are still in a usable, non-broken state: they can access each other from their respective finalizers.
- Cyclic garbage collector (GC)
- A device able to detect cyclic isolates and turn them into cyclic trash. Objects in cyclic trash are eventually disposed of by the natural effect of the references being cleared and their reference counts dropping to zero.
- Cyclic trash (CT)
- A former cyclic isolate whose objects have started being cleared by the GC. Objects in cyclic trash are potential zombies; if they are accessed by Python code, the symptoms can vary from weird AttributeErrors to crashes.
- Zombie / broken object
- An object that is part of cyclic trash. The term stresses that the object is not safe: its outgoing references may have been cleared, or one of the objects it references may be a zombie. Therefore, it should not be accessed by arbitrary code (such as finalizers).
- Finalizer
- A function or method called when an object is intended to be disposed of. The finalizer can access the object and release any resource held by the object (for example mutexes or file descriptors). An example is a __del__ method.
- Resurrection
- The process by which a finalizer creates a new reference to an object in a CI. This can happen as a quirky but supported side-effect of __del__ methods.
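The first definitions can be observed directly from Python. The following is a minimal sketch, assuming CPython's reference-counting behaviour; the class name Thing is hypothetical and only serves the illustration. It shows that a weak reference does not keep its target alive, and that a reference cycle survives pure reference counting until the cyclic GC runs.

```python
import gc
import weakref


class Thing:
    """Hypothetical placeholder class; instances support weak references."""


gc.disable()                      # rely on reference counting only for now

# A lone object: dropping the last (strong) reference disposes of it at once.
t = Thing()
w = weakref.ref(t)                # weak reference: does not keep t alive
del t
alone_freed = w() is None

# Two objects referencing each other form a reference cycle.
x, y = Thing(), Thing()
x.other, y.other = y, x
wx = weakref.ref(x)
del x, y                          # the cycle is now a cyclic isolate
cycle_alive = wx() is not None    # reference counting alone cannot reclaim it

gc.collect()                      # the cyclic GC detects and disposes of it
cycle_freed = wx() is None
gc.enable()
```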
While this PEP discusses CPython-specific implementation details, the change in finalization semantics is expected to affect the Python ecosystem as a whole. In particular, this PEP obsoletes the current guideline that "objects with a __del__ method should not be part of a reference cycle".
The primary benefits of this PEP concern objects with finalizers, such as objects with a __del__ method and generators with a finally block. Those objects can now be reclaimed when they are part of a reference cycle.
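As an illustration of this benefit, here is a small sketch assuming an interpreter that implements this PEP. The names Node, worker, box and build_cycles are hypothetical. It builds one cycle of objects with __del__ methods and one cycle involving a suspended generator with a finally block, then checks that a collection finalizes all of them rather than leaving them in gc.garbage.

```python
import gc

log = []


class Node:
    """Object with a finalizer that takes part in a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        log.append("del " + self.name)


def worker(box):
    """Generator whose frame refers back to the generator through `box`."""
    try:
        yield
    finally:
        log.append("generator cleanup")


def build_cycles():
    a, b = Node("a"), Node("b")
    a.partner, b.partner = b, a      # a <-> b
    box = []
    gen = worker(box)
    box.append(gen)                  # gen -> frame -> box -> gen
    next(gen)                        # suspend inside the try block


gc.disable()                         # keep automatic collections out of the demo
build_cycles()                       # both cycles are unreachable from here on
before = list(log)                   # nothing has been finalized yet
gc.collect()
gc.enable()
leftover = [o for o in gc.garbage if isinstance(o, Node)]
```

The order of the three log entries is not guaranteed, which is why any check on them should compare a sorted copy.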
The PEP also paves the way for further benefits:
- The module shutdown procedure may not need to set global variables to None anymore. This could solve a well-known class of irritating issues.
The PEP doesn't change the semantics of:
- Weak references caught in reference cycles.
- C extension types with a custom tp_dealloc function.
In normal reference-counted disposal, an object's finalizer is called just before the object is deallocated. If the finalizer resurrects the object, deallocation is aborted.
However, if the object was already finalized, then the finalizer isn't called. This prevents us from finalizing zombies (see below).
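The reference-counted case can be sketched as follows, assuming an interpreter that implements this PEP. The class Phoenix and the list survivors are hypothetical names. The finalizer resurrects the object the first time its reference count drops to zero; when the count drops to zero again, the finalizer is not called a second time.

```python
calls = []
survivors = []


class Phoenix:
    def __del__(self):
        calls.append("finalized")
        survivors.append(self)       # resurrection: creates a new reference


p = Phoenix()
del p                                # refcount reaches zero, finalizer runs,
                                     # deallocation is aborted
resurrected = len(survivors) == 1

survivors.clear()                    # refcount reaches zero again; the object
                                     # is already finalized, so __del__ is
                                     # skipped and the object is deallocated
```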
Cyclic isolates are first detected by the garbage collector, and then disposed of. The detection phase doesn't change and won't be described here. Disposal of a CI traditionally works in the following order:
1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
2. The CI becomes a CT as the GC systematically breaks all known references inside it (using the tp_clear function).
3. Nothing. All CT objects should have been disposed of in step 2 (as a side-effect of clearing references); this collection is finished.
This PEP proposes to turn CI disposal into the following sequence (steps 2 and 3 are new):
1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
2. The finalizers of all CI objects are called.
3. The CI is traversed again to determine if it is still isolated. If it is determined that at least one object in CI is now reachable from outside the CI, this collection is aborted and the whole CI is resurrected. Otherwise, proceed.
4. The CI becomes a CT as the GC systematically breaks all known references inside it (using the tp_clear function).
5. Nothing. All CT objects should have been disposed of in step 4 (as a side-effect of clearing references); this collection is finished.
Note
The GC doesn't recalculate the CI after step 2 above, hence the need for step 3 to check that the whole subgraph is still isolated.
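The first two steps of the proposed sequence can be observed from Python. This is a sketch under the assumption that the interpreter implements this PEP; Node, events and on_dead are hypothetical names. The weak reference is held outside the cycle, so its callback is invoked, and each finalizer reads an attribute of its partner to show that no reference has been broken yet when finalizers run.

```python
import gc
import weakref

events = []


class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        # The partner is still intact here: finalizers run before tp_clear.
        events.append(("finalizer", self.name, self.partner.name))


def on_dead(ref):
    events.append(("weakref callback",))


a, b = Node("a"), Node("b")
a.partner, b.partner = b, a
ref = weakref.ref(a, on_dead)        # the weakref itself lives outside the CI
del a, b                             # a and b now form a cyclic isolate

gc.collect()
first_event = events[0]              # weakref callbacks come first
finalizer_events = sorted(events[1:])  # finalizer order is undefined
```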
Type objects get a new tp_finalize slot to which __del__ methods are mapped (and vice versa). Generators are modified to use this slot, rather than tp_del. A tp_finalize function is a normal C function which will be called with a valid and alive PyObject as its only argument. It doesn't need to manipulate the object's reference count, as this will be done by the caller. However, it must ensure that the original exception state is restored before returning to the caller.
For compatibility, tp_del is kept in the type structure. Handling of objects with a non-NULL tp_del is unchanged: when part of a CI, they are not finalized and end up in gc.garbage. However, a non-NULL tp_del is not encountered anymore in the CPython source tree (except for testing purposes).
Two new C API functions are provided to ease calling of tp_finalize, especially from custom deallocators.
On the internal side, a bit is reserved in the GC header for GC-managed objects to signal that they were finalized. This helps avoid finalizing an object twice (and, especially, finalizing a CT object after it was broken by the GC).
Note
Objects which are not GC-enabled can also have a tp_finalize slot. They don't need the additional bit since their tp_finalize function can only be called from the deallocator: it therefore cannot be called twice, except when resurrected.
Following this scheme, an object's finalizer is always called exactly once, even if it was resurrected afterwards.
For CI objects, the order in which finalizers are called (step 2 above) is undefined.
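The exactly-once guarantee for CI objects can be checked with the sketch below. It assumes an interpreter that implements this PEP and, for the gc.is_finalized() call, Python 3.9 or later; that function is a later addition to the gc module and is not part of this PEP. Node, counts and keep are hypothetical names.

```python
import gc

counts = {}
keep = []


class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        counts[self.name] = counts.get(self.name, 0) + 1
        keep.append(self)            # resurrect the object


a, b = Node("a"), Node("b")
a.partner, b.partner = b, a
del a, b                             # a and b now form a cyclic isolate

gc.collect()                         # finalizers run once, the CI is resurrected
flags = [gc.is_finalized(n) for n in keep]   # assumes Python >= 3.9

keep.clear()                         # the cycle becomes isolated again
gc.collect()                         # reclaimed without calling __del__ again
remaining = [o for o in gc.get_objects() if isinstance(o, Node)]
```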
It is important to explain why the proposed change is safe. There are two aspects to be discussed:
- Can a finalizer access zombie objects (including the object being finalized)?
- What happens if a finalizer mutates the object graph so as to impact the CI?
Let's discuss the first issue. We will divide possible cases into two categories:
- If the object being finalized is part of the CI: by construction, no objects in CI are zombies yet, since CI finalizers are called before any reference breaking is done. Therefore, the finalizer cannot access zombie objects, which don't exist.
- If the object being finalized is not part of the CI/CT: by definition, objects in the CI/CT don't have any references pointing to them from outside the CI/CT. Therefore, the finalizer cannot reach any zombie object (that is, even if the object being finalized was itself referenced from a zombie object).
Now for the second issue. There are three potential cases:
- The finalizer clears an existing reference to a CI object. The CI object may be disposed of before the GC tries to break it, which is fine (the GC simply has to be aware of this possibility).
- The finalizer creates a new reference to a CI object. This can only happen from a CI object's finalizer (see above for why). Therefore, the new reference will be detected by the GC after all CI finalizers are called (step 3 above), and collection will be aborted without any objects being broken.
- The finalizer clears or creates a reference to a non-CI object. By construction, this is not a problem.
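The first two cases can be exercised from Python. The sketch below assumes an interpreter that implements this PEP; Node, make_pair and the on_finalize argument are hypothetical names introduced for the illustration. In the "clear" pair, a finalizer drops a reference to another CI object; in the "rescue" pair, a finalizer stores a new reference to another CI object, which resurrects the whole CI with none of its objects broken.

```python
import gc

log = []
rescued = []


class Node:
    def __init__(self, name, on_finalize=None):
        self.name = name
        self.partner = None
        self.on_finalize = on_finalize

    def __del__(self):
        log.append(self.name)
        if self.on_finalize == "clear":
            self.partner = None              # case 1: clear a reference
        elif self.on_finalize == "rescue":
            rescued.append(self.partner)     # case 2: create a reference


def make_pair(prefix, on_finalize):
    first = Node(prefix + "1", on_finalize)
    second = Node(prefix + "2")
    first.partner, second.partner = second, first


make_pair("c", "clear")
gc.collect()
after_clear = sorted(log)            # each finalizer ran exactly once

make_pair("r", "rescue")
gc.collect()                         # collection aborted, CI resurrected
survivor = rescued[0]                # this is the node named "r2"
intact = (survivor.partner.name == "r1"
          and survivor.partner.partner is survivor)
```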
An implementation is available in branch finalize of the repository at http://hg.python.org/features/finalize/.
Besides running the normal Python test suite, the implementation adds test cases for various finalization possibilities including reference cycles, object resurrection and legacy tp_del slots.
The implementation has also been checked to not produce any regressions on the following test suites:
- Tulip, which makes extensive use of generators
- Tornado
- SQLAlchemy
- Django
- zope.interface
Notes about reference cycle collection and weak reference callbacks: http://hg.python.org/cpython/file/4e687d53b645/Modules/gc_weakref.txt
Generator memory leak: http://bugs.python.org/issue17468
Allow objects to decide if they can be collected by GC: http://bugs.python.org/issue9141
Module shutdown procedure based on GC: http://bugs.python.org/issue812369
This document has been placed in the public domain.