
Python Enhancement Proposals

PEP 649 – Deferred Evaluation Of Annotations Using Descriptors

Author:
Larry Hastings <larry at hastings.org>
Discussions-To:
Discourse thread
Status:
Accepted
Type:
Standards Track
Topic:
Typing
Created:
11-Jan-2021
Python-Version:
3.14
Post-History:
11-Jan-2021, 12-Apr-2021, 18-Apr-2021, 09-Aug-2021, 20-Oct-2021, 20-Oct-2021, 17-Nov-2021, 15-Mar-2022, 23-Nov-2022, 07-Feb-2023, 11-Apr-2023
Replaces:
563
Resolution:
08-May-2023


Abstract

Annotations are a Python technology that allows expressing type information and other metadata about Python functions, classes, and modules. But Python’s original semantics for annotations required them to be eagerly evaluated, at the time the annotated object was bound. This caused chronic problems for static type analysis users using “type hints”, due to forward-reference and circular-reference problems.

Python solved this by accepting PEP 563, incorporating a new approach called “stringized annotations” in which annotations were automatically converted into strings by Python. This solved the forward-reference and circular-reference problems, and also fostered intriguing new uses for annotation metadata. But stringized annotations in turn caused chronic problems for runtime users of annotations.

This PEP proposes a new and comprehensive third approach for representing and computing annotations. It adds a new internal mechanism for lazily computing annotations on demand, via a new object method called __annotate__. This approach, when combined with a novel technique for coercing annotation values into alternative formats, solves all the above problems, supports all existing use cases, and should foster future innovations in annotations.

Overview

This PEP adds a new dunder attribute to the objects that support annotations–functions, classes, and modules. The new attribute is called __annotate__, and is a reference to a function which computes and returns that object’s annotations dict.

At compile time, if the definition of an object includes annotations, the Python compiler will write the expressions computing the annotations into its own function. When run, the function will return the annotations dict. The Python compiler then stores a reference to this function in __annotate__ on the object.

Furthermore, __annotations__ is redefined to be a “data descriptor” which calls this annotation function once and caches the result.

This mechanism delays the evaluation of annotations expressions until the annotations are examined, which solves many circular reference problems.
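The delay-and-cache behavior can be sketched in a few lines of plain Python. This is purely an illustration (the names Demo and _cache are invented for the sketch; the real mechanism is implemented inside the interpreter):

```python
class Demo:
    """Illustrative stand-in for an object with deferred annotations."""

    def __init__(self, annotate):
        self.__annotate__ = annotate   # function that computes the dict
        self._cache = None             # annotations not yet evaluated

    @property
    def __annotations__(self):
        # The annotation expressions only run on first access,
        # and the result is cached for subsequent accesses.
        if self._cache is None:
            self._cache = self.__annotate__()
        return self._cache

obj = Demo(lambda: {'x': int})
# Nothing has been evaluated yet; the first attribute access triggers it.
print(obj.__annotations__)  # {'x': <class 'int'>}
```

Because the annotation expressions live in a function, any names they reference are looked up when `__annotations__` is first read, not when the annotated object is bound.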

This PEP also defines new functionality for two functions in the Python standard library: inspect.get_annotations and typing.get_type_hints. The functionality is accessed via a new keyword-only parameter, format. format allows the user to request the annotations from these functions in a specific format. Format identifiers are always predefined integer values. The formats defined by this PEP are:

  • inspect.VALUE = 1

    The default value. The function will return the conventional Python values for the annotations. This format is identical to the return value for these functions under Python 3.11.

  • inspect.FORWARDREF = 2

    The function will attempt to return the conventional Python values for the annotations. However, if it encounters an undefined name, or a free variable that has not yet been associated with a value, it dynamically creates a proxy object (a ForwardRef) that substitutes for that value in the expression, then continues evaluation. The resulting dict may contain a mixture of proxies and real values. If all real values are defined at the time the function is called, inspect.FORWARDREF and inspect.VALUE produce identical results.

  • inspect.SOURCE = 3

    The function will produce an annotation dictionary where the values have been replaced by strings containing the original source code for the annotation expressions. These strings may only be approximate, as they may be reverse-engineered from another format, rather than preserving the original source code, but the differences will be minor.

If accepted, this PEP would supersede PEP 563, and PEP 563’s behavior would be deprecated and eventually removed.

Comparison Of Annotation Semantics

Note

The code presented in this section is simplified for clarity, and is intentionally inaccurate in some critical aspects. This example is intended merely to communicate the high-level concepts involved without getting lost in the details. But readers should note that the actual implementation is quite different in several important ways. See the Implementation section later in this PEP for a far more accurate description of what this PEP proposes from a technical level.

Consider this example code:

def foo(x: int = 3, y: MyType = None) -> float:
    ...
class MyType:
    ...
foo_y_annotation = foo.__annotations__['y']

As we see here, annotations are available at runtime through an __annotations__ attribute on functions, classes, and modules. When annotations are specified on one of these objects, __annotations__ is a dictionary mapping the names of the fields to the value specified as that field’s annotation.

The default behavior in Python is to evaluate the expressions for the annotations, and build the annotations dict, at the time the function, class, or module is bound. At runtime the above code actually works something like this:

annotations = {'x': int, 'y': MyType, 'return': float}
def foo(x=3, y="abc"):
    ...
foo.__annotations__ = annotations
class MyType:
    ...
foo_y_annotation = foo.__annotations__['y']

The crucial detail here is that the values int, MyType, and float are looked up at the time the function object is bound, and these values are stored in the annotations dict. But this code doesn’t run—it throws a NameError on the first line, because MyType hasn’t been defined yet.

PEP 563’s solution is to decompile the expressions back into strings during compilation and store those strings as the values in the annotations dict. The equivalent runtime code would look something like this:

annotations = {'x': 'int', 'y': 'MyType', 'return': 'float'}
def foo(x=3, y="abc"):
    ...
foo.__annotations__ = annotations
class MyType:
    ...
foo_y_annotation = foo.__annotations__['y']

This code now runs successfully. However, foo_y_annotation is no longer a reference to MyType, it is the string 'MyType'. To turn the string into the real value MyType, the user would need to evaluate the string using eval, inspect.get_annotations, or typing.get_type_hints.
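For example, once MyType is defined, the stringized annotation can be turned back into the class with eval (a minimal sketch of the recovery step):

```python
class MyType:
    ...

# Under PEP 563, the annotations are stored as strings:
stringized = {'x': 'int', 'y': 'MyType', 'return': 'float'}

# Turning the string back into the real class requires eval()
# with a globals dict in which MyType is actually defined:
resolved = eval(stringized['y'], globals())
assert resolved is MyType
```

This only works if the right globals dict is available and every name in the string is defined, which is precisely the difficulty discussed in the Motivation section below.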

This PEP proposes a third approach, delaying the evaluation of the annotations by computing them in their own function. If this PEP were active, the generated code would work something like this:

class function:
    # __annotations__ on a function object is already a
    # "data descriptor" in Python, we're just changing
    # what it does
    @property
    def __annotations__(self):
        return self.__annotate__()

    # ...

def annotate_foo():
    return {'x': int, 'y': MyType, 'return': float}
def foo(x=3, y="abc"):
    ...
foo.__annotate__ = annotate_foo
class MyType:
    ...
foo_y_annotation = foo.__annotations__['y']

The important change is that the code constructing the annotations dict now lives in a function—here, called annotate_foo(). But this function isn’t called until we ask for the value of foo.__annotations__, and we don’t do that until after the definition of MyType. So this code also runs successfully, and foo_y_annotation now has the correct value–the class MyType–even though MyType wasn’t defined until after the annotation was defined.

Mistaken Rejection Of This Approach In November 2017

During the early days of discussion around PEP 563, in a November 2017 thread in comp.lang.python-dev, the idea of using code to delay the evaluation of annotations was briefly discussed. At the time the technique was termed an “implicit lambda expression”.

Guido van Rossum—Python’s BDFL at the time—replied, asserting that these “implicit lambda expressions” wouldn’t work, because they’d only be able to resolve symbols at module-level scope:

IMO the inability of referencing class-level definitions from annotations on methods pretty much kills this idea.

https://mail.python.org/pipermail/python-dev/2017-November/150109.html

This led to a short discussion about extending lambda-ized annotations for methods to be able to refer to class-level definitions, by maintaining a reference to the class-level scope. This idea, too, was quickly rejected.

PEP 563 summarizes the above discussion.

The approach taken by this PEP doesn’t suffer from these restrictions. Annotations can access module-level definitions, class-level definitions, and even local and free variables.

Motivation

A History Of Annotations

Python 3.0 shipped with a new syntax feature, “annotations”, defined in PEP 3107. This allowed specifying a Python value that would be associated with a parameter of a Python function, or with the value that function returns. Said another way, annotations gave Python users an interface to provide rich metadata about a function parameter or return value, for example type information. All the annotations for a function were stored together in a new attribute __annotations__, in an “annotation dict” that mapped parameter names (or, in the case of the return annotation, the name 'return') to their Python value.

In an effort to foster experimentation, Python intentionally didn’t define what form this metadata should take, or what values should be used. User code began experimenting with this new facility almost immediately. But popular libraries that make use of this functionality were slow to emerge.

After years of little progress, the BDFL chose a particular approach for expressing static type information, called type hints, as defined in PEP 484. Python 3.5 shipped with a new typing module which quickly became very popular.

Python 3.6 added syntax to annotate local variables, class attributes, and module attributes, using the approach proposed in PEP 526. Static type analysis continued to grow in popularity.

However, static type analysis users were increasingly frustrated by an inconvenient problem: forward references. In classic Python, if a class C depends on a later-defined class D, it’s normally not a problem, because user code will usually wait until both are defined before trying to use either. But annotations added a new complication, because they were computed at the time the annotated object (function, class, or module) was bound. If methods on class C are annotated with type D, and these annotation expressions are computed at the time that the method is bound, D may not be defined yet. And if methods in D are also annotated with type C, you now have an unresolvable circular reference problem.

Initially, static type users worked around this problem by defining their problematic annotations as strings. This worked because a string containing the type hint was just as usable for the static type analysis tool. And users of static type analysis tools rarely examine the annotations at runtime, so this representation wasn’t itself an inconvenience. But manually stringizing type hints was clumsy and error-prone. Also, code bases were adding more and more annotations, which consumed more and more CPU time to create and bind.

To solve these problems, the BDFL accepted PEP 563, which added a new feature to Python 3.7: “stringized annotations”. It was activated with a future import:

from __future__ import annotations

Normally, annotation expressions were evaluated at the time the object was bound, with their values being stored in the annotations dict. When stringized annotations were active, these semantics changed: instead, at compile time, the compiler converted all annotations in that module into string representations of their source code–thus, automatically turning the user’s annotations into strings, obviating the need to manually stringize them as before. PEP 563 suggested users could evaluate this string with eval if the actual value was needed at runtime.
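The stringizing behavior can be demonstrated without placing the future import at the top of a module, by compiling a snippet with the corresponding compiler flag (a small sketch; the source string and the names f and MyType are invented):

```python
import __future__

source = "def f(x: int, y: MyType) -> float: ...\n"

# Compile as if 'from __future__ import annotations' were in effect:
code = compile(source, "<demo>", "exec",
               flags=__future__.annotations.compiler_flag)
ns = {}
exec(code, ns)

# The annotations are stored as strings -- note that MyType need not
# even be defined for this to succeed:
print(ns["f"].__annotations__)  # {'x': 'int', 'y': 'MyType', 'return': 'float'}
```

Without the flag, the same exec would raise NameError, because MyType is never defined.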

(From here on out, this PEP will refer to the classic semantics of PEP 3107 and PEP 526, where the values of annotation expressions are computed at the time the object is bound, as “stock” semantics, to differentiate them from the new PEP 563 “stringized” annotation semantics.)

The Current State Of Annotation Use Cases

Although there are many specific use cases for annotations, annotation users in the discussion around this PEP tended to fall into one of these four categories.

Static typing users

Static typing users use annotations to add type information to their code. But they largely don’t examine the annotations at runtime. Instead, they use static type analysis tools (mypy, pytype) to examine their source tree and determine whether or not their code is using types consistently. This is almost certainly the most popular use case for annotations today.

Many of the annotations use type hints, a la PEP 484 (and many subsequent PEPs). Type hints are passive objects, mere representations of type information; they don’t do any actual work. Type hints are often parameterized with other types or other type hints. Since they’re agnostic about what these actual values are, type hints work fine with ForwardRef proxy objects. Users of static type hints discovered that extensive type hinting under stock semantics often created large-scale circular reference and circular import problems that could be difficult to solve. PEP 563 was designed specifically to solve this problem, and the solution worked great for these users. The difficulty of rendering stringized annotations into real values largely didn’t inconvenience these users because of how infrequently they examine annotations at runtime.

Static typing users often combine PEP 563 with the if typing.TYPE_CHECKING idiom to prevent their type hints from being loaded at runtime. This means they often aren’t able to evaluate their stringized annotations and produce real values at runtime. On the rare occasion that they do examine annotations at runtime, they often forgo eval, instead using lexical analysis directly on the stringized annotations.
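The idiom looks like this (a small invented example); at runtime the guarded import never executes, so the stringized annotation can’t be evaluated:

```python
import typing

if typing.TYPE_CHECKING:
    from decimal import Decimal  # imported only when a static checker runs

def total(amount: "Decimal") -> "Decimal":
    ...

# At runtime the annotation survives only as the string "Decimal";
# evaluating it fails because Decimal was never actually imported:
assert total.__annotations__["amount"] == "Decimal"
try:
    eval(total.__annotations__["amount"])
except NameError:
    print("Decimal is undefined at runtime")
```

A static type checker, by contrast, does see the guarded import, so the annotation is fully usable for analysis.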

Under this PEP, static typing users will probably prefer FORWARDREF or SOURCE format.

Runtime annotation users

Runtime annotation users use annotations as a means of expressing rich metadata about their functions and classes, which they use as input to runtime behavior. Specific use cases include runtime type verification (Pydantic) and glue logic to expose Python APIs in another domain (FastAPI, Typer). The annotations may or may not be type hints.

As runtime annotation users examine annotations at runtime, they were traditionally better served with stock semantics. This use case is largely incompatible with PEP 563, particularly with the if typing.TYPE_CHECKING idiom.

Under this PEP, runtime annotation users will most likely prefer VALUE format, though some (e.g. if they evaluate annotations eagerly in a decorator and want to support forward references) may also use FORWARDREF format.

Wrappers

Wrappers are functions or classes that wrap user functions or classes and add functionality. Examples of this would be dataclass(), functools.partial(), attrs, and wrapt.

Wrappers are a distinct subcategory of runtime annotation users. Although they do use annotations at runtime, they may or may not actually examine the annotations of the objects they wrap–it depends on the functionality the wrapper provides. As a rule they should propagate the annotations of the wrapped object to the wrapper they create, although it’s possible they may modify those annotations.

Wrappers were generally designed to work well under stock semantics. Whether or not they work well under PEP 563 semantics depends on the degree to which they examine the wrapped object’s annotations. Often wrappers don’t care about the value per se, only needing specific information about the annotations. Even so, PEP 563 and the if typing.TYPE_CHECKING idiom can make it difficult for wrappers to reliably determine the information they need at runtime. This is an ongoing, chronic problem. Under this PEP, wrappers will probably prefer FORWARDREF format for their internal logic. But the wrapped objects need to support all formats for their users.
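For example, functools.wraps already handles the propagation step by copying __annotations__ (among other attributes) from the wrapped function to the wrapper; the logging decorator below is an invented example:

```python
import functools

def logged(func):
    # functools.wraps copies __name__, __doc__, __annotations__, etc.
    # from the wrapped function onto the wrapper.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
def area(r: float) -> float:
    return 3.14159 * r * r

assert area.__annotations__ == {'r': float, 'return': float}
```

A wrapper that fails to propagate annotations breaks any downstream code that introspects the wrapped callable.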

Documentation

PEP 563 stringized annotations were a boon for tools that mechanically construct documentation.

Stringized type hints make for excellent documentation; type hints as expressed in source code are often succinct and readable. However, at runtime these same type hints can produce values whose repr is a sprawling, nested, unreadable mess. Thus documentation users were well served by PEP 563 but poorly served with stock semantics.

Under this PEP, documentation users are expected to use SOURCE format.

Motivation For This PEP

Python’s original semantics for annotations made their use for static type analysis painful due to forward reference problems. PEP 563 solved the forward reference problem, and many static type analysis users became happy early adopters of it. But its unconventional solution created new problems for two of the above cited use cases: runtime annotation users, and wrappers.

First, stringized annotations didn’t permit referencing local or free variables, which meant many useful, reasonable approaches to creating annotations were no longer viable. This was particularly inconvenient for decorators that wrap existing functions and classes, as these decorators often use closures.

Second, in order for eval to correctly look up globals in a stringized annotation, you must first obtain a reference to the correct module. But class objects don’t retain a reference to their globals. PEP 563 suggests looking up a class’s module by name in sys.modules—a surprising requirement for a language-level feature.

Additionally, complex but legitimate constructions can make it difficult to determine the correct globals and locals dicts to give to eval to properly evaluate a stringized annotation. Even worse, in some situations it may simply be infeasible.

For example, some libraries (e.g. typing.TypedDict, dataclasses) wrap a user class, then merge all the annotations from all that class’s base classes together into one cumulative annotations dict. If those annotations were stringized, calling eval on them later may not work properly, because the globals dictionary used for the eval will be the module where the user class was defined, which may not be the same module where the annotation was defined. However, if the annotations were stringized because of forward-reference problems, calling eval on them early may not work either, due to the forward reference not being resolvable yet. This has proved to be difficult to reconcile; of the three bug reports linked to below, only one has been marked as fixed.

Even with proper globals and locals, eval can be unreliable on stringized annotations. eval can only succeed if all the symbols referenced in an annotation are defined. If a stringized annotation refers to a mixture of defined and undefined symbols, a simple eval of that string will fail. This is a problem for libraries that need to examine the annotation, because they can’t reliably convert these stringized annotations into real values.

  • Some libraries (e.g. dataclasses) solved this by foregoing real values and performing lexical analysis of the stringized annotation, which requires a lot of work to get right.
  • Other libraries still suffer with this problem, which can produce surprising runtime behavior. https://github.com/python/cpython/issues/97727
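A minimal illustration of the problem: even though dict and str are both defined, a single eval of the whole annotation fails because one name is missing:

```python
# 'dict' and 'str' resolve fine, but 'MyType' is undefined -- so the
# single eval() of the whole string fails even though most of it is
# perfectly resolvable:
annotation = "dict[str, MyType]"
try:
    eval(annotation)
except NameError as exc:
    print(exc)  # name 'MyType' is not defined
```

Recovering the resolvable parts requires parsing the string by hand, which is exactly the lexical-analysis workaround described above.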

Also, eval() is slow, and it isn’t always available; it’s sometimes removed for space reasons on certain platforms. eval() on MicroPython doesn’t support the locals argument, which makes converting stringized annotations into real values at runtime even harder.

Finally, PEP 563 requires Python implementations to stringize their annotations. This is surprising behavior—unprecedented for a language-level feature, with a complicated implementation, that must be updated whenever a new operator is added to the language.

These problems motivated the research into finding a new approach to solve the problems facing annotations users, resulting in this PEP.

Implementation

Observed semantics for annotations expressions

For any object o that supports annotations, provided that all names evaluated in the annotations expressions are bound before o is defined and never subsequently rebound, o.__annotations__ will produce an identical annotations dict both when “stock” semantics are active and when this PEP is active. In particular, name resolution will be performed identically in both scenarios.

When this PEP is active, the value of o.__annotations__ won’t be calculated until the first time o.__annotations__ itself is evaluated. All evaluation of the annotation expressions is delayed until this moment, which also means that

  • names referenced in the annotations expressions will use their current value at this moment, and
  • if evaluating the annotations expressions raises an exception, that exception will be raised at this moment.

Once o.__annotations__ is successfully calculated for the first time, this value is cached and will be returned by future requests for o.__annotations__.

__annotate__ and __annotations__

Python supports annotations on three different types: functions, classes, and modules. This PEP modifies the semantics on all three of these types in a similar way.

First, this PEP adds a new “dunder” attribute, __annotate__. __annotate__ must be a “data descriptor”, implementing all three actions: get, set, and delete. The __annotate__ attribute is always defined, and may only be set to either None or to a callable. (__annotate__ cannot be deleted.) If an object has no annotations, __annotate__ should be initialized to None, rather than to a function that returns an empty dict.

The __annotate__ data descriptor must have dedicated storage inside the object to store the reference to its value. The location of this storage at runtime is an implementation detail. Even if it’s visible to Python code, it should still be considered an internal implementation detail, and Python code should prefer to interact with it only via the __annotate__ attribute.

The callable stored in __annotate__ must accept a single required positional argument called format, which will always be an int (or a subclass of int). It must either return a dict (or subclass of dict) or raise NotImplementedError().

Here’s a formal definition of __annotate__, as it will appear in the “Magic methods” section of the Python Language Reference:

__annotate__(format: int) -> dict

Returns a new dictionary object mapping attribute/parameter names to their annotation values.

Takes a format parameter specifying the format in which annotations values should be provided. Must be one of the following:

inspect.VALUE (equivalent to the int constant 1)

Values are the result of evaluating the annotation expressions.

inspect.FORWARDREF (equivalent to the int constant 2)

Values are real annotation values (as per inspect.VALUE format) for defined values, and ForwardRef proxies for undefined values. Real objects may be exposed to, or contain references to, ForwardRef proxy objects.

inspect.SOURCE (equivalent to the int constant 3)

Values are the text string of the annotation as it appears in the source code. May only be approximate; whitespace may be normalized, and constant values may be optimized. It’s possible the exact values of these strings could change in future versions of Python.

If an __annotate__ function doesn’t support the requested format, it must raise NotImplementedError(). __annotate__ functions must always support 1 (inspect.VALUE) format; they must not raise NotImplementedError() when called with format=1.

When called with format=1, an __annotate__ function may raise NameError; it must not raise NameError when called requesting any other format.

If an object doesn’t have any annotations, __annotate__ should preferably be set to None (it can’t be deleted), rather than set to a function that returns an empty dict.
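A hand-written function conforming to this contract might look like the following sketch, with the format constants written as literal ints matching the values defined above (annotate_point and its annotations are invented for illustration):

```python
VALUE, FORWARDREF, SOURCE = 1, 2, 3  # the format constants from this PEP

def annotate_point(format):
    """A conforming __annotate__-style function for a hypothetical
    object annotated as x: int, y: int."""
    if format == VALUE:
        # Mandatory: every __annotate__ function must support VALUE.
        return {'x': int, 'y': int}
    if format == SOURCE:
        # Optional: this function happens to know its source text.
        return {'x': 'int', 'y': 'int'}
    # Unsupported formats must be declined with NotImplementedError:
    raise NotImplementedError(format)

assert annotate_point(VALUE) == {'x': int, 'y': int}
```

Note the asymmetry required by the contract: VALUE support is mandatory, while FORWARDREF and SOURCE may be declined and synthesized by the helper functions instead.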

When the Python compiler compiles an object with annotations, it simultaneously compiles the appropriate annotate function. This function, called with the single positional argument inspect.VALUE, computes and returns the annotations dict as defined on that object. The Python compiler and runtime work in concert to ensure that the function is bound to the appropriate namespaces:

  • For functions and classes, the globals dictionary will be the module where the object was defined. If the object is itself a module, its globals dictionary will be its own dict.
  • For methods on classes, and for classes, the locals dictionary will be the class dictionary.
  • If the annotations refer to free variables, the closure will be the appropriate closure tuple containing cells for free variables.

Second, this PEP requires that the existing __annotations__ must be a “data descriptor”, implementing all three actions: get, set, and delete. __annotations__ must also have its own internal storage it uses to cache a reference to the annotations dict:

  • Class and module objects must cache the annotations dict in their __dict__, using the key __annotations__. This is required for backwards compatibility reasons.
  • For function objects, storage for the annotations dict cache is an implementation detail. It’s preferably internal to the function object and not visible in Python.

This PEP defines semantics on how __annotations__ and __annotate__ interact, for all three types that implement them. In the following examples, fn represents a function, cls represents a class, mod represents a module, and o represents an object of any of these three types:

  • When o.__annotations__ is evaluated, and the internal storage for o.__annotations__ is unset, and o.__annotate__ is set to a callable, the getter for o.__annotations__ calls o.__annotate__(1), then caches the result in its internal storage and returns the result.
    • To explicitly clarify one question that has come up multiple times: this o.__annotations__ cache is the only caching mechanism defined in this PEP. There are no other caching mechanisms defined in this PEP. The __annotate__ functions generated by the Python compiler explicitly don’t cache any of the values they compute.
  • Setting o.__annotate__ to a callable invalidates the cached annotations dict.
  • Setting o.__annotate__ to None has no effect on the cached annotations dict.
  • Deleting o.__annotate__ raises TypeError. __annotate__ must always be set; this prevents unannotated subclasses from inheriting the __annotate__ method of one of their base classes.
  • Setting o.__annotations__ to a legal value automatically sets o.__annotate__ to None.
    • Setting cls.__annotations__ or mod.__annotations__ to None otherwise works like any other attribute; the attribute is set to None.
    • Setting fn.__annotations__ to None invalidates the cached annotations dict. If fn.__annotations__ doesn’t have a cached annotations value, and fn.__annotate__ is None, the fn.__annotations__ data descriptor creates, caches, and returns a new empty dict. (This is for backwards compatibility with PEP 3107 semantics.)
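These rules can be modeled in pure Python. The sketch below (the class name and storage attributes are invented) is purely illustrative; real functions, classes, and modules implement this inside the interpreter:

```python
_UNSET = object()  # sentinel: "no cached annotations dict yet"

class AnnotationsModel:
    """Toy model of the __annotations__ / __annotate__ interplay."""

    def __init__(self):
        self._annotate = None
        self._cache = _UNSET

    @property
    def __annotations__(self):
        if self._cache is _UNSET:
            if self._annotate is not None:
                self._cache = self._annotate(1)  # inspect.VALUE
            else:
                self._cache = {}  # PEP 3107 backwards compatibility
        return self._cache

    @__annotations__.setter
    def __annotations__(self, value):
        self._cache = value
        self._annotate = None  # setting __annotations__ clears __annotate__

    @property
    def __annotate__(self):
        return self._annotate

    @__annotate__.setter
    def __annotate__(self, value):
        self._annotate = value
        if value is not None:
            self._cache = _UNSET  # a new callable invalidates the cache

o = AnnotationsModel()
o.__annotate__ = lambda format: {'x': int}
assert o.__annotations__ == {'x': int}
o.__annotations__ = {'y': str}
assert o.__annotate__ is None
```

Real objects must additionally raise TypeError on deletion of __annotate__ and, for classes and modules, store the cache in __dict__; the sketch omits those details.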

Changes to allowable annotations syntax

__annotate__ now delays the evaluation of annotations until __annotations__ is referenced in the future. It also means annotations are evaluated in a new function, rather than in the original context where the object they were defined on was bound. There are four operators with significant runtime side-effects that were permitted in stock semantics, but are disallowed when from __future__ import annotations is active, and will have to be disallowed when this PEP is active:

  • :=
  • yield
  • yield from
  • await

Changes to inspect.get_annotations and typing.get_type_hints

(This PEP makes frequent reference to these two functions. In the future it will refer to them collectively as “the helper functions”, as they help user code work with annotations.)

These two functions extract and return the annotations from an object. inspect.get_annotations returns the annotations unchanged; for the convenience of static typing users, typing.get_type_hints makes some modifications to the annotations before it returns them.

This PEP adds a new keyword-only parameter to these two functions, format. format specifies what format the values in the annotations dict should be returned in. The format parameter on these two functions accepts the same values as the format parameter on the __annotate__ magic method defined above; however, these format parameters also have a default value of inspect.VALUE.

When either __annotations__ or __annotate__ is updated on an object, the other of those two attributes is now out-of-date and should also either be updated or deleted (set to None, in the case of __annotate__, which cannot be deleted). In general, the semantics established in the previous section ensure that this happens automatically. However, there’s one case which for all practical purposes can’t be handled automatically: when the dict cached by o.__annotations__ is itself modified, or when mutable values inside that dict are modified.

Since this can’t be handled in code, it must be handled in documentation. This PEP proposes amending the documentation for inspect.get_annotations (and similarly for typing.get_type_hints) as follows:

If you directly modify the __annotations__ dict on an object, by default these changes may not be reflected in the dictionary returned by inspect.get_annotations when requesting either SOURCE or FORWARDREF format on that object. Rather than modifying the __annotations__ dict directly, consider replacing that object’s __annotate__ method with a function computing the annotations dict with your desired values. Failing that, it’s best to overwrite the object’s __annotate__ method with None to prevent inspect.get_annotations from generating stale results for SOURCE and FORWARDREF formats.

The stringizer and the fake globals environment

As originally proposed, this PEP supported many runtime annotation user use cases, and many static type user use cases. But this was insufficient–this PEP could not be accepted until it satisfied all extant use cases. This became a longtime blocker of this PEP until Carl Meyer proposed the “stringizer” and the “fake globals” environment as described below. These techniques allow this PEP to support both the FORWARDREF and SOURCE formats, ably satisfying all remaining use cases.

In a nutshell, this technique involves running a Python-compiler-generated __annotate__ function in an exotic runtime environment. Its normal globals dict is replaced with what’s called a “fake globals” dict. A “fake globals” dict is a dict with one important difference: every time you “get” a key from it that isn’t mapped, it creates, caches, and returns a new value for that key (as per the __missing__ callback for a dictionary). That value is an instance of a novel type referred to as a “stringizer”.

A “stringizer” is a Python class with highly unusual behavior. Every stringizer is initialized with its “value”, initially the name of the missing key in the “fake globals” dict. The stringizer then implements every Python “dunder” method used to implement operators, and the value returned by that method is a new stringizer whose value is a text representation of that operation.

When these stringizers are used in expressions, the result of the expression is a new stringizer whose name textually represents that expression. For example, let’s say you have a variable f, which is a reference to a stringizer initialized with the value 'f'. Here are some examples of operations you could perform on f and the values they would return:

>>> f
Stringizer('f')
>>> f + 3
Stringizer('f + 3')
>>> f["key"]
Stringizer('f["key"]')

Bringing it all together: if we run a Python-generated __annotate__ function, but we replace its globals with a “fake globals” dict, all undefined symbols it references will be replaced with stringizer proxy objects representing those symbols, and any operations performed on those proxies will in turn result in proxies representing that expression. This allows __annotate__ to complete, and to return an annotations dict, with stringizer instances standing in for names and entire expressions that could not have otherwise been evaluated.
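A drastically simplified stringizer and “fake globals” pair can be written in a few lines. This sketch supports only subscription; the real implementation (in ForwardRef) covers the full set of dunder methods:

```python
class Stringizer:
    """A tiny stand-in for the stringizer described above."""
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f"Stringizer({self.value!r})"
    def __getitem__(self, key):
        # Subscription yields a new stringizer describing the expression.
        text = key.value if isinstance(key, Stringizer) else repr(key)
        return Stringizer(f"{self.value}[{text}]")

class FakeGlobals(dict):
    """Every missing name resolves to a new, cached Stringizer."""
    def __missing__(self, key):
        value = Stringizer(key)
        self[key] = value
        return value

# Build an annotate-style function whose globals are the fake dict;
# neither 'MyType' nor even the builtin 'list' needs to be defined,
# because global lookups go through __missing__ first:
annotate = eval("lambda: {'x': MyType, 'y': list[MyType]}", FakeGlobals())
print(annotate())
# {'x': Stringizer('MyType'), 'y': Stringizer('list[MyType]')}
```

This works because name lookup in a dict-subclass globals falls back to the mapping protocol, which honors __missing__; the fake globals dict therefore intercepts every name before the builtins are consulted.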

In practice, the “stringizer” functionality will be implemented in the ForwardRef object currently defined in the typing module. ForwardRef will be extended to implement all stringizer functionality; it will also be extended to support evaluating the string it contains, to produce the real value (assuming all symbols referenced are defined). This means the ForwardRef object will retain references to the appropriate “globals”, “locals”, and even “closure” information needed to evaluate the expression.

This technique is the core of how inspect.get_annotations supports FORWARDREF and SOURCE formats. Initially, inspect.get_annotations will call the object’s __annotate__ method requesting the desired format. If that raises NotImplementedError, inspect.get_annotations will construct a “fake globals” environment, then call the object’s __annotate__ method.

  • inspect.get_annotations produces SOURCE format by creating a new empty “fake globals” dict, binding it to the object’s __annotate__ method, calling that requesting VALUE format, and then extracting the string “value” from each ForwardRef object in the resulting dict.
  • inspect.get_annotations produces FORWARDREF format by creating a new empty “fake globals” dict, pre-populating it with the current contents of the __annotate__ method’s globals dict, binding the “fake globals” dict to the object’s __annotate__ method, calling that requesting VALUE format, and returning the result.
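The SOURCE path in the first bullet ends by extracting the string “value” from every proxy; that final step might look roughly like this (the stand-in class and helper name here are assumptions for illustration, not the real API):

```python
class FakeForwardRef:
    """Toy stand-in for a stringizer-capable ForwardRef."""

    def __init__(self, source):
        # The reconstructed source text of the unevaluatable expression.
        self.__forward_arg__ = source


def extract_source(annotations):
    """Turn a FORWARDREF-style dict of proxies into a SOURCE-style
    dict of plain strings."""
    return {name: ref.__forward_arg__ for name, ref in annotations.items()}


ann = {"x": FakeForwardRef("list[Undefined]")}
extract_source(ann)  # → {'x': 'list[Undefined]'}
```

Note that with a completely empty “fake globals” dict, every name in an annotation becomes a proxy, so every value in the resulting dict carries source text.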

This entire technique works because the __annotate__ functions generated by the compiler are controlled by Python itself, and are simple and predictable. They’re effectively a single return statement, computing and returning the annotations dict. Since most operations needed to compute an annotation are implemented in Python using dunder methods, and the stringizer supports all the relevant dunder methods, this approach is a reliable, practical solution.

However, it’s not reasonable to attempt this technique with just any __annotate__ method. This PEP assumes that third-party libraries may implement their own __annotate__ methods, and those functions would almost certainly work incorrectly when run in this “fake globals” environment. For that reason, this PEP allocates a flag on code objects, one of the unused bits in co_flags, to mean “This code object can be run in a ‘fake globals’ environment.” This makes the “fake globals” environment strictly opt-in, and it’s expected that only __annotate__ methods generated by the Python compiler will set it.
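The opt-in check could be as simple as testing one bit (the flag value below is purely hypothetical; the real bit is an implementation detail chosen by CPython):

```python
# Hypothetical bit; the actual value would be chosen by the implementation.
CO_FAKE_GLOBALS_OK = 0x10000000


def can_be_called_with_fake_globals(annotate_fn):
    """Only run a function in the 'fake globals' environment if its
    code object explicitly opted in via the flag bit."""
    return bool(annotate_fn.__code__.co_flags & CO_FAKE_GLOBALS_OK)


def third_party_annotate(format):
    return {}


# Ordinary functions never set the hypothetical bit, so they're excluded:
can_be_called_with_fake_globals(third_party_annotate)  # → False
```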

The weakness in this technique is in handling operators which don’t directly map to dunder methods on an object. These are all operators that implement some manner of flow control, either branching or iteration:

  • Short-circuiting or
  • Short-circuiting and
  • Ternary operator (the if/else operator)
  • Generator expressions
  • List / dict / set comprehensions
  • Iterable unpacking

As a rule these techniques aren’t used in annotations, so this doesn’t pose a problem in practice. However, the recent addition of TypeVarTuple to Python does use iterable unpacking. The dunder methods involved (__iter__ and __next__) don’t permit distinguishing between iteration use cases; in order to correctly detect which use case was involved, mere “fake globals” and a “stringizer” wouldn’t be sufficient; this would require a custom bytecode interpreter designed specifically around producing SOURCE and FORWARDREF formats.

Thankfully there’s a shortcut that will work fine: the stringizer will simply assume that when its iteration dunder methods are called, it’s in service of iterator unpacking being performed by TypeVarTuple. It will hard-code this behavior. This means no other technique using iteration will work, but in practice this won’t inconvenience real-world use cases.
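A toy version of that hard-coded shortcut (the class here is a hypothetical stand-in, not the real ForwardRef):

```python
class Stringizer:
    """Toy proxy whose iteration behavior is hard-coded for
    TypeVarTuple-style unpacking."""

    def __init__(self, value):
        self.value = value

    def __iter__(self):
        # Assume iteration can only mean TypeVarTuple unpacking:
        # yield exactly one proxy representing "*<value>".
        yield Stringizer(f"*{self.value}")


ts = Stringizer("Ts")
[unpacked] = list(ts)
unpacked.value  # → '*Ts'
```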

Finally, note that the “fake globals” environment will also require constructing a matching “fake locals” dictionary, which for FORWARDREF format will be pre-populated with the relevant locals dict. The “fake globals” environment will also have to create a fake “closure”, a tuple of ForwardRef objects pre-created with the names of the free variables referenced by the __annotate__ method.

ForwardRef proxies created from __annotate__ methods that reference free variables will map the names and closure values of those free variables into the locals dictionary, to ensure that eval uses the correct values for those names.

Compiler-generated __annotate__ functions

As mentioned in the previous section, the __annotate__ functions generated by the compiler are simple. They’re mainly a single return statement, computing and returning the annotations dict.

However, the protocol for inspect.get_annotations to request either FORWARDREF or SOURCE format requires first asking the __annotate__ method to produce it. __annotate__ methods generated by the Python compiler won’t support either of these formats and will raise NotImplementedError().

Third-party __annotate__ functions

Third-party classes and functions will likely need to implement their own __annotate__ methods, so that downstream users of those objects can take full advantage of annotations. In particular, wrappers will likely need to transform the annotation dicts produced by the wrapped object: adding, removing, or modifying the dictionary in some way.

Most of the time, third-party code will implement their __annotate__ methods by calling inspect.get_annotations on some existing upstream object. For example, wrappers will likely request the annotations dict for their wrapped object, in the format that was requested from them, then modify the returned annotations dict as appropriate and return that. This allows third-party code to leverage the “fake globals” technique without having to understand or participate in it.

Third-party libraries that support both pre- and post-PEP-649 versions of Python will have to innovate their own best practices on how to support both. One sensible approach would be for their wrapper to always support __annotate__, then call it requesting VALUE format and store the result as the __annotations__ on their wrapper object. This would support pre-649 Python semantics, and be forward-compatible with post-649 semantics.

Pseudocode

Here’s high-level pseudocode for inspect.get_annotations:

def get_annotations(o, format):
    if format == VALUE:
        return dict(o.__annotations__)

    if format == FORWARDREF:
        try:
            return dict(o.__annotations__)
        except NameError:
            pass

    if not hasattr(o, '__annotate__'):
        return {}

    c_a = o.__annotate__
    try:
        return c_a(format)
    except NotImplementedError:
        if not can_be_called_with_fake_globals(c_a):
            return {}
        c_a_with_fake_globals = make_fake_globals_version(c_a, format)
        return c_a_with_fake_globals(VALUE)

Here’s what a Python compiler-generated __annotate__ method might look like if it were written in Python:

def __annotate__(self, format):
    if format != 1:
        raise NotImplementedError()
    return { ... }

Here’s how a third-party wrapper class might implement __annotate__. In this example, the wrapper works like functools.partial, pre-binding one parameter of the wrapped callable, which for simplicity must be named arg:

def __annotate__(self, format):
    ann = inspect.get_annotations(self.wrapped_fn, format)
    if 'arg' in ann:
        del ann['arg']
    return ann

Other modifications to the Python runtime

This PEP does not dictate exactly how it should be implemented; that is left up to the language implementation maintainers. However, the best implementation of this PEP may require adding additional information to existing Python objects, which is implicitly condoned by the acceptance of this PEP.

For example, it may be necessary to add a __globals__ attribute to class objects, so that the __annotate__ function for that class can be lazily bound, only on demand. Also, __annotate__ functions defined on methods defined in a class may need to retain a reference to the class’s __dict__, in order to correctly evaluate names bound in that class. It’s expected that the CPython implementation of this PEP will include both those new attributes.

All such new information added to existing Python objects should be done with “dunder” attributes, as they will of course be implementation details.

Interactive REPL Shell

The semantics established in this PEP also hold true when executing code in Python’s interactive REPL shell, except for module annotations in the interactive module (__main__) itself. Since that module is never “finished”, there’s no specific point where we can compile the __annotate__ function.

For the sake of simplicity, in this case we forego delayed evaluation. Module-level annotations in the REPL shell will continue to work exactly as they do with “stock semantics”, evaluating immediately and setting the result directly inside the __annotations__ dict.

Annotations On Local Variables Inside Functions

Python supports syntax for local variable annotations inside functions. However, these annotations have no runtime effect–they’re discarded at compile-time. Therefore, this PEP doesn’t need to do anything to support them, the same as stock semantics and PEP 563.

Prototype

The original prototype implementation of this PEP can be found here:

https://github.com/larryhastings/co_annotations/

As of this writing, the implementation is severely out of date; it’s based on Python 3.10 and implements the semantics of the first draft of this PEP, from early 2021. It will be updated shortly.

Performance Comparison

Performance with this PEP is generally favorable. There are four scenarios to consider:

  • the runtime cost when annotations aren’t defined,
  • the runtime cost when annotations are defined but not referenced,
  • the runtime cost when annotations are defined and referenced as objects, and
  • the runtime cost when annotations are defined and referenced as strings.

We’ll examine each of these scenarios in the context of all three semantics for annotations: stock, PEP 563, and this PEP.

When there are no annotations, all three semantics have the same runtime cost: zero. No annotations dict is created and no code is generated for it. This requires no runtime processor time and consumes no memory.

When annotations are defined but not referenced, the runtime cost of Python with this PEP is roughly the same as PEP 563, and improved over stock. The specifics depend on the object being annotated:

  • With stock semantics, the annotations dict is always built, and set as an attribute of the object being annotated.
  • In PEP 563 semantics, for function objects, a precompiled constant (a specially constructed tuple) is set as an attribute of the function. For class and module objects, the annotations dict is always built and set as an attribute of the class or module.
  • With this PEP, a single object is set as an attribute of the object being annotated. Most of the time, this object is a constant (a code object), but when the annotations require a class namespace or closure, this object will be a tuple constructed at binding time.

When annotations are both defined and referenced as objects, code using this PEP should be much faster than PEP 563, and be as fast or faster than stock. PEP 563 semantics requires invoking eval() for every value inside an annotations dict, which is enormously slow. And the implementation of this PEP generates measurably more efficient bytecode for class and module annotations than stock semantics; for function annotations, this PEP and stock semantics should be about the same speed.

The one case where this PEP will be noticeably slower than PEP 563 is when annotations are requested as strings; it’s hard to beat “they are already strings.” But stringified annotations are intended for online documentation use cases, where performance is less likely to be a key factor.

Memory use should also be comparable in all three scenarios across all three semantic contexts. In the first and third scenarios, memory usage should be roughly equivalent in all cases. In the second scenario, when annotations are defined but not referenced, using this PEP’s semantics will mean the function/class/module will store one unused code object (possibly bound to an unused function object); with the other two semantics, they’ll store one unused dictionary or constant tuple.

Backwards Compatibility

Backwards Compatibility With Stock Semantics

This PEP preserves nearly all existing behavior of annotations from stock semantics:

  • The format of the annotations dict stored in the __annotations__ attribute is unchanged. Annotations dicts contain real values, not strings as per PEP 563.
  • Annotations dicts are mutable, and any changes to them are preserved.
  • The __annotations__ attribute can be explicitly set, and any legal value set this way will be preserved.
  • The __annotations__ attribute can be deleted using the del statement.

Most code that works with stock semantics should continue to work when this PEP is active without any modification necessary. But there are exceptions, as follows.

First, there’s a well-known idiom for accessing class annotations which may not work correctly when this PEP is active. The original implementation of class annotations had what can only be called a bug: if a class didn’t define any annotations of its own, but one of its base classes did define annotations, the class would “inherit” those annotations. This behavior was never desirable, so user code found a workaround: instead of accessing the annotations on the class directly via cls.__annotations__, code would access the class’s annotations via its dict, as in cls.__dict__.get("__annotations__", {}). This idiom worked because classes stored their annotations in their __dict__, and accessing them this way avoided the lookups in the base classes. The technique relied on implementation details of CPython, so it was never supported behavior–though it was necessary. However, when this PEP is active, a class may have annotations defined but not yet have called __annotate__ and cached the result, in which case this approach would lead to mistakenly assuming the class didn’t have annotations. In any case, the bug was fixed as of Python 3.10, and the idiom should no longer be used. Also as of Python 3.10, there’s an Annotations HOWTO that defines best practices for working with annotations; code that follows these guidelines will work correctly even when this PEP is active, because it suggests using different approaches to get annotations from class objects based on the Python version the code runs under.
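The idiom in question, in toy form (shown here under Python 3.10-or-later semantics):

```python
class Base:
    a: int


class Derived(Base):
    pass


# The old workaround: read only the class's own dict so that Base's
# annotations aren't "inherited" by mistake.
own = Derived.__dict__.get("__annotations__", {})
own  # → {} for Derived, which defines no annotations of its own
```

Under this PEP, the same lookup can also come back empty for a class that does have annotations but simply hasn’t evaluated and cached them yet, which is why the idiom is obsolete.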

Since delaying the evaluation of annotations until they are introspected changes the semantics of the language, it’s observable from within the language. Therefore it’s possible to write code that behaves differently based on whether annotations are evaluated at binding time or at access time, e.g.

mytype = str
def foo(a: mytype): pass
mytype = int
print(foo.__annotations__['a'])

This will print <class 'str'> with stock semantics and <class 'int'> when this PEP is active. This is therefore a backwards-incompatible change. However, this example is poor programming style, so this change seems acceptable.

There are two uncommon interactions possible with class and module annotations that work with stock semantics but would no longer work when this PEP is active. These two interactions would have to be prohibited. The good news is, neither is common, and neither is considered good practice. In fact, they’re rarely seen outside of Python’s own regression test suite. They are:

  • Code that sets annotations on module or class attributes from inside any kind of flow control statement. It’s currently possible to set module and class attributes with annotations inside an if or try statement, and it works as one would expect. It’s untenable to support this behavior when this PEP is active.
  • Code in module or class scope that references or modifies the local __annotations__ dict directly. Currently, when setting annotations on module or class attributes, the generated code simply creates a local __annotations__ dict, then adds mappings to it as needed. It’s possible for user code to directly modify this dict, though this doesn’t seem to be an intentional feature. Although it would be possible to support this after a fashion once this PEP was active, the semantics would likely be surprising and wouldn’t make anyone happy.

Note that these are both also pain points for static type checkers, and are unsupported by those tools. It seems reasonable to declare that both are at the very least unsupported, and their use results in undefined behavior. It might be worth making a small effort to explicitly prohibit them with compile-time checks.

Finally, if this PEP is active, annotation values shouldn’t use the if/else ternary operator. Although this will work correctly when accessing o.__annotations__ or requesting inspect.VALUE from a helper function, the boolean expression may not compute correctly with inspect.FORWARDREF when some names are defined, and would be far less correct with inspect.SOURCE.
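The underlying problem is that branching goes through __bool__, which must return a real True or False; a proxy cannot record that a branch was taken, so one arm of the ternary is silently lost. A toy demonstration (hypothetical class, not the real ForwardRef):

```python
class Stringizer:
    """Toy proxy showing why flow control defeats stringization."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        # Forced to commit to a concrete branch; the other arm of the
        # ternary is never evaluated and can't be reconstructed.
        return True


flag = Stringizer("some_undefined_flag")
annotation = "int" if flag else "str"
annotation  # → 'int'; the 'str' arm has vanished
```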

Backwards Compatibility With PEP 563 Semantics

PEP 563 changed the semantics of annotations. When its semantics are active, annotations must assume they will be evaluated in module-level or class-level scope. They may no longer refer directly to local variables in the current function or an enclosing function. This PEP removes that restriction, and annotations may refer to any local variable.

PEP 563 requires using eval (or a helper function like typing.get_type_hints or inspect.get_annotations that uses eval for you) to convert stringized annotations into their “real” values. Existing code that activates stringized annotations, and calls eval() directly to convert the strings back into real values, can simply remove the eval() call. Existing code using a helper function would continue to work unchanged, though use of those functions may become optional.

Static typing users often have modules that only contain inert type hint definitions–but no live code. These modules are only needed when running static type checking; they aren’t used at runtime. But under stock semantics, these modules have to be imported in order for the runtime to evaluate and compute the annotations. Meanwhile, these modules often caused circular import problems that could be difficult or even impossible to solve. PEP 563 allowed users to solve these circular import problems by doing two things. First, they activated PEP 563 in their modules, which meant annotations were constant strings, and didn’t require the real symbols to be defined in order for the annotations to be computable. Second, this permitted users to only import the problematic modules in an if typing.TYPE_CHECKING block. This allowed the static type checkers to import the modules and the type definitions inside, but they wouldn’t be imported at runtime. So far, this approach will work unchanged when this PEP is active; if typing.TYPE_CHECKING is supported behavior.
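That pattern, which keeps working under this PEP, looks like this (the imported module and type names are placeholders):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime,
    # so the circular or expensive import is avoided.
    from heavy_module import BigType  # placeholder module/name


def process(item: "BigType") -> None:
    ...


process.__annotations__["item"]  # → 'BigType' (still a string at runtime)
```

Because the annotation is written as a string literal, it stays a plain string at runtime under stock semantics and under this PEP alike.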

However, some codebases actually did examine their annotations at runtime, even when using the if typing.TYPE_CHECKING technique and not importing definitions used in their annotations. These codebases examined the annotation strings without evaluating them, instead relying on identity checks or simple lexical analysis on the strings.

This PEP supports these techniques too. But users will need to port their code to it. First, user code will need to use inspect.get_annotations or typing.get_type_hints to access the annotations; they won’t be able to simply get the __annotations__ attribute from their object. Second, they will need to specify either inspect.FORWARDREF or inspect.SOURCE for the format when calling that function. This means the helper function can succeed in producing the annotations dict, even when not all the symbols are defined. Code expecting stringized annotations should work unmodified with inspect.SOURCE formatted annotations dicts; however, users should consider switching to inspect.FORWARDREF, as it may make their analysis easier.

Similarly, PEP 563 permitted use of class decorators on annotated classes in a way that hadn’t previously been possible. Some class decorators (e.g. dataclasses) examine the annotations on the class. Because class decorators using the @decorator syntax are run before the class name is bound, they can cause unsolvable circular-definition problems. If you annotate attributes of a class with references to the class itself, or annotate attributes in multiple classes with circular references to each other, you can’t decorate those classes with the @decorator syntax using decorators that examine the annotations. PEP 563 allowed this to work, as long as the decorators examined the strings lexically and didn’t use eval to evaluate them (or handled the NameError with further workarounds). When this PEP is active, decorators will be able to compute the annotations dict in inspect.SOURCE or inspect.FORWARDREF format using the helper functions. This will permit them to analyze annotations containing undefined symbols, in the format they prefer.

Early adopters of PEP 563 discovered that “stringized” annotations were useful for automatically-generated documentation. Users experimented with this use case, and Python’s pydoc has expressed some interest in this technique. This PEP supports this use case; the code generating the documentation will have to be updated to use a helper function to access the annotations in inspect.SOURCE format.

Finally, the warnings about using the if/else ternary operator in annotations apply equally to users of PEP 563. It currently works for them, but could produce incorrect results when requesting some formats from the helper functions.

If this PEP is accepted, PEP 563 will be deprecated and eventually removed. To facilitate this transition for early adopters of PEP 563, who now depend on its semantics, inspect.get_annotations and typing.get_type_hints will implement a special affordance.

The Python compiler won’t generate annotation code objects for objects defined in a module where PEP 563 semantics are active, even if this PEP is accepted. So, under normal circumstances, requesting inspect.SOURCE format from a helper function would return an empty dict. As an affordance, to facilitate the transition, if the helper functions detect that an object was defined in a module with PEP 563 active, and the user requests inspect.SOURCE format, they’ll return the current value of the __annotations__ dict, which in this case will be the stringized annotations. This will allow PEP 563 users who lexically analyze stringized annotations to immediately change over to requesting inspect.SOURCE format from the helper functions, which will hopefully smooth their transition away from PEP 563.
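Detecting that an object was defined under PEP 563 can be done through the compiler flag that from __future__ import annotations sets on code objects, roughly along these lines (a sketch of one plausible check, not necessarily how the helpers will do it):

```python
import __future__


def pep563_active(fn):
    """True if fn was compiled with 'from __future__ import annotations'."""
    return bool(fn.__code__.co_flags & __future__.annotations.compiler_flag)


def plain(a: int): ...


pep563_active(plain)  # → False in a module without the future import
```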

Rejected Ideas

“Just store the strings”

One proposed idea for supporting SOURCE format was for the Python compiler to emit the actual source code for the annotation values somewhere, and to furnish that when the user requested SOURCE format.

This idea wasn’t rejected so much as categorized as “not yet”. We already know we need to support FORWARDREF format, and that technique can be adapted to support SOURCE format in just a few lines. There are many unanswered questions about this approach:

  • Where would we store the strings? Would they always be loaded when the annotated object was created, or would they be lazy-loaded on demand? If so, how would the lazy-loading work?
  • Would the “source code” include the newlines and comments of the original? Would it preserve all whitespace, including indents and extra spaces used purely for formatting?

It’s possible we’ll revisit this topic in the future, if improving the fidelity of SOURCE values to the original source code is judged sufficiently important.

Acknowledgements

Thanks to Carl Meyer, Barry Warsaw, Eric V. Smith, Mark Shannon, Jelle Zijlstra, and Guido van Rossum for ongoing feedback and encouragement.

Particular thanks to several individuals who contributed key ideas that became some of the best aspects of this proposal:

  • Carl Meyer suggested the “stringizer” technique that made FORWARDREF and SOURCE formats possible, which allowed forward progress on this PEP after a year of languishing due to seemingly-unfixable problems. He also suggested the affordance for PEP 563 users where inspect.SOURCE will return the stringized annotations, and many more suggestions besides. Carl was also the primary correspondent in private email threads discussing this PEP, and was a tireless resource and voice of sanity. This PEP would almost certainly not have been accepted were it not for Carl’s contributions.
  • Mark Shannon suggested building the entire annotations dict inside a single code object, and only binding it to a function on demand.
  • Guido van Rossum suggested that __annotate__ functions should duplicate the name visibility rules of annotations under “stock” semantics.
  • Jelle Zijlstra contributed not only feedback–but code!

Source: https://github.com/python/peps/blob/main/peps/pep-0649.rst

Last modified: 2024-10-17 12:49:39 GMT