
Python Enhancement Proposals

PEP 580 – The C call protocol

Author:
Jeroen Demeyer <J.Demeyer at UGent.be>
BDFL-Delegate:
Petr Viktorin
Status:
Rejected
Type:
Standards Track
Created:
14-Jun-2018
Python-Version:
3.8
Post-History:
20-Jun-2018, 22-Jun-2018, 16-Jul-2018


Rejection Notice

This PEP is rejected in favor of PEP 590, which proposes a simpler public C API for callable objects.

Abstract

A new “C call” protocol is proposed. It is meant for classes representing functions or methods which need to implement fast calling. The goal is to generalize all existing optimizations for built-in functions to arbitrary extension types.

In the reference implementation, this new protocol is used for the existing classes builtin_function_or_method and method_descriptor. However, in the future, more classes may implement it.

NOTE: This PEP deals only with the Python/C API; it does not affect the Python language or standard library.

Motivation

The standard function/method classes builtin_function_or_method and method_descriptor allow calling C code very efficiently. However, they are not subclassable, making them unsuitable for many applications: for example, they offer limited introspection support (signatures only using __text_signature__, no arbitrary __qualname__, no inspect.getfile()). It's also not possible to store additional data to implement something like functools.partial or functools.lru_cache. So, there are many reasons why users would want to implement custom function/method classes (in a duck-typing sense) in C. Unfortunately, such custom classes are necessarily slower than the standard CPython function classes: the bytecode interpreter has various optimizations which are specific to instances of builtin_function_or_method, method_descriptor, method and function.

This PEP also allows simplifying existing code: checks for builtin_function_or_method and method_descriptor could be replaced by simply checking for and using the C call protocol. Future PEPs may implement the C call protocol for more classes, enabling even further simplifications.

We also design the C call protocol such that it can easily be extended with new features in the future.

For more background and motivation, see PEP 579.

Overview

Currently, CPython has multiple optimizations for fast calling for a few specific function classes. A good example is the implementation of the opcode CALL_FUNCTION, which has the following structure (see the actual code):

if (PyCFunction_Check(func)) {
    return _PyCFunction_FastCallKeywords(func, stack, nargs, kwnames);
}
else if (Py_TYPE(func) == &PyMethodDescr_Type) {
    return _PyMethodDescr_FastCallKeywords(func, stack, nargs, kwnames);
}
else {
    if (PyMethod_Check(func) && PyMethod_GET_SELF(func) != NULL) {
        /* ... */
    }
    if (PyFunction_Check(func)) {
        return _PyFunction_FastCallKeywords(func, stack, nargs, kwnames);
    }
    else {
        return _PyObject_FastCallKeywords(func, stack, nargs, kwnames);
    }
}

Calling instances of these special-cased classes using the tp_call slot is slower than using the optimizations. The basic idea of this PEP is to enable such optimizations for user C code, both as caller and as callee.

The existing class builtin_function_or_method and a few others use a PyMethodDef structure for describing the underlying C function and its signature. The first concrete change is that this is replaced by a new structure PyCCallDef. This stores some of the same information as a PyMethodDef, but with one important addition: the “parent” of the function (the class or module where it is defined). Note that PyMethodDef arrays are still used to construct functions/methods but no longer for calling them.

Second, we want every class to be able to use such a PyCCallDef for optimizing calls, so the PyTypeObject structure gains a tp_ccalloffset field giving an offset to a PyCCallDef * in the object structure and a flag Py_TPFLAGS_HAVE_CCALL indicating that tp_ccalloffset is valid.

Third, since we want to deal efficiently with unbound and bound methods too (as opposed to only plain functions), we need to handle __self__ in the protocol: after the PyCCallDef * in the object structure, there is a PyObject *self field. These two fields together are referred to as a PyCCallRoot structure.

The new protocol for efficiently calling objects using these new structures is called the “C call protocol”.

NOTE: In this PEP, the phrases “unbound method” and “bound method” refer to generic behavior, not to specific classes. For example, an unbound method gets turned into a bound method after applying __get__.

New data structures

The PyTypeObject structure gains a new field Py_ssize_t tp_ccalloffset and a new flag Py_TPFLAGS_HAVE_CCALL. If this flag is set, then tp_ccalloffset is assumed to be a valid offset inside the object structure (similar to tp_dictoffset and tp_weaklistoffset). It must be a strictly positive integer. At that offset, a PyCCallRoot structure appears:

typedef struct {
    const PyCCallDef *cr_ccall;
    PyObject *cr_self;  /* __self__ argument for methods */
} PyCCallRoot;

The PyCCallDef structure contains everything needed to describe how the function can be called:

typedef struct {
    uint32_t cc_flags;
    PyCFunc cc_func;      /* C function to call */
    PyObject *cc_parent;  /* class or module */
} PyCCallDef;

The reason for putting __self__ outside of PyCCallDef is that PyCCallDef is not meant to be changed after creating the function. A single PyCCallDef can be shared by an unbound method and multiple bound methods. This wouldn't work if we put __self__ inside that structure.

NOTE: unlike tp_dictoffset, we do not allow negative numbers for tp_ccalloffset to mean counting from the end. There does not seem to be a use case for it and it would only complicate the implementation.
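To make the layout concrete, the following is a minimal C sketch of how code could find the PyCCallRoot of an object through tp_ccalloffset. It does not include Python.h: Object, TypeObject, CCallDef, CCallRoot, FLAG_HAVE_CCALL and the example MyFunc type are hypothetical stand-ins that mirror only the fields named in this PEP, and the two helpers merely play the roles of PyCCall_Check and PyCCall_CCALLROOT.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PyObject/PyTypeObject; not CPython's headers. */
typedef struct TypeObject TypeObject;
typedef struct { const TypeObject *ob_type; } Object;

typedef void (*CFunc)(void);            /* placeholder for PyCFunc */

typedef struct {
    uint32_t cc_flags;
    CFunc    cc_func;                   /* C function to call */
    Object  *cc_parent;                 /* class or module */
} CCallDef;

typedef struct {
    const CCallDef *cr_ccall;
    Object         *cr_self;            /* __self__ argument for methods */
} CCallRoot;

#define FLAG_HAVE_CCALL (1UL << 0)      /* stand-in for Py_TPFLAGS_HAVE_CCALL */

struct TypeObject {
    unsigned long tp_flags;
    ptrdiff_t     tp_ccalloffset;       /* strictly positive when the flag is set */
};

/* Analogue of PyCCall_Check(): only the type flag is consulted. */
static int ccall_check(const Object *op)
{
    return (op->ob_type->tp_flags & FLAG_HAVE_CCALL) != 0;
}

/* Analogue of PyCCall_CCALLROOT(): add the positive offset to the object address. */
static const CCallRoot *ccall_root(const Object *op)
{
    return (const CCallRoot *)((const char *)op + op->ob_type->tp_ccalloffset);
}

/* Example object layout: the root may sit anywhere after the object header. */
typedef struct {
    Object    ob_base;
    int       some_other_field;
    CCallRoot root;
} MyFunc;

static const TypeObject MyFunc_Type = {
    FLAG_HAVE_CCALL,
    offsetof(MyFunc, root),
};
```

Since the flag alone decides whether the offset is valid, the helper never needs to test tp_ccalloffset != 0.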

Parent

The cc_parent field (accessed for example by a __parent__ or __objclass__ descriptor from Python code) can be any Python object, or NULL. Custom classes are free to set cc_parent to whatever they want. It is only used by the C call protocol if the CCALL_OBJCLASS flag is set.

For methods of extension types, cc_parent points to the class that defines the method (which may be a superclass of type(self)). This is currently non-trivial to retrieve from a method's code. In the future, this can be used to access the module state via the defining class. See the rationale of PEP 573 for details.

When the flag CCALL_OBJCLASS is set (as it will be for methods of extension types), cc_parent is used for type checks like the following:

>>> list.append({}, "x")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'append' requires a 'list' object but received a 'dict'

For functions of modules, cc_parent is set to the module. Currently, this is exactly the same as __self__. However, using __self__ for the module is a quirk of the current implementation: in the future, we want to allow functions which use __self__ in the normal way, for implementing methods. Such functions can still use cc_parent instead to refer to the module.

The parent would also typically be used to implement __qualname__. The new C API function PyCCall_GenericGetQualname() does exactly that.

Using tp_print

We propose to replace the existing unused field tp_print by tp_ccalloffset. Since Py_TPFLAGS_HAVE_CCALL would not be added to Py_TPFLAGS_DEFAULT, this ensures full backwards compatibility for existing extension modules setting tp_print. It also means that we can require that tp_ccalloffset is a valid offset when Py_TPFLAGS_HAVE_CCALL is specified: we do not need to check tp_ccalloffset != 0. In future Python versions, we may decide that tp_print becomes tp_ccalloffset unconditionally, drop the Py_TPFLAGS_HAVE_CCALL flag and instead check for tp_ccalloffset != 0.

NOTE: the exact layout of PyTypeObject is not part of the stable ABI. Therefore, changing the tp_print field from a printfunc (a function pointer) to a Py_ssize_t should not be a problem, even if this changes the memory layout of the PyTypeObject structure. Moreover, on all systems for which binaries are commonly built (Windows, Linux, macOS), printfunc and Py_ssize_t have the same size, so the issue of binary compatibility will not come up anyway.

The C call protocol

We say that a class implements the C call protocol if it has the Py_TPFLAGS_HAVE_CCALL flag set (as explained above, it must then set tp_ccalloffset > 0). Such a class must implement __call__ as described in this section (in practice, this just means setting tp_call to PyCCall_Call).

The cc_func field is a C function pointer, which plays the same role as the existing ml_meth field of PyMethodDef. Its precise signature depends on flags. The subset of flags influencing the signature of cc_func is given by the bitmask CCALL_SIGNATURE. Below are the possible values for cc_flags & CCALL_SIGNATURE together with the arguments that the C function takes. The return value is always PyObject *. The following are analogous to the existing PyMethodDef signature flags:

  • CCALL_VARARGS: cc_func(PyObject *self, PyObject *args)
  • CCALL_VARARGS | CCALL_KEYWORDS: cc_func(PyObject *self, PyObject *args, PyObject *kwds) (kwds is either NULL or a dict; this dict must not be modified by the callee)
  • CCALL_FASTCALL: cc_func(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
  • CCALL_FASTCALL | CCALL_KEYWORDS: cc_func(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames) (kwnames is either NULL or a non-empty tuple of keyword names)
  • CCALL_NOARGS: cc_func(PyObject *self, PyObject *unused) (second argument is always NULL)
  • CCALL_O: cc_func(PyObject *self, PyObject *arg)

The flag CCALL_DEFARG may be combined with any of these. If so, the C function takes an additional argument as first argument before self, namely a const pointer to the PyCCallDef structure used for this call. For example, we have the following signature:

  • CCALL_DEFARG | CCALL_VARARGS: cc_func(const PyCCallDef *def, PyObject *self, PyObject *args)

One exception is CCALL_DEFARG | CCALL_NOARGS: the unused argument is dropped, so the signature becomes

  • CCALL_DEFARG | CCALL_NOARGS: cc_func(const PyCCallDef *def, PyObject *self)

NOTE: unlike the existing METH_... flags, the CCALL_... constants do not necessarily represent single bits. So checking if (cc_flags & CCALL_VARARGS) is not a valid way to check the signature. There are also no guarantees of binary compatibility for these flags between Python versions. This allows the implementation to choose the most efficient numerical values of the flags. In the reference implementation, the legal values for cc_flags & CCALL_SIGNATURE form exactly the interval [0, …, 11]. This means that the compiler can easily optimize a switch statement for those cases using a computed goto.
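The following C sketch illustrates why a contiguous range of signature values suits a switch statement. The codes SIG_NOARGS, SIG_O, SIG_FASTCALL, SIG_DEFARG and SIG_MASK, as well as the Object and CCallDef stand-ins, are invented for illustration and do not reflect the real numeric values of the CCALL_... constants; only the NOARGS, O and FASTCALL signatures (with and without DEFARG) are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PyObject: just carries an integer payload. */
typedef struct { long value; } Object;

/* Invented signature codes for this sketch; DEFARG is combined with the others. */
enum {
    SIG_NOARGS   = 0,
    SIG_O        = 1,
    SIG_FASTCALL = 2,
    SIG_DEFARG   = 4,
    SIG_MASK     = 7      /* plays the role of CCALL_SIGNATURE */
};

typedef void (*GenericFunc)(void);      /* stored type-erased, like cc_func */

typedef struct CCallDef {
    uint32_t    cc_flags;
    GenericFunc cc_func;
    Object     *cc_parent;
} CCallDef;

/* One function-pointer type per supported signature. */
typedef Object *(*noargs_f)(Object *self, Object *unused);
typedef Object *(*o_f)(Object *self, Object *arg);
typedef Object *(*fast_f)(Object *self, Object *const *args, ptrdiff_t nargs);
typedef Object *(*def_noargs_f)(const CCallDef *def, Object *self);
typedef Object *(*def_o_f)(const CCallDef *def, Object *self, Object *arg);
typedef Object *(*def_fast_f)(const CCallDef *def, Object *self,
                              Object *const *args, ptrdiff_t nargs);

/* Call def->cc_func with positional arguments only. Returns NULL where
   CPython would raise TypeError (wrong arity or a signature that this
   sketch leaves out, such as VARARGS or KEYWORDS). */
static Object *ccall_dispatch(const CCallDef *def, Object *self,
                              Object *const *args, ptrdiff_t nargs)
{
    switch (def->cc_flags & SIG_MASK) {  /* compare whole values, never single bits */
    case SIG_NOARGS:
        return nargs == 0 ? ((noargs_f)def->cc_func)(self, NULL) : NULL;
    case SIG_O:
        return nargs == 1 ? ((o_f)def->cc_func)(self, args[0]) : NULL;
    case SIG_FASTCALL:
        return ((fast_f)def->cc_func)(self, args, nargs);
    case SIG_DEFARG | SIG_NOARGS:        /* the 'unused' argument is dropped */
        return nargs == 0 ? ((def_noargs_f)def->cc_func)(def, self) : NULL;
    case SIG_DEFARG | SIG_O:
        return nargs == 1 ? ((def_o_f)def->cc_func)(def, self, args[0]) : NULL;
    case SIG_DEFARG | SIG_FASTCALL:
        return ((def_fast_f)def->cc_func)(def, self, args, nargs);
    default:
        return NULL;
    }
}
```

Because SIG_NOARGS is 0 in this sketch, a bit test such as cc_flags & SIG_NOARGS would always be false, which is the pitfall the NOTE warns about.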

Checking __objclass__

If the CCALL_OBJCLASS flag is set and if cr_self is NULL (this is the case for unbound methods of extension types), then a type check is done: the function must be called with at least one positional argument and the first (typically called self) must be an instance of cc_parent (which must be a class). If not, a TypeError is raised.
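A minimal sketch of this check, assuming a toy type model with single inheritance through a tp_base pointer. Type, Object, CCallDef, FLAG_OBJCLASS and check_objclass are hypothetical names for illustration, cc_parent is typed directly as a class for simplicity, and the TypeError is represented by a -1 return value rather than a real exception.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy type model: each type points to its single base. */
typedef struct Type { const struct Type *tp_base; const char *tp_name; } Type;
typedef struct { const Type *ob_type; } Object;

enum { FLAG_OBJCLASS = 1u << 8 };       /* stand-in for CCALL_OBJCLASS */

typedef struct {
    uint32_t    cc_flags;
    const Type *cc_parent;              /* must be a class when FLAG_OBJCLASS is set */
} CCallDef;

/* isinstance(obj, cls) for the toy model: walk the base chain. */
static int is_instance(const Object *obj, const Type *cls)
{
    for (const Type *t = obj->ob_type; t != NULL; t = t->tp_base)
        if (t == cls)
            return 1;
    return 0;
}

/* Returns 0 if the call may proceed, -1 where CPython would raise TypeError. */
static int check_objclass(const CCallDef *def, const Object *cr_self,
                          Object *const *args, ptrdiff_t nargs)
{
    if (!(def->cc_flags & FLAG_OBJCLASS) || cr_self != NULL)
        return 0;                       /* no check for bound methods */
    if (nargs < 1)
        return -1;                      /* at least one positional argument needed */
    return is_instance(args[0], def->cc_parent) ? 0 : -1;
}
```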

Self slicing

If cr_self is not NULL or if the flag CCALL_SELFARG is not set in cc_flags, then the argument passed as self is simply cr_self.

If cr_self is NULL and the flag CCALL_SELFARG is set, then the first positional argument is removed from args and instead passed as self argument to the C function. Effectively, the first positional argument is treated as __self__. If there are no positional arguments, TypeError is raised.

This process is called “self slicing” and a function is said to have self slicing if cr_self is NULL and CCALL_SELFARG is set.

Note that a CCALL_NOARGS function with self slicing effectively has one argument, namely self. Analogously, a CCALL_O function with self slicing has two arguments.
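The two rules above can be sketched in C as follows. The stand-in Object type, FLAG_SELFARG, the PreparedCall struct and slice_self are hypothetical; the TypeError is again modeled as a -1 return. Slicing needs no copy of the argument array: it only advances the pointer by one.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { long value; } Object;  /* hypothetical stand-in for PyObject */

enum { FLAG_SELFARG = 1u << 9 };        /* stand-in for CCALL_SELFARG */

/* What the C function will receive: its self and the remaining positionals. */
typedef struct {
    Object        *self;
    Object *const *args;
    ptrdiff_t      nargs;
} PreparedCall;

/* Returns 0 on success, -1 where CPython would raise TypeError. */
static int slice_self(uint32_t cc_flags, Object *cr_self,
                      Object *const *args, ptrdiff_t nargs, PreparedCall *out)
{
    if (cr_self != NULL || !(cc_flags & FLAG_SELFARG)) {
        out->self = cr_self;            /* no self slicing: pass cr_self unchanged */
        out->args = args;
        out->nargs = nargs;
        return 0;
    }
    if (nargs < 1)
        return -1;                      /* nothing to use as self */
    out->self = args[0];                /* first positional argument becomes self */
    out->args = args + 1;               /* drop it from the positionals, no copy */
    out->nargs = nargs - 1;
    return 0;
}
```

With FLAG_SELFARG set, an unbound call such as list.append(lst, 42) (cr_self is NULL, arguments lst and 42) and a bound call lst.append(42) (cr_self is lst, argument 42) both end up with self equal to lst and a single remaining argument.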

Descriptor behavior

Classes supporting the C call protocol must implement the descriptor protocol in a specific way.

This is required for an efficient implementation of bound methods: if other code can make assumptions about what __get__ does, it enables optimizations which would not be possible otherwise. In particular, we want to allow sharing the PyCCallDef structure between bound and unbound methods. We also need a correct implementation of _PyObject_GetMethod which is used by the LOAD_METHOD/CALL_METHOD optimization.

First of all, if func supports the C call protocol, then func.__set__ and func.__delete__ must not be implemented.

Second, func.__get__ must behave as follows:

  • If cr_self is not NULL, then __get__ must be a no-op in the sense that func.__get__(obj, cls)(*args, **kwds) behaves exactly the same as func(*args, **kwds). It is also allowed for __get__ to be not implemented at all.
  • If cr_self is NULL, then func.__get__(obj, cls)(*args, **kwds) (with obj not None) must be equivalent to func(obj, *args, **kwds). In particular, __get__ must be implemented in this case. This is unrelated to self slicing: obj may be passed as self argument to the C function or it may be the first positional argument.
  • If cr_self is NULL, then func.__get__(None, cls)(*args, **kwds) must be equivalent to func(*args, **kwds).
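For a class that sets CCALL_SELFARG, the rules above can be met by a __get__ that shares the PyCCallDef and only fills in cr_self. The sketch below models this with plain structs returned by value instead of newly allocated Python objects; Object, CCallDef, CCallRoot, NONE and ccall_get are hypothetical stand-ins, and the equivalence it relies on does not hold for a class without self slicing.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int dummy; } Object;            /* hypothetical stand-in for PyObject */
typedef struct { uint32_t cc_flags; } CCallDef;  /* only the field needed here */

typedef struct {
    const CCallDef *cr_ccall;
    Object         *cr_self;
} CCallRoot;

static Object None_obj;                          /* stand-in for Py_None */
#define NONE (&None_obj)

/* Sketch of __get__ for a callable whose whole state is a CCallRoot.
   Assumes CCALL_SELFARG is set, so binding obj as cr_self is equivalent
   to passing obj as the first positional argument. */
static CCallRoot ccall_get(CCallRoot func, Object *obj)
{
    if (func.cr_self != NULL)
        return func;                  /* already bound: __get__ is a no-op */
    if (obj == NULL || obj == NONE)
        return func;                  /* accessed on the class: stays unbound */
    CCallRoot bound = { func.cr_ccall, obj };  /* share the def, set __self__ */
    return bound;
}
```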

There are no restrictions on the object func.__get__(obj, cls). For example, it is not required to implement the C call protocol. We only specify what func.__get__(obj, cls).__call__ does.

For classes that do not care about __self__ and __get__ at all, the easiest solution is to assign cr_self = Py_None (or any other non-NULL value).

The __name__ attribute

The C call protocol requires that the function has a __name__ attribute which is of type str (not a subclass).

Furthermore, the object returned by __name__ must be stored somewhere; it cannot be a temporary object. This is required because PyEval_GetFuncName uses a borrowed reference to the __name__ attribute (see also [2]).

Generic API functions

This section lists the new public API functions or macros dealing with the C call protocol.

  • int PyCCall_Check(PyObject *op): return true if op implements the C call protocol.

All the functions and macros below apply to any instance supporting the C call protocol. In other words, PyCCall_Check(func) must be true.

  • PyObject *PyCCall_Call(PyObject *func, PyObject *args, PyObject *kwds): call func with positional arguments args and keyword arguments kwds (kwds may be NULL). This function is meant to be put in the tp_call slot.
  • PyObject *PyCCall_FastCall(PyObject *func, PyObject *const *args, Py_ssize_t nargs, PyObject *kwds): call func with nargs positional arguments given by args[0], …, args[nargs-1]. The parameter kwds can be NULL (no keyword arguments), a dict with name:value items or a tuple with keyword names. In the latter case, the keyword values are stored in the args array, starting at args[nargs].
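To illustrate the tuple form of kwds in PyCCall_FastCall, here is a small C sketch that looks up a keyword value. For a call f(a, b, key=k, reverse=r), args holds {a, b, k, r}, nargs is 2 and the keyword names are ("key", "reverse"). As a simplifying assumption, the names are modeled as a C array of strings (the hypothetical KwNames type) rather than a tuple of str objects, and find_keyword is a hypothetical helper, not part of the proposed API.

```c
#include <stddef.h>
#include <string.h>

typedef struct { long value; } Object;  /* hypothetical stand-in for PyObject */

/* Hypothetical stand-in for a tuple of keyword names. */
typedef struct {
    ptrdiff_t    size;
    const char **names;
} KwNames;

/* Return the value passed for keyword `name`, or NULL if it was not passed.
   Keyword values follow the positionals: args[nargs + i] belongs to names[i]. */
static Object *find_keyword(Object *const *args, ptrdiff_t nargs,
                            const KwNames *kwnames, const char *name)
{
    if (kwnames == NULL)
        return NULL;                    /* no keyword arguments at all */
    for (ptrdiff_t i = 0; i < kwnames->size; i++) {
        if (strcmp(kwnames->names[i], name) == 0)
            return args[nargs + i];
    }
    return NULL;
}
```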

Macros to access the PyCCallRoot and PyCCallDef structures:

  • const PyCCallRoot *PyCCall_CCALLROOT(PyObject *func): pointer to the PyCCallRoot structure inside func.
  • const PyCCallDef *PyCCall_CCALLDEF(PyObject *func): shorthand for PyCCall_CCALLROOT(func)->cr_ccall.
  • uint32_t PyCCall_FLAGS(PyObject *func): shorthand for PyCCall_CCALLROOT(func)->cr_ccall->cc_flags.
  • PyObject *PyCCall_SELF(PyObject *func): shorthand for PyCCall_CCALLROOT(func)->cr_self.

Generic getters, meant to be put into the tp_getset array:

  • PyObject *PyCCall_GenericGetParent(PyObject *func, void *closure): return cc_parent. Raise AttributeError if cc_parent is NULL.
  • PyObject *PyCCall_GenericGetQualname(PyObject *func, void *closure): return a string suitable for using as __qualname__. This uses the __qualname__ of cc_parent if possible. It also uses the __name__ attribute.
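The exact algorithm of PyCCall_GenericGetQualname is not spelled out here. A plausible sketch, assuming it joins the parent's __qualname__ and the function's __name__ with a dot and falls back to the bare name when the parent has no __qualname__ (for example, a module), could look like this; generic_qualname works on plain C strings and is a hypothetical helper, not the proposed API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the qualified name into buf (of size bufsize).
   parent_qualname is NULL when cc_parent is NULL or has no __qualname__.
   Returns the number of characters written, or -1 on truncation/error. */
static int generic_qualname(const char *parent_qualname, const char *name,
                            char *buf, size_t bufsize)
{
    int n;
    if (parent_qualname != NULL)
        n = snprintf(buf, bufsize, "%s.%s", parent_qualname, name);
    else
        n = snprintf(buf, bufsize, "%s", name);
    return (n < 0 || (size_t)n >= bufsize) ? -1 : n;
}
```

Under this assumption, a method append defined on list would get the qualified name "list.append", while a module-level function len would simply get "len".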

Profiling

The profiling events c_call, c_return and c_exception are only generated when calling actual instances of builtin_function_or_method or method_descriptor. This is done for simplicity and also for backwards compatibility (such that the profile function does not receive objects that it does not recognize). In a future PEP, we may extend C-level profiling to arbitrary classes implementing the C call protocol.

Changes to built-in functions and methods

The reference implementation of this PEP changes the existing classes builtin_function_or_method and method_descriptor to use the C call protocol. In fact, those two classes are almost merged: the implementation becomes very similar, but they remain separate classes (mostly for backwards compatibility). The PyCCallDef structure is simply stored as part of the object structure. Both classes use PyCFunctionObject as object structure. This is the new layout for both classes:

typedef struct {
    PyObject_HEAD
    PyCCallDef *m_ccall;
    PyObject *m_self;         /* Passed as 'self' arg to the C function */
    PyCCallDef _ccalldef;     /* Storage for m_ccall */
    PyObject *m_name;         /* __name__; str object (not NULL) */
    PyObject *m_module;       /* __module__; can be anything */
    const char *m_doc;        /* __text_signature__ and __doc__ */
    PyObject *m_weakreflist;  /* List of weak references */
} PyCFunctionObject;

For functions of a module and for unbound methods of extension types, m_ccall points to the _ccalldef field. For bound methods, m_ccall points to the PyCCallDef of the unbound method.

NOTE: the new layout of method_descriptor changes it such that it no longer starts with PyDescr_COMMON. This is purely an implementation detail and it should cause few (if any) compatibility problems.

C API functions

The following function is added (also to the stable ABI):

  • PyObject *PyCFunction_ClsNew(PyTypeObject *cls, PyMethodDef *ml, PyObject *self, PyObject *module, PyObject *parent): create a new object with object structure PyCFunctionObject and class cls. The entries of the PyMethodDef structure are used to construct the new object, but the pointer to the PyMethodDef structure is not stored. The flags for the C call protocol are automatically determined in terms of ml->ml_flags, self and parent.

The existing functions PyCFunction_New, PyCFunction_NewEx and PyDescr_NewMethod are implemented in terms of PyCFunction_ClsNew.

The undocumented functions PyCFunction_GetFlags and PyCFunction_GET_FLAGS are deprecated. They are still artificially supported by storing the original METH_... flags in a bitfield inside cc_flags. Despite the fact that PyCFunction_GetFlags is technically part of the stable ABI, it is highly unlikely to be used that way: first of all, it is not even documented. Second, the flag METH_FASTCALL is not part of the stable ABI but it is very common (because of Argument Clinic). So, if one cannot support METH_FASTCALL, it is hard to imagine a use case for PyCFunction_GetFlags. The fact that PyCFunction_GET_FLAGS and PyCFunction_GetFlags are not used at all by CPython outside of Objects/call.c further shows that these functions are not particularly useful.

Inheritance

Extension types inherit the type flag Py_TPFLAGS_HAVE_CCALL and the value tp_ccalloffset from the base class, provided that they implement tp_call and tp_descr_get the same way as the base class. Heap types never inherit the C call protocol because that would not be safe (heap types can be changed dynamically).
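The inheritance rule can be summarized by a small C sketch. It assumes that tp_call and tp_descr_get have already been filled in for the subtype (either set explicitly or inherited) before this step runs; Type, FLAG_HAVE_CCALL, FLAG_HEAPTYPE and inherit_ccall are hypothetical stand-ins, not CPython's type machinery.

```c
#include <stddef.h>

typedef void *(*callfunc)(void);        /* placeholder slot signatures */
typedef void *(*descrgetfunc)(void);

#define FLAG_HAVE_CCALL (1UL << 0)      /* stand-in for Py_TPFLAGS_HAVE_CCALL */
#define FLAG_HEAPTYPE   (1UL << 1)      /* stand-in for Py_TPFLAGS_HEAPTYPE */

typedef struct Type {
    unsigned long      tp_flags;
    ptrdiff_t          tp_ccalloffset;
    callfunc           tp_call;
    descrgetfunc       tp_descr_get;
    const struct Type *tp_base;
} Type;

/* Copy the C call protocol from the base only for static (non-heap) types
   whose tp_call and tp_descr_get are the same as the base's. */
static void inherit_ccall(Type *type)
{
    const Type *base = type->tp_base;
    if (base == NULL || !(base->tp_flags & FLAG_HAVE_CCALL))
        return;                         /* nothing to inherit */
    if (type->tp_flags & FLAG_HEAPTYPE)
        return;                         /* heap types never inherit it */
    if (type->tp_call != base->tp_call || type->tp_descr_get != base->tp_descr_get)
        return;                         /* overriding either slot breaks the contract */
    type->tp_flags |= FLAG_HAVE_CCALL;
    type->tp_ccalloffset = base->tp_ccalloffset;
}
```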

Performance

This PEP should not impact the performance of existing code (in the positive or negative sense). It is meant to allow efficient new code to be written, not to make existing code faster.

Here are a few pointers to the python-dev mailing list where performance improvements are discussed:

Stable ABI

The function PyCFunction_ClsNew is added to the stable ABI.

None of the functions, structures or constants dealing with the C call protocol are added to the stable ABI.

There are two reasons for this: first of all, the most useful feature of the C call protocol is probably the METH_FASTCALL calling convention. Given that this is not even part of the public API (see also PEP 579, issue 6), it would be strange to add anything else from the C call protocol to the stable ABI.

Second, we want the C call protocol to be extensible in the future. By not adding anything to the stable ABI, we are free to do that without restrictions.

Backwards compatibility

There is no difference at all for the Python interface, nor for the documented C API (in the sense that all functions remain supported with the same functionality).

The only potential breakage is with C code which accesses the internals of PyCFunctionObject and PyMethodDescrObject. We expect very few problems because of this.

Rationale

Why is this better than PEP 575?

One of the major complaints of PEP 575 was that it was coupling functionality (the calling and introspection protocol) with the class hierarchy: a class could only benefit from the new features if it was a subclass of base_function. It may be difficult for existing classes to do that because they may have other constraints on the layout of the C object structure, coming from an existing base class or implementation details. For example, functools.lru_cache cannot implement PEP 575 as-is.

It also complicated the implementation precisely because changes were needed both in the implementation details and in the class hierarchy.

The current PEP does not have these problems.

Why store the function pointer in the instance?

The actual information needed for calling an object is stored in the instance (in the PyCCallDef structure) instead of the class. This is different from the tp_call slot or earlier attempts at implementing a tp_fastcall slot [1].

The main use case is built-in functions and methods. For those, the C function to be called does depend on the instance.

Note that the current protocol makes it easy to support the case where the same C function is called for all instances: just use a single static PyCCallDef structure for every instance.

Why CCALL_OBJCLASS?

The flag CCALL_OBJCLASS is meant to support various cases where the class of a self argument must be checked, such as:

>>> list.append({}, None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: append() requires a 'list' object but received a 'dict'

>>> list.__len__({})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__len__' requires a 'list' object but received a 'dict'

>>> float.__dict__["fromhex"](list, "0xff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'fromhex' for type 'float' doesn't apply to type 'list'

In the reference implementation, only the first of these uses the new code. The other examples show that these kinds of checks appear in multiple places, so it makes sense to add generic support for them.

Why CCALL_SELFARG?

The flag CCALL_SELFARG and the concept of self slicing are needed to support methods: the C function should not care whether it is called as an unbound method or as a bound method. In both cases, there should be a self argument and this is simply the first positional argument of an unbound method call.

For example, list.append is a METH_O method. Both the calls list.append([], 42) and [].append(42) should translate to the C call list_append([], 42).

Thanks to the proposed C call protocol, we can support this in such a way that both the unbound and the bound method share a PyCCallDef structure (with the CCALL_SELFARG flag set).

So, CCALL_SELFARG has two advantages: there is no extra layer of indirection for calling methods and constructing bound methods does not require setting up a PyCCallDef structure.

Another minor advantage is that we could make the error messages for a wrong call signature more uniform between Python methods and built-in methods. In the following example, Python is undecided whether a method takes 1 or 2 arguments:

>>> class List(list):
...     def myappend(self, item):
...         self.append(item)
>>> List().myappend(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: myappend() takes 2 positional arguments but 3 were given
>>> List().append(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: append() takes exactly one argument (2 given)

It is currently impossible for PyCFunction_Call to know the actual number of user-visible arguments since it cannot distinguish at runtime between a function (without self argument) and a bound method (with self argument). The CCALL_SELFARG flag makes this difference explicit.

Why CCALL_DEFARG?

The flag CCALL_DEFARG gives the callee access to the PyCCallDef *. There are various use cases for this:

  1. The callee can use the cc_parent field, which is useful for PEP 573.
  2. Applications are free to extend the PyCCallDef structure with user-defined fields, which can then be accessed analogously.
  3. In the case where the PyCCallDef structure is part of the object structure (this is true for example for PyCFunctionObject), an appropriate offset can be subtracted from the PyCCallDef pointer to get a pointer to the callable object defining that PyCCallDef.
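Use case 3 is the classic container_of idiom. The following C sketch uses a trimmed-down, hypothetical analogue of PyCFunctionObject (FuncObject, with stand-in ObjectHead and CCallDef types) to show how a callee receiving the PyCCallDef * can recover the object that embeds it; func_from_def and whoami are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not CPython's definitions. */
typedef struct { long ob_refcnt; } ObjectHead;   /* plays PyObject_HEAD */

typedef struct {
    uint32_t cc_flags;
    void   (*cc_func)(void);
    void    *cc_parent;
} CCallDef;

/* Trimmed-down analogue of the PyCFunctionObject layout. */
typedef struct {
    ObjectHead  ob_base;
    CCallDef   *m_ccall;
    void       *m_self;
    CCallDef    _ccalldef;          /* storage for m_ccall */
    const char *m_name;
} FuncObject;

/* Recover the object embedding `def` by subtracting the field offset.
   Only valid if def really points at some FuncObject's _ccalldef. */
static FuncObject *func_from_def(const CCallDef *def)
{
    return (FuncObject *)((const char *)def - offsetof(FuncObject, _ccalldef));
}

/* A DEFARG|NOARGS-style callee (arguments: def, self) reporting its __name__. */
static const char *whoami(const CCallDef *def, void *self)
{
    (void)self;                     /* unused in this example */
    return func_from_def(def)->m_name;
}
```

Since a bound method's m_ccall points to the PyCCallDef of the unbound method, applying func_from_def during a bound call yields the unbound function object, not the bound method.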

An earlier version of this PEP defined a flag CCALL_FUNCARG instead of CCALL_DEFARG which would pass the callable object to the callee. This had similar use cases, but there was some ambiguity for bound methods: should the “callable object” be the bound method object or the original function wrapped by the method? By passing the PyCCallDef * instead, this ambiguity is gone since the bound method uses the PyCCallDef * from the wrapped function.

Replacing tp_print

We repurpose tp_print as tp_ccalloffset because this makes it easier for external projects to backport the C call protocol to earlier Python versions. In particular, the Cython project has shown interest in doing that (see https://mail.python.org/pipermail/python-dev/2018-June/153927.html).

Alternative suggestions

PEP 576 is an alternative approach to solving the same problem as this PEP. See https://mail.python.org/pipermail/python-dev/2018-July/154238.html for comments on the difference between PEP 576 and PEP 580.

Discussion

Links to threads on the python-dev mailing list where this PEP has been discussed:

Reference implementation

The reference implementation can be found at https://github.com/jdemeyer/cpython/tree/pep580

For an example of using the C call protocol, the following branch implements functools.lru_cache using PEP 580: https://github.com/jdemeyer/cpython/tree/lru580

References


Source: https://github.com/python/peps/blob/main/peps/pep-0580.rst

Last modified: 2023-09-09 17:39:29 GMT