Copyright © 2005-2009 Shalabh Chaturvedi
About This Book
Explains the mechanics of object attribute access for new-style Python objects:
how functions become methods
how descriptors and properties work
determining method resolution order
New-style implies Python version 2.2 up to and including 3.x. There have been some behavioral changes during these versions, but all the concepts covered here are valid.
This book is part of a series:
Python Attributes and Methods [you are here]
This revision: 1.31
Author: shalabh@cafepy.com
Some points you should note:
This book covers the new-style objects (introduced a long time ago in Python 2.2). Examples are valid for Python 2.5 and all the way to Python 3.x.
This book is not for absolute beginners. It is for people who already know Python (some Python at least), and want to know more.
You should be familiar with the different kinds of objects in Python and not be confused when you come across the term type where you expected class. You can read the first part of this series for background information - Python Types and Objects.
Happy pythoneering!
What is an attribute? Quite simply, an attribute is a way to get from
one object to another. Apply the power of the almighty dot -
objectname.attributename - and voila! you now have
the handle to another object. You also have the power to create
attributes, by assignment: objectname.attributename =
notherobject.
Which object does an attribute access return, though? And where does the object set as an attribute end up? These questions are answered in this chapter.
Example 1.1. Simple attribute access
>>> class C(object):
...     classattr = "attr on class"
...
>>> cobj = C()
>>> cobj.instattr = "attr on instance"
>>>
>>> cobj.instattr
'attr on instance'
>>> cobj.classattr
'attr on class'
>>> C.__dict__['classattr']
'attr on class'
>>> cobj.__dict__['instattr']
'attr on instance'
>>>
>>> cobj.__dict__
{'instattr': 'attr on instance'}
>>> C.__dict__
{'classattr': 'attr on class', '__module__': '__main__', '__doc__': None}
Attributes can be set on a class.
Or even on an instance of a class.
Both class and instance attributes are accessible from an instance.
Attributes really sit inside a dictionary-like __dict__ in the object.
__dict__ contains only the user-provided attributes.
Ok, I admit 'user-provided attribute' is a term I made up, but I think
it is useful to understand what is going on. Note
that __dict__ is itself an attribute. We didn't set
this attribute ourselves, but Python provides it. Our old friends
__class__ and __bases__ (neither of
which appears to be in __dict__) also seem to
be similar. Let's call them Python-provided
attributes. Whether an attribute is Python-provided or not depends on
the object in question (__bases__, for example, is
Python-provided only for classes).
We, however, are more interested in user-defined
attributes. These are attributes provided by the user, and they
usually (but not always) end up in the __dict__ of
the object on which they're set.
When an attribute is accessed (e.g. print
objectname.attributename), the following objects are
searched in sequence for the attribute:
1. The object itself (objectname.__dict__ or any Python-provided
   attribute of objectname).
2. The object's type (objectname.__class__.__dict__). Observe that only
   __dict__ is searched, which means only user-provided attributes of
   the class. In other words objectname.__bases__ may not return
   anything even though objectname.__class__.__bases__ does exist.
3. The bases of the object's class, their bases, and so on (the
   __dict__ of each of objectname.__class__.__bases__). More than one
   base does not confuse Python, and should not concern us at the
   moment. The point to note is that all bases are searched until an
   attribute is found.
If all this hunting around fails to find a suitably named attribute,
Python raises an AttributeError. The type of the
type (objectname.__class__.__class__) is never
searched for attribute access on an object
(objectname in the example).
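A minimal sketch of this search order, written for Python 3 (hence the metaclass keyword). The names Meta, Base, Child and all attribute names are made up for illustration:

```python
class Meta(type):
    # Lives on the type of the type; never searched for instances of Child.
    meta_attr = "attr on metaclass"


class Base(object, metaclass=Meta):
    base_attr = "attr on base"


class Child(Base):
    class_attr = "attr on class"


obj = Child()
obj.inst_attr = "attr on instance"

# Found on the instance, on its class, and on a base of its class.
found = [obj.inst_attr, obj.class_attr, obj.base_attr]

# Child itself can reach meta_attr (its type is Meta), but obj cannot.
class_sees_meta = Child.meta_attr
instance_sees_meta = hasattr(obj, "meta_attr")

# __bases__ is Python-provided for classes only.
instance_has_bases = hasattr(obj, "__bases__")
```

Here the failed lookups would raise AttributeError if accessed directly; hasattr() is used only to keep the sketch short.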
The built-in dir() function returns a list of
all attributes of an object. Also look at
the inspect
module in the standard library for more functions to inspect
objects.
The above section explains the general mechanism for
all objects. Even for classes (for example
accessing classname.attrname), with a slight
modification: the bases of the class are searched before the
class of the class (which is
classname.__class__ and for most types, by the
way, is <type 'type'>).
Some objects, such as built-in types and their instances (lists,
tuples, etc.) do not have a __dict__. Consequently
user-defined attributes cannot be set on them.
We're not done yet! This was the short version of the story. There is more to what can happen when setting and getting attributes. This is explored in the following sections.
Continuing our Python experiments:
Example 1.2. A function is more
>>> class C(object):
...     classattr = "attr on class"
...     def f(self):
...         return "function f"
...
>>> C.__dict__
{'classattr': 'attr on class', '__module__': '__main__', '__doc__': None, 'f': <function f at 0x008F6B70>}
>>> cobj = C()
>>> cobj.classattr is C.__dict__['classattr']
True
>>> cobj.f is C.__dict__['f']
False
>>> cobj.f
<bound method C.f of <__main__.C instance at 0x008F9850>>
>>> C.__dict__['f'].__get__(cobj, C)
<bound method C.f of <__main__.C instance at 0x008F9850>>
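Two innocent looking class attributes, a string 'classattr' and a function 'f'.
Accessing the string really gets it from the class's __dict__.
Not so for the function! Why?
Hmm, it does look like a different object. (A bound method is a
callable object that calls a function (C.f in the example), passing an
instance (cobj in the example) as the first argument.)
Here's the spoiler - this is what Python did to create the bound
method. While looking for an attribute for an instance, if Python
finds an object with a __get__() method inside the class's __dict__,
it calls that __get__() method (with the instance and the class as
arguments) and returns the result instead of the object itself.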
It is only the presence of the __get__() method
that transforms an ordinary function into a bound
method. There is nothing really special about a function
object. Anyone can put objects with a __get__()
method inside the class __dict__ and get away with
it. Such objects are called descriptors and have
many uses.
Any object with a __get__() method, and optionally
__set__() and __delete__()
methods, accepting specific parameters is said to follow the
descriptor protocol. Such an object qualifies as
a descriptor and can be placed inside a class's
__dict__ to do something special when an attribute
is retrieved, set or deleted. An empty descriptor is shown below.
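__get__(): called when the attribute is read (e.g. print objectname.attrname).
__set__(): called when the attribute is set on an instance
(e.g. objectname.attrname = something).
__delete__(): called when the attribute is deleted from an instance
(e.g. del objectname.attrname).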
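A minimal sketch of such an empty descriptor class, with the three methods in the order just described. The class name Desc and the parameter names are assumptions made for illustration:

```python
class Desc(object):
    "A descriptor class whose methods do nothing useful"

    def __get__(self, obj, cls=None):
        # Called when the attribute is read.
        pass

    def __set__(self, obj, val):
        # Called when the attribute is set on an instance.
        pass

    def __delete__(self, obj):
        # Called when the attribute is deleted from an instance.
        pass
```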
What we defined above is a class that can be instantiated to create a descriptor. Let's see how we can create a descriptor, attach it to a class and put it to work.
Note that when accessed from the class itself, only the
__get__() method comes into the picture; setting or
deleting the attribute will actually replace or remove the
descriptor.
Descriptors work only when attached to classes. Sticking a descriptor in an object that is not a class gives us nothing.
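The following sketch puts a descriptor to work, under the assumption of a hypothetical LoggingDesc class that simply records every call it receives. It shows access through an instance, access through the class, and a descriptor stored on a plain instance, where it does nothing special:

```python
class LoggingDesc(object):
    "Hypothetical data descriptor that records every call it receives"

    def __init__(self):
        self.calls = []

    def __get__(self, obj, cls=None):
        self.calls.append(("get", obj, cls))
        return "value from descriptor"

    def __set__(self, obj, val):
        self.calls.append(("set", obj, val))

    def __delete__(self, obj):
        self.calls.append(("delete", obj))


class C(object):
    d = LoggingDesc()


class Plain(object):
    pass


desc = C.__dict__["d"]
cobj = C()

x = cobj.d            # calls desc.__get__(cobj, C)
cobj.d = "new value"  # calls desc.__set__(cobj, "new value")
del cobj.d            # calls desc.__delete__(cobj)
y = C.d               # calls desc.__get__(None, C)

C.d = "replaced"      # replaces the descriptor on the class; __set__() is not called

# A descriptor stored on an instance (not a class) is returned as-is.
pobj = Plain()
pobj.d = LoggingDesc()
unfired = pobj.d
```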
In the previous section we used a descriptor with both
__get__() and __set__()
methods. Such descriptors, by the way, are called data
descriptors. Descriptors with only the
__get__() method are somewhat weaker than their
cousins, and called non-data descriptors.
Repeating our experiment, but this time with non-data descriptors, we get:
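Calls the descriptor's __get__() method.
Puts the value in the instance itself (in its __dict__, to be precise).
Surprise! This now returns the value that was just set, picked up from
the instance's __dict__. For a data descriptor, by contrast, the
instance's __dict__ is bypassed.
Deletes the attribute from the instance (from its __dict__, to be precise).
Getting and setting on the class itself: these function identically to a data descriptor.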
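The behaviour described in these notes can be reproduced with a sketch like the following. GetOnlyDesc is a hypothetical name, and the returned strings are placeholders:

```python
class GetOnlyDesc(object):
    "Hypothetical non-data descriptor: defines __get__() only"

    def __get__(self, obj, cls=None):
        return "value from descriptor"


class C(object):
    d = GetOnlyDesc()


cobj = C()
before = cobj.d              # calls the descriptor's __get__()
cobj.d = "setting a value"   # stored in cobj.__dict__
during = cobj.d              # the instance value hides the descriptor
del cobj.d                   # removes the entry from cobj.__dict__
after = cobj.d               # the descriptor is visible again
```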
Interestingly, not having a __set__() affects not
just attribute setting, but also retrieval. What is Python thinking?
If on setting, the descriptor gets fired and puts the data somewhere,
then it follows that the descriptor only knows how to get it back. Why
even bother with the instance's __dict__?
Data descriptors are useful for providing full control over an attribute. This is what one usually wants for attributes used to store some piece of data. For example an attribute that gets transformed and saved somewhere on setting, would usually be reverse-transformed and returned when read. When you have a data descriptor, it controls all access (both read and write) to the attribute on an instance. Of course, you could still directly go to the class and replace the descriptor, but you can't do that from an instance of the class.
Non-data descriptors, in contrast, only provide a value when an instance itself does not have a value. So setting the attribute on an instance hides the descriptor. This is particularly useful in the case of functions (which are non-data descriptors) as it allows one to hide a function defined in the class by attaching one to an instance.
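Calls the bound method returned by the function's __get__().
Calls the function set on the instance, which now hides the one defined in the class.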
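A minimal sketch of hiding a method this way. The function name another_f and the return strings are illustrative:

```python
class C(object):
    def f(self):
        return "f defined in class"


def another_f():
    # Takes no self: a function stored on an instance is not bound.
    return "another f"


cobj = C()
first = cobj.f()     # bound method, i.e. C.__dict__['f'].__get__(cobj, C)

cobj.f = another_f   # goes into cobj.__dict__ (functions are non-data descriptors)
second = cobj.f()    # plain function found in cobj.__dict__

untouched = C().f()  # other instances still see the class's method
```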
This is the long version of the attribute access story, included just for the sake of completeness.
When retrieving an attribute from
an object (print objectname.attrname) Python follows
these steps:
1. If attrname is a special (i.e. Python-provided) attribute for
   objectname, return it.
2. Check objectname.__class__.__dict__ for attrname. If it exists and
   is a data-descriptor, return the descriptor result. Search all bases
   of objectname.__class__ for the same case.
3. Check objectname.__dict__ for attrname, and return if found. If
   objectname is a class, search its bases too. If it is a class and a
   descriptor exists in it or its bases, return the descriptor result.
4. Check objectname.__class__.__dict__ for attrname. If it exists and
   is a non-data descriptor, return the descriptor result. If it
   exists, and is not a descriptor, just return it. If it exists and is
   a data descriptor, we shouldn't be here because we would have
   returned at point 2. Search all bases of objectname.__class__ for
   the same case.
5. Raise AttributeError.
Note that Python first checks for a data
descriptor in the class (and its bases), then for the attribute in the
object __dict__, and then for a
non-data descriptor in the class (and its
bases). These are points 2, 3 and 4 above.
The descriptor result above implies
the result of calling the __get__() method of the
descriptor with appropriate arguments. Also, checking a
__dict__ for attrname means
checking if __dict__["attrname"] exists.
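Points 2 to 5 above can be approximated in plain Python. This is only a sketch for ordinary instances, not Python's actual implementation: it skips point 1 and the special handling of classes in point 3, and it treats an object as a data descriptor when its type defines __set__() or __delete__(). The names lookup, find_in_mro and Demo are hypothetical:

```python
_MISSING = object()


def find_in_mro(cls, name):
    "Return the raw __dict__ entry for name from cls or its bases, if any"
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def lookup(obj, name):
    "Simplified emulation of obj.name for an instance that is not a class"
    cls = type(obj)
    class_attr = find_in_mro(cls, name)
    attr_type = type(class_attr)

    # Point 2: a data descriptor in the class (or its bases) wins.
    if class_attr is not _MISSING and hasattr(attr_type, "__get__") and (
        hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")
    ):
        return attr_type.__get__(class_attr, obj, cls)

    # Point 3: the instance's own __dict__.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]

    # Point 4: a non-data descriptor or a plain value in the class.
    if class_attr is not _MISSING:
        if hasattr(attr_type, "__get__"):
            return attr_type.__get__(class_attr, obj, cls)
        return class_attr

    # Point 5: nothing found.
    raise AttributeError(name)


class Demo(object):
    x = "class value"

    @property
    def p(self):
        return "from property"

    def f(self):
        return "from method"


d = Demo()
d.__dict__["p"] = "instance p"   # cannot hide a data descriptor
d.__dict__["f"] = "instance f"   # hides a non-data descriptor
```

With these definitions, lookup(d, "p") still goes to the property, while lookup(d, "f") returns the value stored on the instance, matching what d.p and d.f return.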
Now, the steps Python follows when setting
a user-defined attribute (objectname.attrname =
something):
1. Check objectname.__class__.__dict__ for attrname. If it exists and
   is a data-descriptor, use the descriptor to set the value. Search
   all bases of objectname.__class__ for the same case.
2. Insert something into objectname.__dict__ for key "attrname".
3. Think "Wow, this was much simpler!"
What happens when setting a Python-provided attribute depends on the attribute. Python may not even allow some attributes to be set. Deletion of attributes is very similar to setting as above.
Before you rush to the mall and get yourself some expensive descriptors, note that Python ships with some very useful ones that can be found by simply looking in the box.
Example 1.7. Built-in descriptors
class HidesA(object):

    def get_a(self):
        return self.b - 1

    def set_a(self, val):
        self.b = val + 1

    def del_a(self):
        del self.b

    a = property(get_a, set_a, del_a, "docstring")

    def cls_method(cls):
        return "You called class %s" % cls

    clsMethod = classmethod(cls_method)

    def stc_method():
        return "Unbindable!"

    stcMethod = staticmethod(stc_method)
A property provides an easy way to call functions
whenever an attribute is retrieved, set or deleted on the
instance. When the attribute is retrieved from the class, the getter
method is not called but the property object itself is returned. A
docstring can also be provided, which is accessible as
HidesA.a.__doc__.
A classmethod is similar to a regular method,
except that it passes the class (and not the instance) as the first
argument to the function. The remaining arguments are passed through
as usual. It can also be called directly on the class and it behaves
the same way. The first argument is named cls rather than self, by convention.
A staticmethod is just like a function outside the class. It is never bound, which means no matter how you access it (on the class or on an instance), it gets called with exactly the same arguments you pass. No object is inserted as the first argument.
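A short usage sketch for these three built-in descriptors. The class is repeated so the snippet runs on its own; the variable names below it are illustrative:

```python
class HidesA(object):
    def get_a(self):
        return self.b - 1

    def set_a(self, val):
        self.b = val + 1

    def del_a(self):
        del self.b

    a = property(get_a, set_a, del_a, "docstring")

    def cls_method(cls):
        return "You called class %s" % cls

    clsMethod = classmethod(cls_method)

    def stc_method():
        return "Unbindable!"

    stcMethod = staticmethod(stc_method)


h = HidesA()
h.a = 5                       # calls set_a(h, 5), which stores 6 in h.b
stored_b = h.b
read_a = h.a                  # calls get_a(h)

prop_object = HidesA.a        # from the class: the property object itself
via_instance = h.clsMethod()  # the class, not h, is passed as cls
via_class = HidesA.clsMethod()
static_result = h.stcMethod() # called with no inserted argument
```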
As we saw earlier, Python functions are descriptors too. They weren't descriptors in earlier versions of Python (as there were no descriptors at all), but now they fit nicely into a more generic mechanism.
A property is always a data-descriptor, but not all arguments are required when defining it.
Can be set, retrieved, or deleted.
Attempting to delete this attribute from an instance will raise
AttributeError.
Attempting to set or delete this attribute from an instance will raise
AttributeError.
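A sketch of properties defined with fewer arguments. The attribute names full, no_delete and read_only, and the helper raises_attribute_error, are made up for this illustration:

```python
class C(object):
    def __init__(self):
        self._x = 1

    def get_x(self):
        return self._x

    def set_x(self, val):
        self._x = val

    def del_x(self):
        del self._x

    full = property(get_x, set_x, del_x)   # get, set and delete
    no_delete = property(get_x, set_x)     # get and set only
    read_only = property(get_x)            # get only


def raises_attribute_error(action):
    "Helper for this sketch: report whether calling action() raises AttributeError"
    try:
        action()
    except AttributeError:
        return True
    return False


c = C()
c.no_delete = 10
results = {
    "read": c.read_only,
    "set read_only": raises_attribute_error(lambda: setattr(c, "read_only", 5)),
    "delete read_only": raises_attribute_error(lambda: delattr(c, "read_only")),
    "delete no_delete": raises_attribute_error(lambda: delattr(c, "no_delete")),
    "delete full": raises_attribute_error(lambda: delattr(c, "full")),
}
```

Because every property is a data descriptor, the failed assignment to read_only never reaches the instance's __dict__.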
The getter and setter functions need not be defined in the class itself, any function can be used. In any case, the functions will be called with the instance as the first argument. Note that where the functions are passed to the property constructor above, they are not bound functions anyway.
Another useful observation would be to note that subclassing the
class and redefining the getter (or setter) functions is not going to
change the property. The property object is holding
on to the actual functions provided. When kicked, it is
going to say "Hey, I'm holding this function I was given, I'll just
call this and return the result.", and not "Hmm, let me look up the
current class for a method called 'get_a' and
then use that". If that is what one wants, then defining a new
descriptor would be useful. How would it work? Let's say it is
initialized with a string (i.e. the name of the method to call). On
activation, it does a getattr() for the method name
on the class, and uses the method found. Simple!
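One possible sketch of such a descriptor, assuming a hypothetical class named NamedGetter that supports reading only. It is compared against a regular property to show the difference under subclassing:

```python
class NamedGetter(object):
    "Hypothetical descriptor that looks up its getter by name on every access"

    def __init__(self, method_name):
        self.method_name = method_name

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        # Look the method up on the instance's class at access time,
        # so a getter redefined in a subclass is the one that runs.
        return getattr(type(obj), self.method_name)(obj)

    def __set__(self, obj, val):
        # Defining __set__() makes this a data descriptor; raising here
        # makes the attribute read-only.
        raise AttributeError("read-only attribute")


class Base(object):
    def get_a(self):
        return "Base.get_a"

    a_fixed = property(get_a)       # holds on to Base.get_a itself
    a_late = NamedGetter("get_a")   # holds on to the name only


class Derived(Base):
    def get_a(self):
        return "Derived.get_a"


d = Derived()
fixed_result = d.a_fixed
late_result = d.a_late
```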
Classmethods and staticmethods are non-data descriptors, and so can be
hidden if an attribute with the same name is set
directly on the instance. If you are rolling your own descriptor (and
not using properties), it can be made read-only by giving it a
__set__() method but raising
AttributeError in the method. This is how a
property behaves when it does not have a setter function.
Why do we need Method Resolution Order? Let's say:
We're happily doing OO programming and building a class hierarchy.
Our usual technique to implement the
do_your_stuff() method is to first call
do_your_stuff() on the base class, and then do
our stuff.
Example 2.1. Usual base call technique
class A(object):
    def do_your_stuff(self):
        # do stuff with self for A
        return

class B(A):
    def do_your_stuff(self):
        A.do_your_stuff(self)
        # do stuff with self for B
        return

class C(A):
    def do_your_stuff(self):
        A.do_your_stuff(self)
        # do stuff with self for C
        return
We subclass a new class from two classes and end up having the same superclass being accessible through two paths.
Example 2.2. Base call technique fails
class D(B,C):
    def do_your_stuff(self):
        B.do_your_stuff(self)
        C.do_your_stuff(self)
        # do stuff with self for D
        return
Now we're stuck if we want to implement
do_your_stuff(). Using our usual technique, if we
want to call both B and C, we
end up calling A.do_your_stuff() twice. And we all
know it might be dangerous to have A do its stuff
twice, when it is only supposed to be done once. The other option
would leave either B's stuff or
C's stuff not done, which is not what we want
either.
There are messy solutions to this problem, and clean ones. Python, obviously, implements a clean one which is explained in the next section.
Let's say:
For each class, we arrange all
superclasses into an ordered list without repetitions, and insert the
class itself at the start of the list. We put this list in a class
attribute called next_class_list for our use
later.
Example 2.3. Making a "Who's Next" list
B.next_class_list = [B,A]
C.next_class_list = [C,A]
D.next_class_list = [D,B,C,A]
We use a different technique to implement
do_your_stuff() for our classes.
Example 2.4. Call next method technique
class B(A):
    def do_your_stuff(self):
        next_class = self.find_out_whos_next()
        next_class.do_your_stuff(self)
        # do stuff with self for B

    def find_out_whos_next(self):
        l = self.next_class_list           # l depends on the actual instance
        mypos = l.index(B)                 # Find this class in the list
        return l[mypos+1]                  # Return the next one
The interesting part is how we
find_out_whos_next(), which depends on which
instance we are working with. Note that:
Depending on whether we passed an instance of
D or of B, next_class above will resolve to either
C or A.
We have to implement
find_out_whos_next() for each class, since it has
to have the class name hardcoded in it (see
above). We cannot use
self.__class__ here. If we have called
do_your_stuff() on an instance of
D, and the call is traversing up the hierarchy,
then self.__class__ will be D
here.
Using this technique, each method is called only once. It
appears clean, but seems to require too much work. Fortunately for us,
we neither have to implement find_out_whos_next()
for each class, nor set the next_class_list, as
Python does both of these things.
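For the curious, here is a runnable sketch of the hand-rolled technique for the whole A, B, C, D hierarchy. Two things are assumptions of this sketch rather than part of the text: a log list is passed along to record the call order, and each class calls its own helper explicitly (for example B.find_out_whos_next(self)) so that a subclass's version of the helper is not picked up by mistake:

```python
class A(object):
    def do_your_stuff(self, log):
        log.append("A")   # end of the chain: nothing further to call


class B(A):
    def do_your_stuff(self, log):
        next_class = B.find_out_whos_next(self)
        next_class.do_your_stuff(self, log)
        log.append("B")

    def find_out_whos_next(self):
        l = self.next_class_list   # depends on the actual instance
        return l[l.index(B) + 1]


class C(A):
    def do_your_stuff(self, log):
        next_class = C.find_out_whos_next(self)
        next_class.do_your_stuff(self, log)
        log.append("C")

    def find_out_whos_next(self):
        l = self.next_class_list
        return l[l.index(C) + 1]


class D(B, C):
    def do_your_stuff(self, log):
        next_class = D.find_out_whos_next(self)
        next_class.do_your_stuff(self, log)
        log.append("D")

    def find_out_whos_next(self):
        l = self.next_class_list
        return l[l.index(D) + 1]


B.next_class_list = [B, A]
C.next_class_list = [C, A]
D.next_class_list = [D, B, C, A]

log_b, log_d = [], []
B().do_your_stuff(log_b)   # B's next class is A
D().do_your_stuff(log_d)   # for a D instance, B's next class is C
```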
Python provides a class attribute __mro__ for
each class, and a type called super. The
__mro__ attribute is a tuple containing the class
itself and all of its superclasses without duplicates in a predictable
order. A super object is used in place of the
find_out_whos_next() method.
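A sketch of the same hierarchy using super and __mro__. The log argument is an addition of this sketch, used only to record the order of calls:

```python
class A(object):
    def do_your_stuff(self, log):
        log.append("A")


class B(A):
    def do_your_stuff(self, log):
        super(B, self).do_your_stuff(log)   # next class after B in the instance's __mro__
        log.append("B")


class C(A):
    def do_your_stuff(self, log):
        super(C, self).do_your_stuff(log)
        log.append("C")


class D(B, C):
    def do_your_stuff(self, log):
        super(D, self).do_your_stuff(log)
        log.append("D")


mro_names = [klass.__name__ for klass in D.__mro__]

log_b, log_d = [], []
B().do_your_stuff(log_b)
D().do_your_stuff(log_d)
```

Each do_your_stuff() runs exactly once for a D instance, and A's version runs first because every method upcalls before doing its own work.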
If we're using a class method, we don't have an
instance self to pass into the
super call. Fortunately for us,
super works even with a class as the second
argument. Observe that above, super uses
self only to get at
self.__class__.__mro__. The class can be passed
directly to
super as shown below.
Example 2.6. Using super with a class method
class A(object):
    @classmethod
    def say_hello(cls):
        print('A says hello')

class B(A):
    @classmethod
    def say_hello(cls):
        super(B, cls).say_hello()
        print('B says hello')

class C(A):
    @classmethod
    def say_hello(cls):
        super(C, cls).say_hello()
        print('C says hello')

class D(B, C):
    @classmethod
    def say_hello(cls):
        super(D, cls).say_hello()
        print('D says hello')

B.say_hello()
D.say_hello()
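This example is for classmethods (not instance methods).
Note we pass cls (not self) as the second argument to super.
B.say_hello() prints out:
    A says hello
    B says hello
D.say_hello() prints out (observe each method is called only once):
    A says hello
    C says hello
    B says hello
    D says hello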
There is yet another way to use super:
When created with only a type, the super instance
behaves like a descriptor. This means (if d is an
instance of D) that
super(B).__get__(d) returns the same thing as
super(B,d). With this technique, the super instance is stored
on the class under a munged attribute name (_B__super for a class B),
similar to what Python does for names starting with
double underscore inside the class. So this is
accessible as self.__super within the body of the
class. If we didn't use a class specific attribute name, accessing the
attribute through the instance self might return an
object defined in a subclass.
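A sketch of this third way of using super, assuming the same A, B, C, D hierarchy as before. The log argument is an addition of this sketch, and the attribute names _B__super, _C__super and _D__super are simply what Python's name munging produces for __super inside each class:

```python
class A(object):
    def do_your_stuff(self, log):
        log.append("A")


class B(A):
    def do_your_stuff(self, log):
        self.__super.do_your_stuff(log)   # compiled as self._B__super
        log.append("B")


class C(A):
    def do_your_stuff(self, log):
        self.__super.do_your_stuff(log)   # compiled as self._C__super
        log.append("C")


class D(B, C):
    def do_your_stuff(self, log):
        self.__super.do_your_stuff(log)   # compiled as self._D__super
        log.append("D")


# A super created with only a type acts as a descriptor on the class.
B._B__super = super(B)
C._C__super = super(C)
D._D__super = super(D)

d = D()
log_d = []
d.do_your_stuff(log_d)

# What the descriptor produces when read through the instance d.
bound = super(B).__get__(d, D)
```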
While using super we typically use only one
super call in one method even if the class has
multiple bases. Also, it is a good programming practice to use
super instead of calling methods directly on a base
class.
A possible pitfall appears if do_your_stuff()
accepts different arguments for C and
A. This is because, if we use
super in B to call
do_your_stuff() on the next
class, we don't know if it is going to be called on
A or C. If this scenario is
unavoidable, a case specific solution might be required.
One question as of yet unanswered is how does Python determine
the __mro__ for a type? A basic idea behind the
algorithm is provided in this section. This is not essential for just
using super, or reading following sections, so you
can jump to the next section if you want.
Python determines the precedence of types
(or the order in which they should be placed in any
__mro__) from two kinds of constraints specified by
the user:
If A is a superclass of
B, then B has precedence over
A. Or, B should always appear
before A in all
__mro__s (that contain both). In short let's denote
this as B > A.
If C appears before
D in the list of bases in a class statement
(eg. class Z(C,D):), then C > D.
In addition, to avoid being ambiguous, Python adheres to the following principle:
If E > F in one scenario (or one
__mro__), then it should be that E >
F in all scenarios (or all
__mro__s).
We can satisfy the constraints if we build the
__mro__ for each new class C we
introduce, such that:
All superclasses of
C appear in the C.__mro__ (plus
C itself, at the start), and
The precedence of types in
C.__mro__ does not conflict with the precedence of
types in B.__mro__ for each B in
C.__bases__.
Here the same problem is translated into a game. Consider a class hierarchy as follows:
Since only single inheritance is in play, it is easy to find the
__mro__ of these classes. Let's say we define a new
class as class N(A,B,C). To compute the
__mro__, consider a game using abacus style beads
over a series of strings.
Beads can move freely over the strings, but the strings cannot be cut
or twisted. The strings from left to right contain beads in the order
of __mro__ of each of the bases. The rightmost
string contains one bead for each base, in the order the bases are
specified in the class statement.
The objective is to line up beads in rows, so that each row contains
beads with only one label (as done with the O bead
in the diagram). Each string represents an ordering constraint, and if
we can reach the goal, we would have an order that satisfies all
constraints. We could then just read the labels off rows from the
bottom up to get the __mro__ for
N.
Unfortunately, we cannot solve this problem. The last two strings have
C and B in different
orders. However, if we change our class definition to class
N(A,C,B), then we have some hope.
We just found out that N.__mro__ is
(N,A,C,B,object) (note we inserted
N at the head). The reader can try out this
experiment in real Python (for the unsolvable case above, Python
raises an exception). Observe that we even swapped the position of two
strings, keeping the strings in the same order as the bases are
specified in the class statement. The usefulness of this is seen
later.
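The experiment can be tried with a sketch like the one below. The hierarchy here is an assumption inferred from the result quoted above: A and B derive directly from object, and C derives from B.

```python
class A(object):
    pass


class B(object):
    pass


class C(B):
    pass


# The unsolvable case: the bases say B before C, but C.__mro__ says C before B.
try:
    class N(A, B, C):
        pass
    conflict = None
except TypeError as exc:
    conflict = exc


# The solvable case.
class N(A, C, B):
    pass


mro_names = [klass.__name__ for klass in N.__mro__]
```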
Sometimes, there might be more than one solution, as shown in the following figure.
An alternative position for A and
C is shown in grey. The order can be kept
unambiguous (more correctly, monotonic) if the
following policies are followed:
Arrange strings from left to right in order of appearance of bases in the class statement.
Attempt to arrange beads in rows moving from bottom
up, and left to right. What this means is that the solution in color
in the above diagram will be selected (because A,
being left of C, will be selected first as a
candidate for the second row from bottom).
This, essentially, is the idea behind the algorithm used by
Python to generate the __mro__ for any new
type. The formal algorithm is explained elsewhere [mro-algorithm].
This chapter includes usage notes that do not fit in other chapters.
In Python, we can use methods with special names like
__len__(), __str__() and
__add__() to make objects convenient to use (for
example, with the built-in functions len(),
str() or with the '+' operator,
etc.)
Usually we put the special methods in a class.
We can try to put them in the instance itself, but it doesn't work.
This goes straight to the class (calls the method defined there),
ignoring the one set on the instance.
The same is true for all such methods, putting them on the instance we
want to use them with does not work. If it did go to the instance then
even something like str(C) (str
of the class C) would go to
C.__str__(), which is a method defined for an
instance of C, and not
C itself.
A simple technique to allow defining such methods for each instance separately is shown below.
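A minimal sketch of both the problem and one such technique. The names PerInstance and _mylen are made up for this illustration; the idea is that the special method on the class delegates to an ordinary method, which can then be replaced on individual instances:

```python
class C(object):
    def __len__(self):
        return 0


cobj = C()
cobj.__len__ = lambda: 5
implicit = len(cobj)        # goes straight to the class
explicit = cobj.__len__()   # ordinary attribute access finds the instance's version


class PerInstance(object):
    def __len__(self):
        # The special method lives on the class and delegates to a
        # regular method, which follows normal attribute lookup.
        return self._mylen()

    def _mylen(self):
        return 0


pobj = PerInstance()
pobj._mylen = lambda: 5
customized = len(pobj)
default = len(PerInstance())
```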
Subclassing built-in types is straightforward. Actually we have been
doing it all along (whenever we subclass <type 'object'>). Some built-in
types (types.FunctionType, for example) are not
subclassable (not yet, at least). However, here we talk about
subclassing <type 'list'>, <type
'tuple'> and other basic data types.
A regular class statement.
Define the method to be overridden. In this case we will convert all items passed through
append() to integers.
Upcall to the base if required.
Append a float and...
watch it automatically become an integer.
Otherwise, it behaves like any other list.
This doesn't go through append. We would have to override
the method involved as well to massage such data.
We can set attributes on our instance. This is because it has a __dict__.
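A sketch of a list subclass along these lines. The class name MyList comes from the text that follows; the values and the use of extend() to bypass append() are choices made for this illustration:

```python
class MyList(list):
    "A list that converts appended items to ints"

    def append(self, item):
        # Massage the item, then upcall to the base.
        list.append(self, int(item))


ml = MyList()
ml.append(1.3)       # goes through our append() and is stored as 1
ml.append(444)
ml.extend([2.7])     # does not go through append(), so 2.7 stays a float
ml.color = "red"     # allowed: instances of the subclass have a __dict__
```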
Basic lists do not have __dict__ (and so no
user-defined attributes), but ours does. This is usually not a problem
and may even be what we want. If we use a very
large number of MyLists, however, we could optimize
our program by telling Python not to create the
__dict__ for instances of
MyList.
Example 3.4. Using __slots__ for optimization
class MyList(list):
    "A list subclass disallowing any user-defined attributes"
    __slots__ = []

ml = MyList()
ml.color = 'red'    # raises exception!

class MyListWithFewAttrs(list):
    "A list subclass allowing specific user-defined attributes"
    __slots__ = ['color']

mla = MyListWithFewAttrs()
mla.color = 'red'
mla.weight = 50     # raises exception!
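The __slots__ class attribute tells Python not to create a __dict__ for instances of this type.
Setting any attribute on this raises an exception.
__slots__ can also list attribute names, for which Python reserves space in each instance.
Now, if an attribute has space reserved, it can be used.
Otherwise, it cannot. This will raise an exception.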
The purpose and recommended use of __slots__
is for optimization. After a type is defined, its slots cannot be
changed. Also, every subclass must define __slots__,
otherwise its instances will end up having __dict__.
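A sketch that checks these claims, including the point about subclasses. The class name Loose and the helper can_set are hypothetical:

```python
class MyList(list):
    __slots__ = []


class MyListWithFewAttrs(list):
    __slots__ = ["color"]


class Loose(MyList):
    pass   # no __slots__ here, so its instances get a __dict__ again


def can_set(obj, name, value):
    "Helper for this sketch: try to set an attribute, report success"
    try:
        setattr(obj, name, value)
    except AttributeError:
        return False
    return True


outcomes = {
    "MyList.color": can_set(MyList(), "color", "red"),
    "FewAttrs.color": can_set(MyListWithFewAttrs(), "color", "red"),
    "FewAttrs.weight": can_set(MyListWithFewAttrs(), "weight", 50),
    "Loose.color": can_set(Loose(), "color", "red"),
}
has_dict = {
    "MyList": hasattr(MyList(), "__dict__"),
    "Loose": hasattr(Loose(), "__dict__"),
}
```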
We can create a list even by instantiating it like any other type:
list([1,2,3]). This means
list.__init__() accepts the same argument (i.e. any
iterable) and initializes a list. We can customize initialization in a
subclass by redefining __init__() and
upcalling __init__() on the
base.
Tuples are immutable and different from lists. Once an instance
is created, it cannot be changed. Note that the instance of a type
already exists when __init__() is called (in fact
the instance is passed as the first argument). The
__new__() static method of a type is called to
create an instance of the type. It is passed the
type itself as the first argument, and passed through other initial
arguments (similar to __init__()). We use this to
customize immutable types like a tuple.
For a list, we massage the arguments and hand them over to list.__init__().
For a tuple, we have to override __new__().
A __new__() should always return. It is supposed to return an instance of the type.
The __new__() method is not special to immutable
types, it is used for all types. It is also converted to a static
method automatically by Python (by virtue of its name).
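A sketch of both kinds of customization. The class names and the choice of converting every item to int are assumptions made for illustration:

```python
class MyList(list):
    def __init__(self, itr):
        # The list already exists here; massage the arguments and
        # hand them over to list.__init__().
        list.__init__(self, [int(x) for x in itr])


class MyTuple(tuple):
    def __new__(cls, itr):
        # A tuple cannot be changed once created, so the contents must
        # be decided in __new__(), which returns the new instance.
        return tuple.__new__(cls, [int(x) for x in itr])


ml = MyList([1.5, 2.9])
mt = MyTuple([1.5, 2.9])
```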
[descrintro] Unifying types and classes in Python 2.2.
[pep-252] Making Types Look More Like Classes.
[pep-253] Subclassing Built-in Types.
[descriptors-howto] How-To Guide for Descriptors.
[mro-algorithm] The Python 2.3 Method Resolution Order.
This book was written in DocBook XML. The
HTML version was produced using DocBook XSL stylesheets and
xsltproc. The PDF version was produced using
htmldoc. The diagrams were drawn using OmniGraffle. The
process was automated using
Paver.