In Python, it’s all about the attributes

Newcomers to Python are often amazed to discover how easily we can create new classes. For example:

class Foo(object): pass

is a perfectly valid (if boring) class. We can even create instances of this class:

f = Foo()

This is a perfectly valid instance of Foo. Indeed, if we ask it to identify itself, we’ll find that it’s an instance of Foo:

>>> type(f) <class '__main__.Foo'>

Get the bonus content: Python Attributes: Download the code! Click here

Now, while “f” might be a perfectly valid and reasonable instance of Foo, it’s not very useful. It’s at this point that many people who have come to Python from another language expect to learn where they can define instance variables. They’re relieved to know that they can write an __init__ method, which is invoked on a new object immediately after its creation. For example:

class Foo(object): def __init__(self, x, y): self.x = x self.y = y >> f = Foo(100, 'abc') >>> f.x 100 >>> f.y 'abc'

On the surface, it might seem like we’re setting two instance variables, x and y, on f, our new instance of Foo. And indeed, the behavior is something like that, and many Python programmers think in these terms. But that’s not really the case, and the sooner that Python programmers stop thinking in terms of “instance variables” and “class variables,” the sooner they’ll understand how much of Python works, why objects work in the ways that they do, and how “instance variables” and “class variables” are specific cases of a more generalized system that exists throughout Python.

The bottom line is inside of __init__, we’re adding new attributes to self, the local reference to our newly created object. Attributes are a fundamental part of objects in Python. Heck, attributes are fundamental to everything in Python. The sooner you understand what attributes are, and how they work, the sooner you’ll have a deeper understanding of Python.

Every object in Python has attributes. You can get a list of those attributes using the built-in “dir” function. For example:

>>> s = 'abc' >>> len(dir(s)) 71 >>> dir(s)[:5] ['__add__', '__class__', '__contains__', '__delattr__', '__doc__'] >>> i = 123 >>> len(dir(i)) 64 >>> dir(i)[:5] ['__abs__', '__add__', '__and__', '__class__', '__cmp__'] >>> t = (1,2,3) >>> len(dir(t)) 32 >>> dir(t)[:5] ['__add__', '__class__', '__contains__', '__delattr__', '__doc__']

As you can see, even the basic data types in Python have a large number of attributes. We can see the first five attributes by limiting the output from “dir”; you can look at them yourself inside of your Python environment.

The thing is, these attribute names returned by “dir” are strings. How can I use this string to get or set the value of an attribute? We somehow need a way to translate between the world of strings and the world of attribute names.

Fortunately, Python provides us with several built-in functions that do just that. The “getattr” function lets us get the value of an attribute. We pass “getattr” two arguments: The object whose attribute we wish to read, and the name of the attribute, as a string:

>>> getattr(t, '__class__') tuple

This is equivalent to:

>>> t.__class__ tuple

In other words, the dot notation that we use in Python all of the time is nothing more than syntactic sugar for “getattr”. Each has its uses; dot notation is far easier to read, and “getattr” gives us the flexibility to retrieve an attribute value with a dynamically built string.

Python also provides us with “setattr”, a function that takes three arguments: An object, a string indicating the name of the attribute, and the new value of the attribute. There is no difference between “setattr” and using the dot-notation on the left side of the = assignment operator:

>>> f = Foo() >>> setattr(f, 'x', 5) >>> getattr(f, 'x') 5 >>> f.x 5 >>> f.x = 100 >>> f.x 100

As with all assignments in Python, the new value can be any legitimate Python object. In this case, we’ve assigned f.x to be 5 and 100, both integers, but there’s no reason why we couldn’t assign a tuple, dictionary, file, or even a more complex object. From Python’s perspective, it really doesn’t matter.

In the above case, I used “setattr” and the dot notation (f.x) to assign a new value to the “x” attribute. f.x already existed, because it was set in __init__. But what if I were to assign an attribute that didn’t already exist?

The answer: It would work just fine:

>>> f.new_attrib = 'hello' >>> f.new_attrib 'hello' >>> f.favorite_number = 72 >>> f.favorite_number 72

In other words, we can create and assign a new attribute value by … well, by assigning to it, just as we can create a new variable by assigning to it. (There are some exceptions to this rule, mainly in that you cannot add new attributes to many built-in classes.) Python is much less forgiving if we try to retrieve an attribute that doesn’t exist:

>>> f.no_such_attribute AttributeError: 'Foo' object has no attribute 'no_such_attribute'

So, we’ve now seen that every Python object has attributes, that we can retrieve existing attributes using dot notation or “getattr”, and that we can always set attribute values. If the attribute didn’t exist before our assignment, then it certainly exists afterwards.

We can assign new attributes to nearly any object in Python. For example:

def hello(): return "Hello" >>> hello.abc_def = 'hi there!' >>> hello.abc_def 'hi there!'

Yes, Python functions are objects. And because they’re objects, they have attributes. And because they’re objects, we can assign new attributes to them, as well as retrieve the values of those attributes.

So the first thing to understand about these “instance variables” that we oh-so-casually create in our __init__ methods is that we’re not creating variables at all. Rather, we’re adding one or more additional attributes to the particular object (i.e., instance) that has been passed to __init__. From Python’s perspective, there is no difference between saying “self.x = 5” inside of __init__, or “f.x = 5” outside of __init__. We can add new attributes whenever we want, and the fact that we do so inside of __init__ is convenient, and makes our code easier to read.

This is one of those conventions that is really useful to follow: Yes, you can create and assign object attributes wherever you want. But it makes life so much easier for everyone if you assign all of an object’s attributes in __init__, even if it’s just to give it a default value, or even None. Just because you can create an attribute whenever you want doesn’t mean that you should do such a thing.

Now that you know every object has attributes, it’s time to consider the fact that classes (i.e., user-defined types) also have attributes. Indeed, we can see this:

>>> class Foo(object): pass

Can we assign an attribute to a class? Sure we can:

>>> Foo.bar = 100 >>> Foo.bar 100

Classes are objects, and thus classes have attributes. But it seems a bit annoying and roundabout for us to define attributes on our class in this way. We can define attributes on each individual instance inside of __init__. When is our class defined, and how can we stick attribute assignments in there?

The answer is easier than you might imagine. That’s because there is a fundamental difference between the body of a function definition (i.e., the block under a “def” statement) and the body of a class definition (i.e., the block under a “class” statement). A function’s body is only executed when we invoke the function. However, a the body of the class definition is executed immediately, and only once — when we define the class. We can execute code in our class definitions:

class Foo(object): print("Hello from inside of the class!")

Of course, you should never do this, but this is a byproduct of the fact that class definitions execute immediately. What if we put a variable assignment in the class definition?

class Foo(object): x = 100

If we assign a variable inside of the class definition, it turns out that we’re not assigning a variable at all. Rather, we’re creating (and then assigning to) an attribute. The attribute is on the class object. So immediately after executing the above, I can say:

Foo.x

and I’ll get the integer 100 returned back to me.

Are you a little surprised to discover that variable assignments inside of the class definition turn into attribute assignments on the class object? Many people are. They’re even more surprised, however, when they think a bit more deeply about what it must mean to have a function (or “method”) definition inside of the class:

>>> class Foo(object): def blah(self): return "blah" >>> Foo.blah <unbound method Foo.blah>

Think about it this way: If I define a new function with “def”, I’m defining a new variable in the current scope (usually the global scope). But if I define a new function with “def” inside of a class definition, then I’m really defining a new attribute with that name on the class.

In other words: Instance methods sit on a class in Python, not on an instance. When you invoke “f.blah()” on an instance of Foo, Python is actually invoking the “blah” method on Foo, and passing f as the first argument. Which is why it’s important that Python programmers understand that there is no difference between “f.blah()” and “Foo.blah(f)”, and that this is why we need to catch the object with “self”.

But wait a second: If I invoke “f.blah()”, then how does Python know to invoke “Foo.blah”? f and Foo are two completely different objects; f is an instance of Foo, whereas Foo is an instance of type. Why is Python even looking for the “blah” attribute on Foo?

The answer is that Python has different rules for variable and attribute scoping. With variables, Python follows the LEGB rule: Local, Enclosing, Global, and Builtin. (See my free, five-part e-mail course on Python scopes, if you aren’t familiar with them.) But with attributes, Python follows a different set of rules: First, it looks on the object in question. Then, it looks on the object’s class. Then it follows the inheritance chain up from the object’s class, until it hits “object” at the top.

Thus, in our case, we invoke “f.blah()”. Python looks on the instance f, and doesn’t find an attribute named “blah”. Thus, it looks on f’s class, Foo. It finds the attribute there, and performs some Python method rewriting magic, thus invoking “Foo.blah(f)”.

So Python doesn’t really have “instance variables” or “class variables.” Rather, it has objects with attributes. Some of those attributes are defined on class objects, and others are defined on instance objects. (Of course, class objects are just instances of “type”, but let’s ignore that for now.) This also explains why people sometimes think that they can or should define attributes on a class (“class variables”), because they’re visible to the instances. Yes, that is true, but it sometimes makes more sense than others to do so.

What you really want to avoid is creating an attribute on the instance that has the same name as an attribute on the class. For example, imagine this:

class Person(object): population = 0 def __init__(self, first, last): self.first = first self.last = last self.population += 1 p1 = Person('Reuven', 'Lerner') p2 = Person('foo', 'bar')

This looks all nice, until you actually try to run it. You’ll quickly discover that Person.population remains stuck at 0, but p1.population and p2.population are both set to 1. What’s going on here?

The answer is that the line

self.population += 1

can be turned into

self.population = self.population + 1

As always, the right side of an assignment is evaluated before the left side. Thus, on the right side, we say “self.population”. Python looks at the instance, self, and looks for an attribute named “population”. No such attribute exists. It thus goes to Person, self’s class, and does find an attribute by that name, with a value of 0. It thus returns 0, and executes 0 + 1. That gives us the answer 1, which is then passed to the left side of the assignment. The left side says that we should store this result in self.population — in other words, an attribute on the instance! This works, because we can always assign any attribute. But in this case, we will now get different results for Person.population (which will remain at 0) and the individual instance values of population, such as p1 and p2.

We can actually see what attributes were actually set on the instance and on the class, using a list comprehension:

class Foo(object): def blah(self): return "blah" >>> [attr_name for attr_name in dir(f) if attr_name not in dir(Foo)] [] >>> [attr_name for attr_name in dir(Foo) if attr_name not in dir(object)] ['__dict__', '__module__', '__weakref__', 'blah']

In the above, we first define “Foo”, with a method “blah”. That method definition, as we know, is stored in the “blah” attribute on Foo. We haven’t assigned any attributes to f, which means that the only attributes available to f are those in its class.