Python String Conversion 101: Why Every Class Needs a “repr”

How and why to implement Python “to string” conversion in your own classes using Python’s “repr” and “str” mechanisms and associated coding conventions.

When you define a custom class in Python and then try to print one of its instances to the console (or inspect it in an interpreter session), you get a relatively unsatisfying result. The default “to string” conversion behavior is basic and lacks detail:

    class Car:
        def __init__(self, color, mileage):
            self.color = color
            self.mileage = mileage

    >>> my_car = Car('red', 37281)
    >>> print(my_car)
    <__console__.Car object at 0x109b73da0>
    >>> my_car
    <__console__.Car object at 0x109b73da0>

By default all you get is a string containing the class name and the id of the object instance (which is the object’s memory address in CPython). That’s better than nothing, but it’s also not very useful.

You might find yourself trying to work around this by printing attributes of the instance directly, or even by adding a custom to_string() method to your classes:

    >>> print(my_car.color, my_car.mileage)
    red 37281

The general idea here is the right one, but it ignores the conventions and built-in mechanisms Python uses to handle how objects are represented as strings.

How to Support “To String” Conversion in Your Python Classes?

Instead of building your own class-to-string conversion machinery, modelled after Java’s toString() methods, you’ll be better off adding the __str__ and __repr__ “dunder” methods to your class. They are the Pythonic way to control how objects are converted to strings in different situations. You can learn more about this in the Python data model documentation.

Let’s take a look at how these methods work in practice. To get started, we’re going to add a __str__ method to the Car class we defined earlier:

    class Car:
        def __init__(self, color, mileage):
            self.color = color
            self.mileage = mileage

        def __str__(self):
            return f'a {self.color} car'

When you try printing or inspecting a Car instance now, you’ll get a different, slightly improved result:

    >>> my_car = Car('red', 37281)
    >>> print(my_car)
    a red car
    >>> my_car
    <__console__.Car object at 0x109ca24e0>

Inspecting the car object in the console still gives us the previous result containing the object’s id. But printing the object resulted in the string returned by the __str__ method we added.

__str__ is one of Python’s “dunder” (double-underscore) methods and gets called when you try to convert an object into a string through the various means that are available:

    >>> print(my_car)
    a red car
    >>> str(my_car)
    'a red car'
    >>> '{}'.format(my_car)
    'a red car'

With a proper __str__ implementation, you won’t have to worry about printing object attributes directly or writing a separate to_string() function. It’s the Pythonic way to control string conversion.

By the way, some people refer to Python’s “dunder” methods as “magic methods.” But these methods are not supposed to be magical in any way. The fact that these methods start and end in double underscores is simply a naming convention to flag them as core Python features. It also helps avoid naming collisions with your own methods and attributes.
The object constructor __init__ follows the same convention, and there’s nothing magical or arcane about it. Don’t be afraid to use Python’s dunder methods—they’re meant to help you.
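
As a small sketch of the conversion routes listed above, the snippet below runs the same Car instance through str(), str.format(), an f-string, and %-formatting. The f-string and %s cases are additions that the article does not show; the list name `conversions` is purely illustrative.

```python
class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __str__(self):
        return f'a {self.color} car'


my_car = Car('red', 37281)

# Each of these routes ends up calling Car.__str__.
conversions = [
    str(my_car),
    '{}'.format(my_car),
    f'{my_car}',      # f-string, not shown in the article
    '%s' % my_car,    # %-formatting, not shown in the article
]

for text in conversions:
    print(text)       # a red car
```

All four routes produce the same text, which is why a single __str__ method is usually enough to replace a hand-written to_string() helper.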

Python’s __repr__ vs __str__: What Is the Difference Between Them?

Now, our string conversion story doesn’t end there. Did you see how inspecting my_car in an interpreter session still gave that odd <Car object at ...> result? This happened because there are actually two dunder methods that control how objects are converted to strings in Python 3. The first one is __str__, and you just learned about it. The second one is __repr__, and the way it works is similar to __str__, but it is used in different situations. (Python 2.x also has a __unicode__ method that I’ll touch on a little later.)

Here’s a simple experiment you can use to get a feel for when __str__ or __repr__ is used. Let’s redefine our car class so it contains both to-string dunder methods with outputs that are easy to distinguish:

    class Car:
        def __init__(self, color, mileage):
            self.color = color
            self.mileage = mileage

        def __repr__(self):
            return '__repr__ for Car'

        def __str__(self):
            return '__str__ for Car'

Now, when you play through the previous examples you can see which method controls the string conversion result in each case:

    >>> my_car = Car('red', 37281)
    >>> print(my_car)
    __str__ for Car
    >>> '{}'.format(my_car)
    '__str__ for Car'
    >>> my_car
    __repr__ for Car

This experiment confirms that inspecting an object in a Python interpreter session simply prints the result of the object’s __repr__.

Interestingly, containers like lists and dicts always use the result of __repr__ to represent the objects they contain, even if you call str() on the container itself:

    >>> str([my_car])
    '[__repr__ for Car]'

To manually choose between the two string conversion methods, for example, to express your code’s intent more clearly, it’s best to use the built-in str() and repr() functions.
Using them is preferable to calling the object’s __str__ or __repr__ directly, as it looks nicer and gives the same result:

    >>> str(my_car)
    '__str__ for Car'
    >>> repr(my_car)
    '__repr__ for Car'

Even with this investigation complete, you might be wondering what the “real-world” difference is between __str__ and __repr__. They both seem to serve the same purpose, so it might be unclear when to use each.

With questions like that, it’s usually a good idea to look into what the Python standard library does. Time to devise another experiment. We’ll create a datetime.date object and find out how it uses __repr__ and __str__ to control string conversion:

    >>> import datetime
    >>> today = datetime.date.today()

The result of the date object’s __str__ function should primarily be readable. It’s meant to return a concise textual representation for human consumption, something you’d feel comfortable displaying to a user. Therefore, we get something that looks like an ISO date format when we call str() on the date object:

    >>> str(today)
    '2017-02-02'

With __repr__, the idea is that its result should be, above all, unambiguous. The resulting string is intended more as a debugging aid for developers. And for that it needs to be as explicit as possible about what this object is. That’s why you’ll get a more elaborate result calling repr() on the object. It even includes the full module and class name:

    >>> repr(today)
    'datetime.date(2017, 2, 2)'

We could copy and paste the string returned by __repr__ and execute it as valid Python to recreate the original date object. This is a neat approach and a good goal to keep in mind while writing your own reprs.

On the other hand, I find that it is quite difficult to put into practice. Usually it won’t be worth the trouble and it’ll just create extra work for you. My rule of thumb is to make my __repr__ strings unambiguous and helpful for developers, but I don’t expect them to be able to restore an object’s complete state.
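
The round trip described above can be sketched as follows. This is only an illustration: it uses a fixed date instead of date.today() so the output is reproducible, and the variable names (`day`, `readable`, `unambiguous`, `rebuilt`) are made up for this example.

```python
import datetime

# A fixed date keeps the output reproducible (the article uses date.today()).
day = datetime.date(2017, 2, 2)

readable = str(day)        # human-friendly ISO format
unambiguous = repr(day)    # explicit, constructor-style string

print(readable)            # 2017-02-02
print(unambiguous)         # datetime.date(2017, 2, 2)

# eval() only illustrates the round trip here; never use it on untrusted input.
rebuilt = eval(unambiguous, {'datetime': datetime})
print(rebuilt == day)      # True

# Containers show the repr of their items, even under str().
print(str([day]))          # [datetime.date(2017, 2, 2)]
```

The eval() call works only because the repr happens to be valid Python and the datetime module is made available to it, which is exactly why this property is hard to guarantee for arbitrary classes.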

Why Every Python Class Needs a __repr__

If you don’t add a __str__ method, Python falls back on the result of __repr__ when looking for __str__. Therefore, I recommend that you always add at least a __repr__ method to your classes. This will guarantee a useful string conversion result in almost all cases, with a minimum of implementation work.

Here’s how to add basic string conversion support to your classes quickly and efficiently. For our Car class we might start with the following __repr__:

    def __repr__(self):
        return f'Car({self.color!r}, {self.mileage!r})'

Please note that I’m using the !r conversion flag to make sure the output string uses repr(self.color) and repr(self.mileage) instead of str(self.color) and str(self.mileage).

This works nicely, but one downside is that we’ve repeated the class name inside the format string. A trick you can use here to avoid this repetition is to use the object’s __class__.__name__ attribute, which will always reflect the class’s name as a string.

The benefit is you won’t have to modify the __repr__ implementation when the class name changes. This makes it easy to adhere to the Don’t Repeat Yourself (DRY) principle:

    def __repr__(self):
        return (f'{self.__class__.__name__}('
                f'{self.color!r}, {self.mileage!r})')

The downside of this implementation is that the format string is quite long and unwieldy. But with careful formatting, you can keep the code nice and PEP 8 compliant.

With the above __repr__ implementation, we get a useful result when we inspect the object or call repr() on it directly. Because of the !r flag, the color appears with its quotes:

    >>> repr(my_car)
    "Car('red', 37281)"

Printing the object or calling str() on it returns the same string because the default __str__ implementation simply calls __repr__:

    >>> print(my_car)
    Car('red', 37281)
    >>> str(my_car)
    "Car('red', 37281)"

I believe this approach provides the most value with a modest amount of implementation work.
It’s also a fairly cookie-cutter approach that can be applied without much deliberation. For this reason, I always try to add a basic __repr__ implementation to my classes.

Here’s a complete example for Python 3, including an optional __str__ implementation:

    class Car:
        def __init__(self, color, mileage):
            self.color = color
            self.mileage = mileage

        def __repr__(self):
            return (f'{self.__class__.__name__}('
                    f'{self.color!r}, {self.mileage!r})')

        def __str__(self):
            return f'a {self.color} car'
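
To see what the __repr__-only variant buys you, here is a minimal sketch that leaves out __str__ on purpose. The SportsCar subclass and the `fast_car` instance are hypothetical additions, included only to show that __class__.__name__ follows the actual class of the instance.

```python
class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __repr__(self):
        return (f'{self.__class__.__name__}('
                f'{self.color!r}, {self.mileage!r})')


class SportsCar(Car):
    """Hypothetical subclass, not part of the article's example."""


my_car = Car('red', 37281)
fast_car = SportsCar('blue', 1200)

print(repr(my_car))        # Car('red', 37281)
print(str(my_car))         # Car('red', 37281)  (str falls back to __repr__)
print(repr(fast_car))      # SportsCar('blue', 1200)
print([my_car, fast_car])  # [Car('red', 37281), SportsCar('blue', 1200)]
```

The subclass picks up a correct repr without any extra code, which a hard-coded 'Car(' prefix would not give you.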

Python 2.x Differences: __unicode__

In Python 3 there’s one data type to represent text across the board: str. It holds Unicode characters and can represent most of the world’s writing systems.

Python 2.x uses a different data model for strings. There are two types to represent text: str, which holds raw bytes and is only safe to treat as text within the ASCII character set, and unicode, which is equivalent to Python 3’s str.

Due to this difference, there’s yet another dunder method in the mix for controlling string conversion in Python 2: __unicode__. In Python 2, __str__ returns bytes, whereas __unicode__ returns characters.

For most intents and purposes, __unicode__ is the newer and preferred method to control string conversion. There’s also a built-in unicode() function to go along with it. It calls the respective dunder method, similar to how str() and repr() work.

So far so good. Now, it gets a little more quirky when you look at the rules for when __str__ and __unicode__ are called in Python 2:

- The print statement and str() call __str__.
- The unicode() built-in calls __unicode__ if it exists, and otherwise falls back to __str__ and decodes the result with the system text encoding.

Compared to Python 3, these special cases complicate the text conversion rules somewhat. But there is a way to simplify things again for practical purposes. Unicode is the preferred and future-proof way of handling text in your Python programs.

So generally, what I would recommend you do in Python 2.x is to put all of your string formatting code inside the __unicode__ method and then create a stub __str__ implementation that returns the unicode representation encoded as UTF-8:

    def __str__(self):
        return unicode(self).encode('utf-8')

The __str__ stub will be the same for most classes you write, so you can just copy and paste it around as needed (or put it into a base class where it makes sense). All of your string conversion code that is meant for non-developer use then lives in __unicode__.
Here’s a complete example for Python 2.x:

    class Car(object):
        def __init__(self, color, mileage):
            self.color = color
            self.mileage = mileage

        def __repr__(self):
            return '{}({!r}, {!r})'.format(
                self.__class__.__name__,
                self.color, self.mileage)

        def __unicode__(self):
            return u'a {self.color} car'.format(self=self)

        def __str__(self):
            return unicode(self).encode('utf-8')
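
The characters-versus-bytes distinction behind the __str__ stub can be illustrated in Python 3 as well. The snippet below is only a rough analogy, not Python 2 code: `text` stands in for what __unicode__ would return, `encoded` for what the stub would return, and the non-ASCII color name is a made-up example value.

```python
# Python 3 sketch of the idea behind `unicode(self).encode('utf-8')`.
color = 'ros\u00e9'             # hypothetical non-ASCII color name ("rosé")
text = f'a {color} car'         # characters, like the result of __unicode__
encoded = text.encode('utf-8')  # bytes, like the result of the __str__ stub

print(len(text))                        # 10 characters
print(len(encoded))                     # 11 bytes, the accented letter takes two
print(encoded.decode('utf-8') == text)  # True
```

Keeping all formatting logic on the character side and encoding only at the boundary is the same design the stub follows.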