There are two conceptual elements to this proposed framework. The first element deals with creating MIME parts from content objects and extracting content objects from MIME parts, and the second element deals with creating a multipart message by combining MIME parts.

Note in particular that the current email package doesn’t explicitly support the video maintype, and the standard library has no video-oriented utilities. So for this type you will have to use the RawDataManager or the FileManager and do your own parameter setting (although we might consider creating a Video utility class just to allow the mimetype to get set automatically.)

Obviously each of these content managers is useful in different circumstances, quite possibly even within the same application, which is why the set_content and get_content methods of Message accept a content_manager keyword argument.

A bytes object or a file opened in binary mode will be treated as type application, and will require that the MIME subtype be passed explicitly.

For other types I will try to directly support the RFC-defined parameters both here and in the FileManager. But there are so many that it won’t be practical to handle them all, so there will still be a params keyword argument to pass arbitrary additional parameters. Among the valid input types will be anything handled by the standard library that I have time to implement (e.g. aifc, wave, email.message.Message). For images, there will be a utility class to which you can pass a bytes object or filename, and which will use imghdr to determine the image type. The resulting instance can then be passed to set_content.

For set_content , the str type uses the same signature used by the RawDataManager for the text type, except that it does not support passing in arbitrary extra parameters. (This is for the same reason MIMEText doesn’t support it: there are no defined additional parameters for text parts other than charset .)

The objects returned by this manager’s get_content will depend on whether or not the stdlib provides any suitable object. For message type objects, for example, we can return a Message object. For audio we could return an appropriate reader object for aifc and wave files. For text types we would obviously return a string. For the rest, the best we can do is to return a bytes object. However, an application is free to register additional type object methods, and the content manager functions the application registers will probably be able to take advantage of utility functions provided by the content manager module to make the resulting functions fairly straightforward to write. (This is how one could get a pillow object when calling get_content .)

This manager is closest in spirit to the original Email SIG proposal, and is possibly the one that the default policy will use. The registry maps between MIME types and specialized objects.

(One can also imagine a MailcapManager , which would actually call the appropriate mailcap-specified program when get_content is called, but that is something for an MUA author to write, not something to ship with the standard library.)

This content manager is suitable for something like a Mail User Agent, where extracting attachments to disk and reading attachments from disk are the most common operations.

Ideally the manager will set additional Content-Type parameters when it can figure out the correct values from the input data. Explicit values passed in the params dict would override these computed values.

The set_content method of this manager takes a file system path, and its get_content method returns a filename. The constructor of this content manager will optionally take a path representing a directory, which will be used as the starting point for interpreting the paths passed to set_content , and the directory in which the files returned by get_content will be located. If a directory is not specified, paths will be relative to the current working directory. set_content will use the mimetypes module to guess the appropriate mime type. get_content will use mimetypes to determine the appropriate extension for the file if the part has no name or filename MIME parameter. set_content will also accept the non-mime-type keywords supported by the RawDataManager . If filename is not specified the filename (without any leading directory path) of the path passed as the first argument is used.
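The type guessing and filename defaulting described above can be sketched with the standard library's mimetypes and os.path modules. This is only an illustration of the lookups, not the proposed FileManager: the helper name describe_path and the fallback to application/octet-stream are assumptions made for the example.

```python
import mimetypes
import os.path


def describe_path(path, filename=None):
    """Guess the MIME type for a path and pick a default filename.

    Hypothetical helper: it mirrors what the text says set_content
    would do, but it is not part of any proposed API.
    """
    mimetype, _encoding = mimetypes.guess_type(path)
    if mimetype is None:
        # Assumption: fall back to a generic binary type when nothing matches.
        mimetype = 'application/octet-stream'
    maintype, subtype = mimetype.split('/', 1)
    if filename is None:
        # Use the last path component, without any leading directories.
        filename = os.path.basename(path)
    return maintype, subtype, filename


info = describe_path('attachments/myimage.jpg')

# On the get_content side, an extension can be looked up from the type
# when the part carries no name or filename parameter.
extension = mimetypes.guess_extension('image/jpeg')
```

The exact extension returned by guess_extension depends on the Python version and the platform's MIME tables, so a real implementation would probably want to pin down a preferred extension per type.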

RawDataManager is designed to give you the maximum amount of control while still making the API simpler to use. You should use this manager only if you need that level of control, and know what you are doing.

The get_content method returns a string if the maintype of the part is text , and a bytes object otherwise. To find out the nature of the data, you must interrogate the content type (and possibly its parameters), just as you do with the existing email API.

(Note: I’m not certain switching away from encoders is a good idea; it’s a thought experiment that will be further informed by the implementation.)

The reason for this last change is both to avoid needing to use a _ prefix for the other, more commonly used arguments, and to make it clear that these values are different from the Python keyword parameters: they are not checked for validity, they are simply passed through onto the Content-Type header. In other words, you should use this facility only when you do know what you are doing.

This is a direct replacement for the existing non-multipart constructors shown above. It adds the ability to set the value of the Content-Disposition header, the filename (which is a parameter on the Content-Disposition header), and the Content-ID header value. It uses the name of the content transfer encoding rather than an email.encoders object, and it groups the extra parameters (which for the text type include charset) into a single dictionary rather than allowing them to be keywords.

This manager will provide no more facilities than the current MIME classes do. The signature of its set_content method is:

There will doubtless be numerous instances or subclasses of the content manager with different registry entries, depending on the needs of particular applications. If this proposal is accepted, I envision shipping the email package with three built-in content manager subclasses: a RawDataManager , a FileManager , and an ObjectManager .

The “set” mapping maps from a Python type to a function. The type is looked for in several ways: first by identity (using the type itself as the key), then using the type’s __qualname__ , and finally using the type’s __name__ . This base content manager class’s set_content function has an additional required positional argument beyond that specified by the content manager API itself: the object whose type will be looked up in the registry. The function returned by the registry takes two positional arguments, the Message object and the object passed to the set_content method. Any additional arguments, positional or keyword, are also passed through to the function returned by the registry.
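The lookup order for the "set" mapping can be sketched as follows. Only the order (type identity, then __qualname__, then __name__) and the pass-through of extra arguments come from the description above; the class name SketchContentManager, the add_set_handler method, and the use of a plain dict in place of a Message are assumptions made to keep the example self-contained.

```python
class SketchContentManager:
    """Hypothetical registry-based manager illustrating the set lookup."""

    def __init__(self):
        self.set_handlers = {}

    def add_set_handler(self, key, handler):
        # key may be a type object, a __qualname__ string, or a __name__ string.
        self.set_handlers[key] = handler

    def set_content(self, msg, obj, *args, **kw):
        typ = type(obj)
        # Try identity first, then the qualified name, then the bare name.
        for key in (typ, typ.__qualname__, typ.__name__):
            handler = self.set_handlers.get(key)
            if handler is not None:
                # Extra positional and keyword arguments are passed through.
                return handler(msg, obj, *args, **kw)
        raise KeyError(typ.__qualname__)


def set_text(msg, text, subtype='plain'):
    # msg is a plain dict here, standing in for a Message object.
    msg['Content-Type'] = 'text/' + subtype
    msg['payload'] = text


manager = SketchContentManager()
manager.add_set_handler('str', set_text)  # registered by name, not identity
part = {}
manager.set_content(part, '<p>hello</p>', 'html')
```

Registering by name means an application can add a handler for a type without importing the module that defines it.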

The “get” mapping maps from MIME types to a function. This function takes the Message object as its argument and returns an arbitrary value. Any additional arguments or keywords to the get_content method are passed through to it, but in most cases there will be none.
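A minimal sketch of the "get" side, using the existing email.message.Message class for the parsed part. The class name SketchGetRegistry and its methods are hypothetical, and falling back from the full MIME type to the maintype and then to a catch-all entry is an assumption of this example rather than something the text specifies.

```python
from email import message_from_string


class SketchGetRegistry:
    """Hypothetical registry mapping MIME types to extraction functions."""

    def __init__(self):
        self.get_handlers = {}

    def add_get_handler(self, mimetype, handler):
        self.get_handlers[mimetype] = handler

    def get_content(self, msg, *args, **kw):
        # Assumption: try 'text/plain', then 'text', then the catch-all ''.
        for key in (msg.get_content_type(), msg.get_content_maintype(), ''):
            handler = self.get_handlers.get(key)
            if handler is not None:
                return handler(msg, *args, **kw)
        raise KeyError(msg.get_content_type())


def get_text(msg):
    # Decode the transfer encoding, then the character set.
    payload = msg.get_payload(decode=True)
    return payload.decode(msg.get_content_charset('ascii'))


def get_bytes(msg):
    return msg.get_payload(decode=True)


registry = SketchGetRegistry()
registry.add_get_handler('text', get_text)
registry.add_get_handler('', get_bytes)

part = message_from_string(
    'Content-Type: text/plain; charset="utf-8"\n'
    '\n'
    'My test\n')
content = registry.get_content(part)
```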

The content manager is responsible for populating a bare Message object with the data needed to encode whatever content is passed to its set_content method, and for turning the data stored in a parsed part into a useful object when its get_content method is called. How it does this is completely up to the content manager. The get and set methods are the only required part of the API. In fact, only the names of the methods and their first argument (the Message ) are part of the API: get and set methods may take an arbitrary number of additional positional and keyword arguments.

A content manager has two methods that correspond to the get_content and set_content Message methods proposed above. These methods take a message object as their first non-self argument. (That is, they are really double-dispatch methods.) When get_content and set_content are called on Message, the content_manager’s corresponding methods are called, passing the Message object as the first argument.
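The double dispatch can be sketched with toy classes. Everything here is hypothetical (SketchMessage, PlainManager, UpperCaseManager); the only parts taken from the text are that the message forwards to the manager, passes itself as the first argument, and falls back to a default manager when no content_manager keyword is given. In the proposal that default would come from the message's policy; here it is simply a constructor argument.

```python
class SketchMessage(dict):
    """Hypothetical stand-in for Message that forwards to a content manager."""

    def __init__(self, default_manager):
        super().__init__()
        # Stand-in for the default manager supplied by the policy.
        self.default_manager = default_manager
        self.payload = None

    def set_content(self, *args, content_manager=None, **kw):
        if content_manager is None:
            content_manager = self.default_manager
        # The message passes itself as the first argument.
        content_manager.set_content(self, *args, **kw)

    def get_content(self, *args, content_manager=None, **kw):
        if content_manager is None:
            content_manager = self.default_manager
        return content_manager.get_content(self, *args, **kw)


class PlainManager:
    def set_content(self, msg, text):
        msg['Content-Type'] = 'text/plain'
        msg.payload = text

    def get_content(self, msg):
        return msg.payload


class UpperCaseManager(PlainManager):
    # A different manager may return a different object for the same part.
    def get_content(self, msg):
        return msg.payload.upper()


msg = SketchMessage(PlainManager())
msg.set_content('hello')
default_view = msg.get_content()
shouted_view = msg.get_content(content_manager=UpperCaseManager())
```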

If content_manager is not specified, the default content manager specified by the Message’s current policy is used.

A sketch of the end-user interface is shown in the preceding section. To implement it, we introduce the concept of a “content manager”. A content manager is somewhat analogous to our header registry, in that it is a registry and it can be accessed through the current policy. Its operation is significantly different, however.

Building Multipart Messages

A MIME multipart message can have an arbitrarily complex structure. But conceptually we can break down (most) messages into a relatively simple structure: the message will have a “body” and one or more “attachments”. The “body” is generally one of three things: either a simple text/plain part, a simple text/html part, or a multipart/related part consisting of a text/html part and zero or more parts that are referenced from the html part. Complicating this simple picture, a message may have more than one version of the “body” of varying degrees of “richness” (plain text versus html being by far the most common).

Most email processing programs want to find the “body” first. Some will want only the simplest available text part, while others will prefer the complete data for the richest version. You might also have a processor that wanted html if it was available, but would ignore everything else in a related part if there was one.

Using the existing email API, a program generally will use the walk method to walk down the tree of parts, looking for the part of the type it is most interested in. This is such a common task that it would be nice to have a direct API for it. I propose the following method:

get_body(preferencelist=('related', 'html', 'text'))

preferencelist is a tuple of strings that indicates the order of preference for the part returned. If html is included in the list and related is not, then the html part of a related part would be returned if there is no separate html part. If only text is specified and there is no text part, None is returned; likewise if only html is specified and there is no html part. Specifying related by itself is an error: the preference list must always contain at least one of text or html. (There is an edge case: if there is no multipart/related but there are both html and text parts in a multipart/mixed, what should the behavior be? Probably the first one should be treated as the only body candidate and the other treated as an attachment, but real world data might recommend otherwise.)
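For comparison, a method of this shape later shipped in the standard library as email.message.EmailMessage.get_body. It differs in detail from the proposal here (in particular, it spells the plain-text preference 'plain' rather than 'text'), so the following is a sketch of the general behavior using that later API, not of the exact semantics proposed above.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'A sample basic modern email'
msg.set_content('My test\n')                             # text/plain body
msg.add_alternative('<p>My test</p>\n', subtype='html')  # richer version

# Ask for a specific kind of body part.
html_part = msg.get_body(preferencelist=('html',))
text_part = msg.get_body(preferencelist=('plain',))

# The default preference list is ('related', 'html', 'plain'), so with no
# multipart/related present the html part is the best available candidate.
richest = msg.get_body()

# Asking only for a kind of part that is absent yields None.
missing = msg.get_body(preferencelist=('related',))
```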

Complementing get_body, I propose an iter_attachments method, which would return an iterator over all of the parts that are not multipart/alternative, multipart/related, or the first text (or html) part in a multipart/mixed. A non-multipart part would return an empty iterator. (Note that it is intentional that calling this on a multipart/related will return the related parts as attachments. I think this is the most useful semantic, but it is certainly open for discussion.)
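The standard library's later EmailMessage class provides an iter_attachments method along these lines. The example below uses that shipped API to illustrate the idea of skipping the body and yielding only the attachments; the details of the proposal above may differ from what was eventually implemented.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content('See the attached data.\n')
# Adding an attachment converts the message to multipart/mixed.
msg.add_attachment(b'a,b\n1,2\n',
                   maintype='application', subtype='octet-stream',
                   filename='data.csv')

# The text/plain body is skipped; only the attachment is yielded.
names = [part.get_filename() for part in msg.iter_attachments()]

# A non-multipart message has no attachments.
plain = EmailMessage()
plain.set_content('Just a body.\n')
no_attachments = list(plain.iter_attachments())
```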

A bit more tentatively, I’d also like to propose an iter_parts method that would return an iterator over all of the parts of any multipart, and return None on a non-multipart. This is equivalent to what get_payload currently returns for a multipart, but I have a (long?) term goal of deprecating get_payload.

The walk method can still be used to walk more complicated message structures, if needed, but I suspect most programs will use get_body and iter_attachments, and then do some sort of recursion if an attachment turns out to be a multipart.

What about get_content on a multipart? The obvious thing would be to raise an error, but... calling get_content on a multipart/related using the FileManager could actually be given a meaning: parsing the html using standard library tools, sanitizing it, and replacing the cid references with references to the related parts where they were placed on disk, such that if the filename returned were passed to a web browser, it could actually display the content.

I doubt that I am going to provide such a routine at this point, but I want to allow for the possibility of such a routine being written. Therefore it is the responsibility of the content manager to throw an error if it cannot satisfy a get_content call on a multipart , and the provided content managers will do so.

So that handles the “get” side of things.

For creating messages, we need to build up an example of our conceptual model message: provide a body and one or more attachments.

There is a corresponding set_content possibility for multipart/related. One could pass in a web page and have the program parse it to find the linked resources and include them as parts in the related, computing cids as it goes. In that specific case the set_content method would be able to figure out that the part should be created as a multipart/related.

Figuring out the multipart subtype from the input data can only be done in that specific case, though. Otherwise we have a list of parts, and how they relate to each other cannot be known a priori. So we need to tell set_content what the relationship is, by explicitly specifying the subtype.

Thus for creating multiparts, all of the above content managers support the following syntax:

set_content(partslist, subtype, boundary=None, params=None)

This should look kind of familiar, since it mimics the existing MIMEMultipart constructor, albeit with a slightly different parameter order. The partslist is a list of Message objects with their content already set.

To build a multipart message in this way, you do have to understand a bit about MIME message structure. You have to know that the outermost part should be a multipart/mixed , and that its first part should be a multipart/alternative and its other parts the message attachments.
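To make that structure concrete, here is how it looks when built with the existing MIMEMultipart constructor that the proposed set_content signature mimics. This uses only the current email.mime classes; the payloads and filename are placeholder values.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The body: two renderings of the same content, simplest first.
body = MIMEMultipart('alternative', _subparts=[
    MIMEText('My test\n', 'plain'),
    MIMEText('<p>My test</p>\n', 'html'),
])

# An attachment, marked as such via Content-Disposition.
attachment = MIMEApplication(b'<data/>', 'xml')
attachment.add_header('Content-Disposition', 'attachment',
                      filename='spreadsheet.xml')

# The outermost part is multipart/mixed: body first, then attachments.
outer = MIMEMultipart('mixed', _subparts=[body, attachment])

structure = [part.get_content_type() for part in outer.walk()]
```

The caller has to know which container goes inside which, and in what order, which is exactly the knowledge the rest of this section tries to make unnecessary.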

Can we do better? Again, I think so.

It seems to me that a more natural way to form a message would be something like this:

>>> from email.message import MIMEMessage
>>> from email.contentmanager import FileManager
>>> msg = MIMEMessage()
>>> msg['To'] = 'email-sig@python.org'
>>> msg['From'] = 'rdmurray@bitdance.com'
>>> msg['Subject'] = 'A sample basic modern email'
>>> msg.set_content("My test [image1]\n")
>>> # Build the html version, with its image, as a separate part.
>>> rel = MIMEMessage()
>>> rel.set_content('<p>My test <img src="cid:image1"></p>\n',
...                 'html')
>>> rel.add_related('myimage.jpg',
...                 cid='image1', content_manager=FileManager)
>>> # Promote the text message and add the richer alternative.
>>> msg.make_alternative()
>>> msg.add_alternative(rel)
>>> msg.add_attachment('spreadsheet.xml',
...                    content_manager=FileManager)

The idea here is that calling add_related converts a non-multipart message into a multipart/related message, moving the original content to a new part and making it the first part in the new multipart. Similarly, make_alternative converts to a multipart/alternative, and add_attachment converts to a multipart/mixed. Any of these methods is valid on any non-multipart part, but on multipart types only some are valid. The full matrix is:

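=============  ==============================================
Type           Valid Methods
=============  ==============================================
non-multipart  add_related, add_alternative, add_attachment,
               make_related, make_alternative, make_mixed
related        add_related, add_alternative, add_attachment,
               make_alternative, make_mixed
alternative    add_alternative, add_attachment, make_mixed
mixed          add_attachment
=============  ==============================================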

That is, you can promote from related to alternative or mixed, and from alternative to mixed, but you can only promote, not demote. This scheme seems to me to provide a natural way of building up messages from their component parts, without having to think too much about the actual MIME structure. If you get it wrong, you get an error.
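The promote-only rule behind the matrix can be expressed as an ordering of the four levels. The following is a small sketch of that rule, with hypothetical names (LEVELS, ADD_TARGET, MAKE_TARGET, is_valid); it only checks whether a method is allowed and does not restructure any message.

```python
# Promotion order: a part may only move toward the end of this list.
LEVELS = ['non-multipart', 'related', 'alternative', 'mixed']

# The level each method needs the part to be at (or be promoted to).
ADD_TARGET = {
    'add_related': 'related',
    'add_alternative': 'alternative',
    'add_attachment': 'mixed',
}
MAKE_TARGET = {
    'make_related': 'related',
    'make_alternative': 'alternative',
    'make_mixed': 'mixed',
}


def is_valid(current, method):
    """Return True if method may be called on a part at level current."""
    here = LEVELS.index(current)
    if method in ADD_TARGET:
        # add_* works in place or promotes first, but never demotes.
        return LEVELS.index(ADD_TARGET[method]) >= here
    if method in MAKE_TARGET:
        # make_* must be a genuine promotion.
        return LEVELS.index(MAKE_TARGET[method]) > here
    raise ValueError('unknown method: ' + method)


ALL_METHODS = list(ADD_TARGET) + list(MAKE_TARGET)
matrix = {level: [m for m in ALL_METHODS if is_valid(level, m)]
          for level in LEVELS}
```

Computing matrix this way reproduces the rows of the table above, which is a quick check that "promote, never demote" really is the whole rule.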

I think this is reasonably elegant, but it is just a slight bit magical, so I won’t be surprised if I get some pushback on it. I think you will at least agree that it is much shorter than the same example shown earlier using the existing API.

We can make it even shorter by using a helper class for related. We can provide a Webpage helper class whose constructor takes a string or file-like object providing the html, and a dictionary mapping content ids to objects. The content manager can construct a complete multipart/related from this object:

>>> from email.message import MIMEMessage
>>> from email.contentmanager import FileManager
>>> # Webpage and Image are the proposed helper classes.
>>> msg = MIMEMessage()
>>> msg['To'] = 'email-sig@python.org'
>>> msg['From'] = 'rdmurray@bitdance.com'
>>> msg['Subject'] = 'A sample basic modern email'
>>> msg.set_content("My test [image1]\n")
>>> rel = Webpage('<p>My test <img src="cid:image1"></p>\n',
...               dict(image1=Image('myimage.jpg')))
>>> msg.add_alternative(rel)
>>> msg.add_attachment('spreadsheet.xml',
...                    content_manager=FileManager)

In an ideal world we’d take it one step further, and have a parsing content manager that could automatically compute the text version of a related part as well:

>>> from email.message import MIMEMessage
>>> from email.contentmanager import FileManager
>>> # Webpage and Image are the proposed helper classes.
>>> msg = MIMEMessage()
>>> msg['To'] = 'email-sig@python.org'
>>> msg['From'] = 'rdmurray@bitdance.com'
>>> msg['Subject'] = 'A sample basic modern email'
>>> body = Webpage('<p>My test <img src="cid:image1"></p>\n',
...                dict(image1=Image('myimage.jpg')))
>>> msg.set_content(body)
>>> msg.add_attachment('spreadsheet.xml',
...                    content_manager=FileManager)

That will need to be provided (at least initially) by a third party extension, though, since parsing and munging html into text is a non-trivial project all by itself.