Overview
This article explains how to extract and save attachments from an EmailMessage object (retrieved via Python’s email module) to a local disk. We will cover handling binary files like images and PDFs, as well as correctly re-encoding text attachments that are automatically decoded by the library.
Specifications (Input/Output)
- Input: email.message.EmailMessage object (containing attachments).
- Output: Files saved to disk.
- Requirement: The email object must be parsed using policy.default.
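The policy.default requirement matters because the parser's legacy default policy (compat32) builds the older Message class, which has neither iter_attachments() nor get_content(). A minimal sketch of the difference, using a made-up raw message in place of real server data:

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical raw message, standing in for bytes fetched from a mail server
raw = (
    b"Subject: Hello\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Body text\r\n"
)

# Without a policy argument the parser falls back to the legacy compat32 policy
legacy_msg = message_from_bytes(raw)

# With policy.default the parser builds an EmailMessage
modern_msg = message_from_bytes(raw, policy=policy.default)

print(type(legacy_msg).__name__)                # Message
print(type(modern_msg).__name__)                # EmailMessage
print(hasattr(legacy_msg, "iter_attachments"))  # False
print(hasattr(modern_msg, "iter_attachments"))  # True
```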
Basic Usage
This is a basic implementation that loops through only the attachment parts of the email object, retrieves the filename and content, and saves them.
# Assuming 'msg' is an EmailMessage object
for part in msg.iter_attachments():
    # Get the filename
    filename = part.get_filename()
    if filename:
        # Retrieve content (may return bytes for images or str for text)
        content = part.get_content()
        # Save in binary mode
        with open(filename, "wb") as f:
            if isinstance(content, str):
                # If it's a string, encode it back to bytes before writing
                charset = part.get_content_charset() or "utf-8"
                f.write(content.encode(charset))
            else:
                # If it's binary, write it directly
                f.write(content)
Full Code
This complete, runnable demo builds a dummy email with attachments, parses it, and saves the files to a directory.
import os
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_attachments_demo():
    """
    Demo function to extract and save attachments from an EmailMessage object.
    """
    # --- 1. Prepare Test Data (Simulating a received email with attachments) ---
    raw_msg = EmailMessage()
    raw_msg["Subject"] = "Test Email"
    raw_msg.set_content("Please check the attachments.")

    # Dummy attachment 1: Text file
    raw_msg.add_attachment(
        "This is an attached text file.".encode("utf-8"),
        maintype="text", subtype="plain", filename="note.txt"
    )

    # Dummy attachment 2: Binary file (PNG image data simulation)
    raw_msg.add_attachment(
        b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR...',
        maintype="image", subtype="png", filename="image.png"
    )

    # Convert to bytes (simulating raw data from an IMAP server)
    email_bytes = raw_msg.as_bytes()

    # --- 2. Implementation of Extraction ---
    # Convert bytes back to an EmailMessage object
    msg = message_from_bytes(email_bytes, policy=policy.default)
    print(f"Subject: {msg['Subject']}")

    # Directory to save files
    save_dir = "downloaded_attachments"
    os.makedirs(save_dir, exist_ok=True)

    # Loop through attachment parts using iter_attachments
    for part in msg.iter_attachments():
        filename = part.get_filename()
        if not filename:
            print("Skipping a part with no filename.")
            continue
        print(f"Found attachment: {filename}")

        # get_content() automatically converts data based on Content-Type
        # Returns bytes for images/PDFs or str for text
        content = part.get_content()
        save_path = os.path.join(save_dir, filename)
        try:
            with open(save_path, "wb") as f:
                if isinstance(content, str):
                    # If returned as str, encode it back to the original charset
                    charset = part.get_content_charset() or 'utf-8'
                    print(f" - Encoding text data as {charset} and saving...")
                    f.write(content.encode(charset))
                else:
                    # If bytes, write directly
                    print(" - Saving binary data...")
                    f.write(content)
            print(f" -> Saved to: {save_path}")
        except Exception as e:
            print(f" -> Error saving file: {e}")


if __name__ == "__main__":
    extract_attachments_demo()
Customization Points
These are the main methods used for handling attachments in EmailMessage objects.
| Method / Syntax | Description | Return Type |
| --- | --- | --- |
| msg.iter_attachments() | Returns an iterator to access only the parts treated as attachments. | generator (EmailMessage parts) |
| part.get_filename() | Retrieves the filename of the part. Decoded automatically. | str or None |
| part.get_content() | Retrieves the payload. Returns str for text and bytes for others if policy.default is used. | bytes or str |
| part.get_content_charset() | For text parts, retrieves the specified character set (e.g., utf-8). | str or None |
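To see these return types concretely, the following sketch builds a small in-memory message and inspects each attachment. The filenames and payloads are made up for illustration.

```python
from email.message import EmailMessage

# Build a small sample message (contents are illustrative only)
msg = EmailMessage()
msg.set_content("Body")
msg.add_attachment("col1,col2\n1,2\n", subtype="csv", filename="data.csv")
msg.add_attachment(b"%PDF-1.4 dummy", maintype="application",
                   subtype="pdf", filename="report.pdf")

rows = []
for part in msg.iter_attachments():
    content = part.get_content()
    # Record the filename, the Python type of the payload, and the charset
    rows.append((part.get_filename(), type(content).__name__,
                 part.get_content_charset()))

for row in rows:
    print(*row)
# data.csv str utf-8
# report.pdf bytes None
```

Note that the message body itself is not yielded by iter_attachments(); only the two attached parts appear.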
Important Notes
The Trap of Text Attachments
The get_content() method is designed to be helpful. If an attachment has a text type such as text/plain or text/csv, it decodes the payload and returns it as a Python str rather than bytes.
Files are typically saved with open(..., "wb") (binary mode), but writing a str to a file opened in binary mode raises a TypeError. As shown in the code example, you must check the type with isinstance(content, str) and call .encode() on the content when it is a string.
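The failure and the fix can be reproduced without touching the disk. In this sketch an io.BytesIO buffer stands in for a file opened in binary mode, and content_as_bytes is a hypothetical helper name, not part of the email library.

```python
import io
from email.message import EmailMessage


def content_as_bytes(part):
    """Return the attachment payload as bytes (hypothetical helper)."""
    content = part.get_content()
    if isinstance(content, str):
        # Text parts come back decoded, so encode them again
        charset = part.get_content_charset() or "utf-8"
        return content.encode(charset)
    return content


msg = EmailMessage()
msg.set_content("Body")
msg.add_attachment("héllo", filename="note.txt")
part = next(msg.iter_attachments())

buffer = io.BytesIO()  # stands in for open(path, "wb")
error_seen = False
try:
    buffer.write(part.get_content())  # str written to a binary stream
except TypeError as exc:
    error_seen = True
    print(f"TypeError: {exc}")

# Writing the normalized bytes succeeds
buffer.write(content_as_bytes(part))
print(buffer.getvalue())
```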
Filename Safety
Using the filename returned by get_filename() directly can be risky. If the filename contains path segments like ../, it could save files in unexpected locations (Directory Traversal). It is recommended to use os.path.basename() to extract only the actual filename.
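One way to apply this advice is sketched below. The safe_filename helper and its fallback name are assumptions made for illustration. Because os.path.basename() only splits on the current platform's separators, the sketch also normalizes backslashes first.

```python
import os


def safe_filename(raw_name, fallback="attachment.bin"):
    """Reduce an untrusted attachment name to a bare filename.

    Hypothetical helper: the name and the fallback value are illustrative.
    """
    # Treat backslashes as separators too, so Windows-style paths are
    # stripped even when this runs on a POSIX system
    name = os.path.basename(raw_name.replace("\\", "/"))
    # Reject names that are empty or that refer to a directory
    if name in ("", ".", ".."):
        return fallback
    return name


print(safe_filename("../../etc/passwd"))  # passwd
print(safe_filename("..\\..\\boot.ini"))  # boot.ini
print(safe_filename("report.pdf"))        # report.pdf
print(safe_filename(".."))                # attachment.bin
```

This only removes path components. Depending on the target filesystem, additional checks (for example on reserved names or control characters) may still be needed.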
File Overwriting
If you receive multiple emails with the same attachment filename, saving them in the same directory will overwrite existing files. Consider adding a timestamp or a unique ID to the filename to prevent this.
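A minimal sketch of one such scheme follows. It uses a numeric suffix rather than a timestamp, and unique_path is a hypothetical helper name. A temporary directory is used so the example leaves no files behind.

```python
import os
import tempfile


def unique_path(directory, filename):
    """Return a path inside `directory` that does not exist yet.

    Hypothetical helper: appends _1, _2, ... before the extension.
    """
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate


# Usage: save two attachments that share a name without losing either
with tempfile.TemporaryDirectory() as tmp:
    saved = []
    for data in (b"first", b"second"):
        path = unique_path(tmp, "report.pdf")
        with open(path, "wb") as f:
            f.write(data)
        saved.append(os.path.basename(path))

print(saved)  # ['report.pdf', 'report_1.pdf']
```

The existence check and the write are separate steps, so this is not safe if several processes save to the same directory at once. In that situation, opening the file in "xb" mode (which fails if the file already exists) is a possible alternative.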
Advanced Usage
This example shows how to filter and save only specific file extensions (e.g., PDF and PNG).
import os


def save_specific_attachments(msg, allowed_extensions={".pdf", ".png"}):
    for part in msg.iter_attachments():
        filename = part.get_filename()
        if not filename:
            continue

        # Extract and check the extension
        _, ext = os.path.splitext(filename)
        if ext.lower() not in allowed_extensions:
            print(f"Skipping: {filename}")
            continue

        # Simplified saving process
        content = part.get_content()
        with open(filename, "wb") as f:
            if isinstance(content, str):
                f.write(content.encode('utf-8'))
            else:
                f.write(content)
        print(f"Saved: {filename}")
Summary
By using msg.iter_attachments(), you can access attachments directly without worrying about the complex structure of multipart emails. However, since the return type of get_content() can be either str or bytes depending on the Content-Type, always performing a type check is the key to preventing bugs.
