【Python】Saving Attachments from EmailMessage Objects

Overview

This article explains how to extract attachments from an EmailMessage object (produced by Python's email module) and save them to local disk. It covers binary files such as images and PDFs, as well as text attachments, which the library decodes to str and which must be re-encoded before they are written in binary mode.

Specifications (Input/Output)

  • Input: email.message.EmailMessage object (containing attachments).
  • Output: Files saved to the disk.
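  • Requirement: The message must be parsed with policy=policy.default. Otherwise the parser returns a legacy Message object, which has neither iter_attachments() nor get_content().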
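
The following minimal sketch illustrates this requirement by parsing the same bytes with and without the policy argument. The sample message and the variable names (raw, modern, legacy) are made up for this illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage, Message

# A tiny hand-written message, used only for this illustration
raw = b"Subject: demo\r\nContent-Type: text/plain; charset=utf-8\r\n\r\nhello\r\n"

modern = message_from_bytes(raw, policy=policy.default)
legacy = message_from_bytes(raw)  # no policy given: the compat32 policy is used

print(type(modern).__name__)                # EmailMessage
print(type(legacy).__name__)                # Message
print(hasattr(modern, "iter_attachments"))  # True
print(hasattr(legacy, "iter_attachments"))  # False
```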

Basic Usage

This is a basic implementation that loops through only the attachment parts of the email object, retrieves the filename and content, and saves them.

# Assuming 'msg' is an EmailMessage object

for part in msg.iter_attachments():
    # Get the filename
    filename = part.get_filename()
    
    if filename:
        # Retrieve content (may return bytes for images or str for text)
        content = part.get_content()
        
        # Save in binary mode
        with open(filename, "wb") as f:
            if isinstance(content, str):
                # If it's a string, encode it back to bytes before writing
                charset = part.get_content_charset() or "utf-8"
                f.write(content.encode(charset))
            else:
                # If it's binary, write it directly
                f.write(content)

Full Code

This complete, runnable example builds a dummy email with attachments, parses it, and saves the files to a directory.

import os
from email import message_from_bytes, policy
from email.message import EmailMessage

def extract_attachments_demo():
    """
    Demo function to extract and save attachments from an EmailMessage object.
    """
    
    # --- 1. Prepare Test Data (Simulating a received email with attachments) ---
    raw_msg = EmailMessage()
    raw_msg["Subject"] = "Test Email"
    raw_msg.set_content("Please check the attachments.")
    
    # Dummy attachment 1: Text file
    raw_msg.add_attachment(
        "This is an attached text file.".encode("utf-8"),
        maintype="text", subtype="plain", filename="note.txt"
    )
    
    # Dummy attachment 2: Binary file (PNG image data simulation)
    raw_msg.add_attachment(
        b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR...',
        maintype="image", subtype="png", filename="image.png"
    )
    
    # Convert to bytes (simulating raw data from an IMAP server)
    email_bytes = raw_msg.as_bytes()

    # --- 2. Implementation of Extraction ---
    
    # Convert bytes back to an EmailMessage object
    msg = message_from_bytes(email_bytes, policy=policy.default)
    
    print(f"Subject: {msg['Subject']}")
    
    # Directory to save files
    save_dir = "downloaded_attachments"
    os.makedirs(save_dir, exist_ok=True)

    # Loop through attachment parts using iter_attachments
    for part in msg.iter_attachments():
        filename = part.get_filename()
        
        if not filename:
            print("Skipping a part with no filename.")
            continue
            
        print(f"Found attachment: {filename}")
        
        # get_content() automatically converts data based on Content-Type
        # Returns bytes for images/PDFs or str for text
        content = part.get_content()
        
        # Strip directory components from the filename (see "Filename Safety")
        save_path = os.path.join(save_dir, os.path.basename(filename))
        
        try:
            with open(save_path, "wb") as f:
                if isinstance(content, str):
                    # If returned as str, encode it back to the original charset
                    charset = part.get_content_charset() or 'utf-8'
                    print(f"  - Encoding text data as {charset} and saving...")
                    f.write(content.encode(charset))
                else:
                    # If bytes, write directly
                    print("  - Saving binary data...")
                    f.write(content)
                    
            print(f"  -> Saved to: {save_path}")
            
        except Exception as e:
            print(f"  -> Error saving file: {e}")

if __name__ == "__main__":
    extract_attachments_demo()

Customization Points

These are the main methods used for handling attachments in EmailMessage objects.

| Method / Syntax | Description | Return Type |
|---|---|---|
| msg.iter_attachments() | Returns an iterator to access only the parts treated as attachments. | generator (EmailMessage parts) |
| part.get_filename() | Retrieves the filename of the part. Decoded automatically. | str or None |
| part.get_content() | Retrieves the payload. Returns str for text and bytes for others if policy.default is used. | bytes or str |
| part.get_content_charset() | For text parts, retrieves the specified character set (e.g., utf-8). | str or None |
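
As a quick check of the return types listed above, this sketch builds a throwaway message and records what each method returns for a text attachment and a binary one. The filenames and payloads are invented for the demonstration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build a throwaway message with one text and one binary attachment
draft = EmailMessage()
draft.set_content("See attachments.")
draft.add_attachment("a,b\n1,2\n", subtype="csv", filename="data.csv")
draft.add_attachment(b"%PDF-1.4 dummy", maintype="application",
                     subtype="pdf", filename="report.pdf")

# Round-trip through bytes, as if the message had been fetched from a server
msg = message_from_bytes(draft.as_bytes(), policy=policy.default)

summary = []
for part in msg.iter_attachments():
    content = part.get_content()
    # Record (filename, type of content, declared charset) for each attachment
    summary.append((part.get_filename(),
                    type(content).__name__,
                    part.get_content_charset()))

for row in summary:
    print(row)
```

The text/csv part comes back as str with its charset, while the PDF part comes back as bytes with no charset.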

Important Notes

The Trap of Text Attachments

The get_content() method is designed to be helpful: when an attachment has a text type such as text/plain or text/csv, it decodes the data and returns it as a Python str.

Files are typically saved with open(..., "wb") (binary mode), but writing a str to a file opened in binary mode raises a TypeError. As shown in the code example, check the type with isinstance(content, str) and call .encode() on the content when it is a str.
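
The sketch below reproduces the trap with a Latin-1 text attachment and wraps the type check in a helper. The helper name attachment_bytes and the sample data are hypothetical, and the UTF-8 fallback for parts without a declared charset is an assumption.

```python
import io
from email import message_from_bytes, policy
from email.message import EmailMessage

def attachment_bytes(part):
    """Return the attachment payload as bytes, re-encoding text if needed."""
    content = part.get_content()
    if isinstance(content, str):
        # Assumption: fall back to UTF-8 when the part declares no charset
        charset = part.get_content_charset() or "utf-8"
        return content.encode(charset)
    return content

# Throwaway message with a Latin-1 text attachment
draft = EmailMessage()
draft.set_content("body")
draft.add_attachment("café", subtype="plain", charset="iso-8859-1",
                     filename="latin1.txt")
msg = message_from_bytes(draft.as_bytes(), policy=policy.default)
part = next(msg.iter_attachments())

# Writing the decoded str to a binary stream fails with TypeError
try:
    io.BytesIO().write(part.get_content())
    wrote_str = True
except TypeError:
    wrote_str = False

data = attachment_bytes(part)
print(wrote_str, type(data).__name__)
```

An alternative not used in this article is part.get_payload(decode=True), which returns the transfer-decoded bytes of a non-multipart part and so avoids the decode and re-encode round trip.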

Filename Safety

Using the filename returned by get_filename() directly can be risky. If the filename contains path segments like ../, it could save files in unexpected locations (Directory Traversal). It is recommended to use os.path.basename() to extract only the actual filename.
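
One possible sanitizer is sketched here. The function name safe_filename and the fallback name are hypothetical, and the sketch only removes directory components; it does not check for names that are reserved on a particular operating system.

```python
import os

def safe_filename(raw_name, fallback="attachment.bin"):
    """Keep only the final path component of an untrusted filename."""
    # Normalize Windows-style separators so basename() sees them on any OS
    name = os.path.basename(raw_name.replace("\\", "/"))
    # Drop NUL characters and surrounding whitespace
    name = name.replace("\x00", "").strip()
    # Reject names that are empty or refer to a directory
    if name in ("", ".", ".."):
        return fallback
    return name

print(safe_filename("../../etc/passwd"))
print(safe_filename("..\\..\\evil.exe"))
print(safe_filename(".."))
```

The result can then be joined to the save directory with os.path.join().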

File Overwriting

If you receive multiple emails with the same attachment filename, saving them in the same directory will overwrite existing files. Consider adding a timestamp or a unique ID to the filename to prevent this.
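
As one way to avoid overwriting, the sketch below appends a numeric counter instead of a timestamp or unique ID; the idea is the same, and the function name unique_path is hypothetical. Opening the file in "xb" mode additionally makes the write fail rather than replace an existing file.

```python
import os
import tempfile

def unique_path(directory, filename):
    """Return a path inside `directory` that does not exist yet."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    # Try name.ext, name_1.ext, name_2.ext, ... until a free name is found
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate

# Demonstration in a temporary directory: save the same name three times
with tempfile.TemporaryDirectory() as tmp:
    saved = []
    for _ in range(3):
        path = unique_path(tmp, "report.pdf")
        # "xb" refuses to overwrite an existing file
        with open(path, "xb") as f:
            f.write(b"dummy")
        saved.append(os.path.basename(path))

print(saved)  # ['report.pdf', 'report_1.pdf', 'report_2.pdf']
```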

Advanced Usage

This example shows how to filter and save only specific file extensions (e.g., PDF and PNG).

import os

def save_specific_attachments(msg, allowed_extensions=frozenset({".pdf", ".png"})):
    for part in msg.iter_attachments():
        filename = part.get_filename()
        if not filename:
            continue
            
        # Extract and check the extension
        _, ext = os.path.splitext(filename)
        if ext.lower() not in allowed_extensions:
            print(f"Skipping: {filename}")
            continue
            
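        # Simplified saving process: strip directory components, then write bytes
        safe_name = os.path.basename(filename)
        content = part.get_content()
        with open(safe_name, "wb") as f:
            if isinstance(content, str):
                # Re-encode text with the declared charset, defaulting to UTF-8
                charset = part.get_content_charset() or "utf-8"
                f.write(content.encode(charset))
            else:
                f.write(content)
        print(f"Saved: {safe_name}")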
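
A variation on the same idea filters by the declared Content-Type instead of the file extension. This is a sketch rather than part of the original example: the function name iter_allowed_attachments and the sample data are hypothetical. The declared type is chosen by the sender just like the filename, so neither check proves what the file really contains.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def iter_allowed_attachments(msg, allowed_types=frozenset({"application/pdf", "image/png"})):
    """Yield only the attachments whose declared Content-Type is allowed."""
    for part in msg.iter_attachments():
        if part.get_content_type() in allowed_types:
            yield part

# Throwaway message: one real PDF type, one generic type with a .png name
draft = EmailMessage()
draft.set_content("body")
draft.add_attachment(b"%PDF-1.4 dummy", maintype="application",
                     subtype="pdf", filename="a.pdf")
draft.add_attachment(b"MZ dummy", maintype="application",
                     subtype="octet-stream", filename="b.png")
msg = message_from_bytes(draft.as_bytes(), policy=policy.default)

kept = [part.get_filename() for part in iter_allowed_attachments(msg)]
print(kept)  # ['a.pdf']
```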

Summary

By using msg.iter_attachments(), you can access attachments directly without worrying about the complex structure of multipart emails. However, since the return type of get_content() can be either str or bytes depending on the Content-Type, always performing a type check is the key to preventing bugs.


