【Python】Extracting Subject, Sender, and Body from EmailMessage Objects

January 14, 2026

Overview

This article explains how to properly extract header information (Subject, From, To) and the message body (Plain Text or HTML) from an EmailMessage object created by Python’s email module. We will introduce a modern method using get_body() to retrieve the content from multipart emails by specifying a priority order.

Specifications (Input/Output)

Input: An email.message.EmailMessage object (assumed to have been created using message_from_bytes(..., policy=policy.default)).
Output:
- Strings for Subject, From, and To headers.
- String for the email body (HTML or Plain Text).

Basic Usage

Header information can be retrieved by specifying keys, similar to a dictionary. By using the get_body() method, you can retrieve the body without needing to worry about the complex multipart structure.

from email import policy, message_from_bytes

# Assuming raw_email is the received byte data
msg = message_from_bytes(raw_email, policy=policy.default)

# 1. Get header information
print(f"Subject: {msg.get('Subject')}")
print(f"From: {msg.get('From')}")

# 2. Get the body (Priority: Plain Text)
body = msg.get_body(preferencelist=('plain', 'html'))
if body is not None:
    print(body.get_content())
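
As a complement to the snippet above, here is a minimal sketch of reading structured header values. It assumes a small hand-written sample message (the addresses, date, and the X-Not-Present header name are hypothetical). With policy.default, address headers expose an addresses attribute and the Date header exposes a datetime attribute.

```python
from email import message_from_bytes, policy

# Hypothetical sample message used only for this illustration.
raw = (
    b"Subject: Hello\r\n"
    b"From: Alice Example <alice@example.com>\r\n"
    b"To: bob@example.com, Carol <carol@example.com>\r\n"
    b"Date: Wed, 14 Jan 2026 09:30:00 +0900\r\n"
    b"\r\n"
    b"Body text.\r\n"
)
msg = message_from_bytes(raw, policy=policy.default)

# Address headers carry parsed Address objects.
sender = msg["From"].addresses[0]
print(sender.display_name)  # display-name part of the From header
print(sender.addr_spec)     # bare e-mail address

# Collect every recipient address from the To header.
recipients = [addr.addr_spec for addr in msg["To"].addresses]
print(recipients)

# The Date header carries a timezone-aware datetime object.
sent_at = msg["Date"].datetime
print(sent_at.isoformat())

# A missing header yields None; get() accepts a fallback value.
missing = msg["X-Not-Present"]
fallback = msg.get("X-Not-Present", "(none)")
```

Working with the parsed attributes avoids splitting address strings by hand, which tends to break on display names that contain commas.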

Full Code

The following complete example builds an EmailMessage object from raw bytes, then extracts the headers and the body (HTML if available, otherwise plain text).

from email import message_from_bytes, policy
from email.message import EmailMessage

def parse_email_object():
    """
    Demo function to create an EmailMessage object from sample raw email data
    and extract the subject, recipients, and body.
    """
    # Sample raw email data (bytes)
    # Usually, this is retrieved from a server using imaplib or similar libraries.
    raw_email_data = b"""\
MIME-Version: 1.0
Subject: =?utf-8?B?44OG44K544OI44Oh44O844Or44Gu5Lu25ZCN?=
From: sender@example.com
To: receiver@example.com
Content-Type: multipart/alternative; boundary="boundary_text"

--boundary_text
Content-Type: text/plain; charset="utf-8"

This is the plain text body.

--boundary_text
Content-Type: text/html; charset="utf-8"

<html><body><h1>This is the HTML body.</h1></body></html>

--boundary_text--
"""

    # 1. Create EmailMessage object
    # Specifying policy=policy.default is very important as it enables header decoding
    # and makes the get_body() method available.
    msg = message_from_bytes(raw_email_data, policy=policy.default)

    print("--- Header Information ---")
    # Retrieve using msg.get(header_name). Returns None if the key does not exist.
    subject = msg.get("Subject")
    sender = msg.get("From")
    receiver = msg.get("To")
    date = msg.get("Date")

    print(f"Subject : {subject}")
    print(f"From    : {sender}")
    print(f"To      : {receiver}")
    print(f"Date    : {date}")

    print("\n--- Extracting the Body ---")
    
    # 2. Identify the body part (get_body)
    # Use preferencelist to specify the priority of formats you want to retrieve.
    # ('html', 'plain') -> Gets HTML if available, otherwise Text.
    # ('plain', 'html') -> Gets Text if available, otherwise HTML.
    
    body_part = msg.get_body(preferencelist=('html', 'plain'))

    if body_part is not None:
        # 3. Extract content (get_content)
        # Retrieves the actual string data (automatically decoded).
        content = body_part.get_content()
        
        # Check which type was retrieved
        content_type = body_part.get_content_type()
        print(f"Content Type: {content_type}")
        print("Content:")
        print(content)
    else:
        print("Body not found (it might be an email with only attachments).")

if __name__ == "__main__":
    parse_email_object()

Customization Points

Main Methods of EmailMessage Objects

The following table lists the main methods and attributes used for parsing.

Method / Attribute	Description	Example
`msg.get("Header-Name")`	Retrieves the value of a header, or `None` (or a supplied default) if the header is absent. With `policy.default`, encoded values such as Japanese subjects are returned as decoded strings.	`msg.get("Subject")`
`msg["Header-Name"]`	Dictionary-style access. Unlike a real dictionary, it returns `None` instead of raising `KeyError` when the header is missing; use `get` when you need a custom default.	`msg["From"]`
`msg.get_body(preferencelist=...)`	Returns the best body candidate (as an `EmailMessage` object) according to the specified priority list (e.g., html, plain), or `None` if no part matches.	`msg.get_body(preferencelist=('plain',))`
`part.get_content()`	Returns the decoded payload of that part: a `str` for text parts and `bytes` for most other content types.	`body_part.get_content()`
`part.iter_attachments()`	Returns an iterator over the parts that are treated as attachments rather than body candidates.	`for f in msg.iter_attachments():`
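
To illustrate iter_attachments() from the table, here is a minimal sketch that builds a small message in memory and lists its attachments. The message content and the file name report.csv are hypothetical and exist only for this example.

```python
from email.message import EmailMessage

# Build a hypothetical message with a text body and one attachment.
msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See the attached file.")
msg.add_attachment(
    b"col1,col2\n1,2\n",
    maintype="application",
    subtype="octet-stream",
    filename="report.csv",
)

# Walk the attachments and collect (filename, content type, size in bytes).
found = []
for part in msg.iter_attachments():
    data = part.get_content()  # bytes, because this part is not text
    found.append((part.get_filename(), part.get_content_type(), len(data)))

print(found)
```

In a real program, the bytes returned by get_content() could be written to disk, ideally after sanitizing the file name supplied by the sender.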

Preference List (preferencelist)

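The preferencelist argument of the get_body method takes a tuple or list of subtypes (e.g., html for text/html), ordered from most to least preferred. If it is omitted, the default is ('related', 'html', 'plain').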

('html', 'plain'): Use this when you prefer a rich visual display.
('plain', 'html'): Use this when you prefer simple text processing or log saving.
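
The effect of the ordering can be checked with a small sketch. The multipart/alternative message below is a hypothetical sample, and the variable names are chosen only for illustration.

```python
from email import message_from_bytes, policy

# Hypothetical message offering both a plain-text and an HTML version.
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Subject: Demo\r\n"
    b'Content-Type: multipart/alternative; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"\r\n"
    b"plain version\r\n"
    b"--b1\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"\r\n"
    b"<p>html version</p>\r\n"
    b"--b1--\r\n"
)
msg = message_from_bytes(raw, policy=policy.default)

# The same message yields a different part depending on the order.
html_first = msg.get_body(preferencelist=("html", "plain"))
plain_first = msg.get_body(preferencelist=("plain", "html"))

print(html_first.get_content_type())
print(plain_first.get_content_type())
```

Listing a single subtype, such as ("plain",), restricts the search to that type, which is useful when a fallback should be handled explicitly.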

Important Notes

Forgetting the Policy Specification

If you omit the policy argument, as in message_from_bytes(data), the legacy compat32 policy is applied. In that mode the parser returns a Message object, which has no get_body() method (calling it raises AttributeError), and encoded subjects such as Japanese text remain in their raw form (e.g., =?utf-8?B?...?=). Always specify policy=policy.default.
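
The difference can be observed directly. The following sketch parses the same bytes twice, once without a policy and once with policy.default; the sample header is hypothetical and its encoded subject decodes to a short Japanese word.

```python
from email import message_from_bytes, policy

# Hypothetical message whose Subject is an RFC 2047 encoded word.
raw = (
    b"Subject: =?utf-8?B?44OG44K544OI?=\r\n"
    b"From: sender@example.com\r\n"
    b"\r\n"
    b"Body\r\n"
)

legacy = message_from_bytes(raw)                         # compat32 policy
modern = message_from_bytes(raw, policy=policy.default)  # modern policy

# The class of the returned object differs between the two policies.
print(type(legacy).__name__, hasattr(legacy, "get_body"))
print(type(modern).__name__, hasattr(modern, "get_body"))

# Under compat32 the subject stays in its encoded form;
# under policy.default it is returned already decoded.
legacy_subject = legacy["Subject"]
modern_subject = modern["Subject"]
print(legacy_subject == modern_subject)
```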

Cases with No Body

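If an email contains only attachments, or has no part matching the preference list, get_body() returns None. Always check the result before using it. A comparison such as if body_part is not None: is the safest form, because the truthiness of a message object depends on how many headers it has.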
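
A minimal sketch of this check, assuming a hypothetical attachment-only message built in memory. The helper name body_text_or_default is not part of the email module; it is introduced here only for illustration.

```python
from email.message import EmailMessage

# Hypothetical message that carries one attachment and no text body.
msg = EmailMessage()
msg["Subject"] = "Files only"
msg.add_attachment(
    b"\x00\x01",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

def body_text_or_default(message, default=""):
    """Return the body text, or `default` when no body part exists."""
    part = message.get_body(preferencelist=("plain", "html"))
    if part is None:
        return default
    return part.get_content()

result = body_text_or_default(msg, default="(no body)")
print(result)
```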

Encoding

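The get_content() method decodes text content according to the charset parameter in the Content-Type header. By default it decodes with errors='replace', so bytes that do not match the declared charset typically show up as garbled text or replacement characters rather than raising an exception. If the sender declares a charset name that Python does not recognize, decoding can fail with a LookupError.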
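
One defensive pattern is to fall back to manual decoding when get_content() fails. The sketch below is an assumption about how such a fallback could look, not a prescribed approach; the helper name get_text_safely, the fallback charset, and the deliberately invalid charset name are all hypothetical.

```python
from email import message_from_bytes, policy

def get_text_safely(part, fallback_charset="utf-8"):
    """Decode a text part, falling back to a fixed charset on failure."""
    try:
        return part.get_content()
    except (LookupError, UnicodeError):
        # Decode the transfer encoding only, then apply our own charset.
        raw_bytes = part.get_payload(decode=True)
        return raw_bytes.decode(fallback_charset, errors="replace")

# Hypothetical message declaring a charset name Python does not know.
raw = (
    b'Content-Type: text/plain; charset="x-unknown-charset"\r\n'
    b"\r\n"
    b"hello\r\n"
)
msg = message_from_bytes(raw, policy=policy.default)
text = get_text_safely(msg.get_body(preferencelist=("plain",)))
print(text.strip())
```

UTF-8 is used as the fallback here only as a reasonable guess; a mailbox dominated by another legacy encoding may call for a different choice.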

Advanced Usage

The following example returns the plain-text body when one exists and otherwise falls back to stripping the tags from the HTML body.

from email import policy, message_from_bytes
import re

def extract_text_forcefully(raw_data):
    msg = message_from_bytes(raw_data, policy=policy.default)
    
    # Search for the text part
    body_part = msg.get_body(preferencelist=('plain',))
    
    if body_part is not None:
        return body_part.get_content()
    
    # If there is no text and only HTML exists
    html_part = msg.get_body(preferencelist=('html',))
    if html_part is not None:
        html_content = html_part.get_content()
        # Simple tag removal with a regex. This does not decode entities
        # such as &amp; and does not drop <script> or <style> contents.
        # For production use, consider a library such as BeautifulSoup.
        text_content = re.sub(r'<[^>]+>', '', html_content)
        return text_content.strip()
        
    return ""
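
As an alternative to the regex, the standard library's html.parser module can be used to collect text nodes. This is a minimal sketch rather than a full HTML-to-text converter; the class name _TextExtractor and the function html_to_text are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())

def html_to_text(html_content):
    parser = _TextExtractor()
    parser.feed(html_content)
    parser.close()
    return parser.text()

sample = (
    "<html><head><style>h1{color:red}</style></head>"
    "<body><h1>Title</h1><p>a &amp; b</p></body></html>"
)
print(html_to_text(sample))
```

Compared with the regex, this handles character entities and ignores style and script blocks, at the cost of a few more lines.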

Summary

To parse an EmailMessage object, start by creating it with policy.default. After that, you can simply use msg.get("Subject") for headers and msg.get_body() for the content. This allows you to utilize email data in your program without worrying about complex MIME structures.
