【Python】Receiving and Parsing Emails from IMAP Servers

Overview

This article explains how to receive emails from an IMAP server using Python’s standard imaplib library. We will cover the entire process, including connecting to the server and using the email module to parse received data into a user-friendly object (EmailMessage type) to extract subjects, bodies, and sender addresses.

Specifications (Input/Output)

  • Input:
    • IMAP server info (host, port)
    • Account info (email address, password)
    • Search criteria (subject, unread, all, etc.)
  • Output:
    • Email object (EmailMessage)
    • Extracted data (Subject, Body, Sender, etc.)
  • Requirements:
    • Uses only Python standard libraries (imaplib, email).
    • For services like Gmail, you may need to generate an “App Password.”

Basic Usage

This is the basic flow for connecting to an IMAP server, selecting the inbox, and parsing the latest email.

import imaplib
from email import message_from_bytes, policy

# 1. Connect to the server and log in
imap = imaplib.IMAP4_SSL("imap.example.com", 993)
imap.login("user@example.com", "password")

# 2. Select mailbox and search
imap.select("INBOX")
stat, data = imap.search(None, "ALL")

# 3. Get the ID of the latest email
latest_email_id = data[0].split()[-1]

# 4. Fetch and parse data
stat, msg_data = imap.fetch(latest_email_id, "(RFC822)")
raw_email = msg_data[0][1]
msg = message_from_bytes(raw_email, policy=policy.default)

print(f"Subject: {msg['Subject']}")
imap.logout()
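
The snippet above assumes the search returns at least one ID; on an empty mailbox, data[0].split()[-1] raises an IndexError. The following is a minimal sketch of a guard for that case. The helper name latest_message_id is hypothetical and not part of imaplib.

```python
def latest_message_id(search_data):
    """Return the last ID from the data list of imap.search(), or None if empty."""
    # search() normally returns a list holding one bytes object such as b"1 2 3"
    if not search_data or not search_data[0]:
        return None
    ids = search_data[0].split()
    return ids[-1] if ids else None


# Intended use with a live connection (not executed here):
#   stat, data = imap.search(None, "ALL")
#   latest_email_id = latest_message_id(data)
#   if latest_email_id is None:
#       print("Mailbox is empty")
```

Returning None instead of raising lets the caller decide how to handle an empty result.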

Full Code

This is a practical code recipe to search for emails containing a “specific subject” and display the body text.

import imaplib
from email import message_from_bytes, policy

def receive_email_imap():
    """
    Demo function to receive and display email content using IMAP4.
    """
    # Connection settings (Example for Gmail)
    imap_server = "imap.gmail.com"
    imap_port = 993
    username = "your_email@gmail.com"
    password = "your_app_password" 

    try:
        # 1. Establish SSL connection
        print("Connecting to server...")
        imap = imaplib.IMAP4_SSL(imap_server, imap_port)

        # 2. Login
        imap.login(username, password)
        print("Login successful")

        # 3. Select mailbox ('INBOX')
        imap.select("INBOX")

        # 4. Search emails
        print("Searching emails...")
        # Example: Search for emails with "Test" in the subject
        search_criteria = '(SUBJECT "Test")'
        
        status, messages = imap.search("UTF-8", search_criteria)
        
        email_ids = messages[0].split()
        if not email_ids:
            print("No matching emails found.")
            return

        # Process only the latest email (the last one in the ID list)
        latest_id = email_ids[-1]
        print(f"Retrieving email ID {latest_id.decode()}...")

        # 5. Fetch raw email data (RFC822 format = headers + body)
        status, msg_data = imap.fetch(latest_id, "(RFC822)")

        # 6. Convert raw bytes to EmailMessage object
        raw_email = msg_data[0][1]
        
        # Using policy.default automates subject decoding
        msg = message_from_bytes(raw_email, policy=policy.default)

        # 7. Display content
        print("-" * 30)
        print(f"From    : {msg['From']}")
        print(f"To      : {msg['To']}")
        print(f"Subject : {msg['Subject']}")
        print("-" * 30)

        # Handle multipart messages to get the body
        body = ""
        if msg.is_multipart():
            for part in msg.walk():
                content_type = part.get_content_type()
                content_disposition = str(part.get("Content-Disposition"))

                # Skip attachments and keep the text/plain part
                if "attachment" not in content_disposition:
                    if content_type == "text/plain":
                        try:
                            body = part.get_content()
                        except (LookupError, UnicodeDecodeError):
                            # Unknown or broken charset: keep the previous body
                            pass
        else:
            # For single-part messages
            body = msg.get_content()

        print("--- Body (Excerpt) ---")
        print(body[:200] + "..." if len(body) > 200 else body)

    except Exception as e:
        print(f"An error occurred: {e}")

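    finally:
        # 8. Close the mailbox and log out
        try:
            imap.close()
        except Exception:
            # No mailbox was selected, or the connection was never established
            pass
        try:
            imap.logout()
            print("Logged out successfully")
        except Exception:
            pass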

if __name__ == "__main__":
    receive_email_imap()
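
When the message was parsed with policy.default, the multipart loop in the code above can often be replaced by EmailMessage.get_body, which picks the body part for you. The sketch below builds a small message locally so it runs without a server. The helper name extract_text_body and the demo message are illustrative assumptions, not part of the original recipe.

```python
from email.message import EmailMessage


def extract_text_body(msg):
    """Return the preferred text body of an EmailMessage, or '' if none exists."""
    # Prefer text/plain and fall back to text/html; attachments are not considered
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    return part.get_content()


# Build a multipart message locally to try the helper
demo = EmailMessage()
demo["Subject"] = "Demo"
demo.set_content("Plain text body")
demo.add_alternative("<p>HTML body</p>", subtype="html")
demo.add_attachment(b"binary data", maintype="application",
                    subtype="octet-stream", filename="data.bin")

print(extract_text_body(demo))
```

get_body is only available on EmailMessage objects, so it does not work on messages parsed without policy=policy.default.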

Customization Points

Main Methods and Roles

These are the primary operations of the imaplib.IMAP4_SSL object.

Method | Argument Example | Purpose / Meaning
IMAP4_SSL(host, port) | "imap.gmail.com", 993 | Creates an IMAP server instance with SSL encryption.
login(user, password) | "user@mail.com", "pass" | Authenticates the user with the server.
select("INBOX") | "INBOX" | Selects the mailbox (folder) to operate on.
search(None, "ALL") | None, "UNSEEN" | Gets a list of email IDs that match the criteria.
fetch(id, "(RFC822)") | latest_id, "(RFC822)" | Retrieves the email data for the specified ID.
logout() | (none) | Ends the server connection.

Specifying Search Criteria (search)

The search method is powerful, but you must be careful with the syntax.

  • Get all: imap.search(None, "ALL")
  • Unread only: imap.search(None, "UNSEEN")
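  • Subject search: imap.search("UTF-8", '(SUBJECT "SearchWord")'). For non-English searches, specify "UTF-8" as the first argument. Note that imaplib encodes str arguments as ASCII by default, so a non-ASCII search term has to be supplied as UTF-8 encoded bytes rather than as a str.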
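
Criteria can also be combined; several keys in one search are ANDed together. As a minimal sketch, the helper below assembles a criteria string from a few optional filters. The name build_search_criteria and its parameters are hypothetical, and it only covers ASCII subjects.

```python
from datetime import date

# English month abbreviations, because IMAP dates do not follow the locale
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_search_criteria(subject=None, unseen=False, since=None):
    """Combine simple filters into one IMAP SEARCH criteria string."""
    parts = []
    if unseen:
        parts.append("UNSEEN")
    if subject is not None:
        # Escape backslashes and double quotes inside the quoted string
        escaped = subject.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'SUBJECT "{escaped}"')
    if since is not None:
        # IMAP dates look like 05-Jan-2024
        parts.append(f"SINCE {since.day:02d}-{_MONTHS[since.month - 1]}-{since.year}")
    if not parts:
        return "ALL"
    return "(" + " ".join(parts) + ")"


print(build_search_criteria(subject="Test", unseen=True))
print(build_search_criteria(since=date(2024, 1, 5)))
```

The resulting string can be passed as the second argument of imap.search, for example imap.search(None, build_search_criteria(unseen=True)).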

Converting to EmailMessage

imap.fetch returns a status string and a list; the raw message is the bytes object at msg_data[0][1]. To handle it easily in Python, parse it with the email module. Using message_from_bytes(raw, policy=policy.default) is recommended. With policy.default, the result is an EmailMessage and MIME-encoded headers (such as encoded subjects) are decoded automatically, so msg['Subject'] gives you the subject as a normal string.
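
To make the effect of policy.default concrete, here is a small sketch that parses the same raw bytes twice, once with the legacy default policy and once with policy.default. The raw message is built locally for illustration, and the sender address and subject text are made up.

```python
import base64
from email import message_from_bytes, policy

# Build a tiny raw message whose subject is MIME-encoded (Base64, UTF-8)
encoded_subject = base64.b64encode("テスト".encode("utf-8")).decode("ascii")
raw_email = (
    "From: sender@example.com\r\n"
    f"Subject: =?UTF-8?B?{encoded_subject}?=\r\n"
    "\r\n"
    "Hello\r\n"
).encode("ascii")

# Without a policy argument the legacy compat32 policy is used
legacy_msg = message_from_bytes(raw_email)
# With policy.default the parser returns an EmailMessage
modern_msg = message_from_bytes(raw_email, policy=policy.default)

legacy_subject = legacy_msg["Subject"]        # still in the encoded =?UTF-8?B?...?= form
modern_subject = str(modern_msg["Subject"])   # decoded text
```

With the legacy policy you would have to decode the header yourself, for example with email.header.decode_header.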

Important Notes

Seen vs. Unread Status

When you fetch data using (RFC822), the email is automatically marked as “Read” (SEEN) on the server. If you want to fetch data without marking it as read, use (BODY.PEEK[]) instead.
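
As a minimal sketch of the PEEK variant, the helper below fetches one message without setting the Seen flag. The function name fetch_raw_without_marking_seen is hypothetical, and imap is assumed to be a logged-in IMAP4_SSL object with a mailbox selected.

```python
def fetch_raw_without_marking_seen(imap, msg_id):
    """Return the raw bytes of one message and leave its Seen flag untouched."""
    # BODY.PEEK[] returns the same content as RFC822 without marking the message as read
    status, msg_data = imap.fetch(msg_id, "(BODY.PEEK[])")
    if status != "OK" or not msg_data or not isinstance(msg_data[0], tuple):
        raise RuntimeError(f"FETCH failed for message {msg_id!r}")
    return msg_data[0][1]


# Intended use with a live connection (not executed here):
#   raw_email = fetch_raw_without_marking_seen(imap, latest_id)
#   msg = message_from_bytes(raw_email, policy=policy.default)
```

Another option is to open the mailbox read-only with imap.select("INBOX", readonly=True), in which case flags are not changed regardless of the fetch item used.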

Security

Major providers like Gmail may block IMAP connections using regular login passwords. In such cases, you must enable two-factor authentication and use an “App Password.”

Folder Names

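In some environments (like Outlook), folder names other than INBOX may be in local languages. IMAP transmits such names in modified UTF-7 (defined in RFC 3501), and imaplib does not convert them for you, so you may need to encode the folder name yourself before passing it to select.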
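
The standard library has no codec for this format, so the following is a sketch of an encoder written from the rules in RFC 3501: printable ASCII is kept as-is, "&" becomes "&-", and everything else is sent as UTF-16BE in Base64 with "," in place of "/" and no padding. The function name encode_imap_utf7 is hypothetical.

```python
import base64


def encode_imap_utf7(name):
    """Encode a mailbox name into IMAP modified UTF-7 (hypothetical helper)."""
    result = []
    pending = []  # consecutive characters that need Base64 encoding

    def flush():
        # Emit the pending characters as one "&...-" Base64 section
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            result.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            # A literal ampersand is written as "&-"
            result.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(result)


print(encode_imap_utf7("Sent & Archive"))
```

The encoded name would then be passed to imap.select in place of the readable one. Names containing spaces additionally need to be wrapped in double quotes.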

Advanced Usage

Here is an example of how to save attachments found in an email.

import os

def save_attachments(msg, download_folder="downloads"):
    os.makedirs(download_folder, exist_ok=True)

    for part in msg.walk():
        # Container parts hold other parts, not file data
        if part.get_content_maintype() == 'multipart':
            continue
        # Parts without a Content-Disposition header are not treated as files
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        if filename:
            # Keep only the base name so a crafted filename cannot escape the folder
            filepath = os.path.join(download_folder, os.path.basename(filename))
            with open(filepath, 'wb') as f:
                f.write(part.get_payload(decode=True))
            print(f"Attachment saved: {filepath}")
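
For messages parsed with policy.default, EmailMessage.iter_attachments offers a shorter way to visit attachment parts. The sketch below is one possible variant, not the only approach. The name save_attachments_safely is hypothetical. It also strips directory components from the sender-supplied filename and returns the written paths.

```python
import os
from email.message import EmailMessage


def save_attachments_safely(msg, download_folder):
    """Save each attachment under download_folder and return the written paths."""
    os.makedirs(download_folder, exist_ok=True)
    saved = []
    for part in msg.iter_attachments():
        filename = part.get_filename()
        if not filename:
            continue
        # Drop any directory components supplied by the sender
        safe_name = os.path.basename(filename.replace("\\", "/"))
        if safe_name in ("", ".", ".."):
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            # Nested messages and other multipart attachments have no single payload
            continue
        filepath = os.path.join(download_folder, safe_name)
        with open(filepath, "wb") as f:
            f.write(payload)
        saved.append(filepath)
    return saved


# Build a message locally to show the expected input type
demo = EmailMessage()
demo.set_content("See the attached file.")
demo.add_attachment(b"hello", maintype="application",
                    subtype="octet-stream", filename="../report.bin")
```

Calling save_attachments_safely(demo, "downloads") would write the file as report.bin inside the downloads folder rather than one level above it.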

Summary

When receiving emails with Python, it is important to understand the division of labor: imaplib handles communication, while the email module parses the data. By using policy.default during parsing, you can automate complex character encoding tasks and manage emails with simple code.

