[Python] Collecting Multiple Elements with BeautifulSoup: Complete Guide to the find_all Method

In web scraping, there are many situations where you want to retrieve multiple elements with the same structure at once, such as a list of news articles or products.

While the find() method retrieves only the first matching element, the find_all() method returns every element that matches the conditions as a list-like object (ResultSet).

Here, using the HTML of a to-do list as an example, I will explain how to retrieve multiple tags at once and extract their data.

Executable Sample Code

The following script retrieves every task item (<li> tag) in the HTML and prints the text of each one in order.

from bs4 import BeautifulSoup

def extract_all_tasks():
    # HTML to be analyzed (Assuming a ToDo list)
    html_doc = """
    <html>
        <body>
            <h3>Today's Tasks</h3>
            <ul class="task-list">
                <li class="item">Check emails</li>
                <li class="item">Weekly meeting</li>
                <li class="item">Fix Python script</li>
                <li class="item">Create daily report</li>
            </ul>
            <div class="footer">Done: 0 / Remaining: 4</div>
        </body>
    </html>
    """

    # Create BeautifulSoup object
    # ("html.parser" ships with Python; "html5lib" also works if that package is installed)
    soup = BeautifulSoup(html_doc, "html.parser")

    print("=== Retrieve All Elements (find_all) ===")

    # 1. Retrieve all <li> tags
    # find_all returns a "list-like object" containing all matching tags
    task_tags = soup.find_all("li")

    # 2. Check the number of retrieved elements
    print(f"Number of tasks retrieved: {len(task_tags)}")

    # 3. Extract content using a loop
    print("\n[Task List]")
    for i, tag in enumerate(task_tags, 1):
        # Each element can be treated as an individual Tag object
        print(f"{i}. {tag.text}")

    print("\n=== (Application) Extraction using list comprehension ===")
    # Pythonic one-liner that builds a list containing only the text of each tag
    task_texts = [tag.text for tag in soup.find_all("li")]
    print(task_texts)

if __name__ == "__main__":
    extract_all_tasks()

Explanation: Features and Usage of find_all

1. The return value can be handled like a “list”

The return value of soup.find_all("p") is a ResultSet object. ResultSet is a subclass of Python's list, so it can be handled like a standard list.

  • Loop processing with for is possible.
  • Getting the count with len() is possible.
  • Access by index (tags[0]) is possible.
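
The three points above can be checked with a short sketch. The HTML string and variable names are made up for illustration, and the built-in "html.parser" is used so that no extra parser package is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only for this illustration
html_doc = """
<ul>
    <li>Check emails</li>
    <li>Weekly meeting</li>
    <li>Fix Python script</li>
</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")
tags = soup.find_all("li")

# ResultSet is a subclass of list, so the usual list operations apply
is_list = isinstance(tags, list)
count = len(tags)
first_text = tags[0].text
last_two = [tag.text for tag in tags[-2:]]

print(is_list)     # True
print(count)       # 3
print(first_text)  # Check emails
print(last_two)    # ['Weekly meeting', 'Fix Python script']
```

Slicing works as well, which is convenient when only the first few or last few matches are needed.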

2. Returns an “empty list” if not found

While the find() method returns None when an element is not found, find_all() returns an empty ResultSet, which behaves like an empty list []. Therefore, a loop like the one below does not raise an error when nothing matches; its body simply never runs.

# Even if tags are not found, it doesn't cause an error; the loop just doesn't run
for tag in soup.find_all("non-existent-tag"):
    print(tag)
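
To make the difference from find() concrete, here is a minimal sketch that looks up a tag which does not exist in the document. The HTML and the variable names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only for this illustration
soup = BeautifulSoup("<ul><li>Check emails</li></ul>", "html.parser")

# find() returns None when nothing matches, so check before using the result
missing_one = soup.find("table")
if missing_one is None:
    print("find(): no <table> found")

# find_all() returns an empty ResultSet, which is falsy and has length 0
missing_all = soup.find_all("table")
print(len(missing_all))  # 0
if not missing_all:
    print("find_all(): no <table> found")
```

With find(), accessing an attribute such as .text on the None result would raise AttributeError, which is why the explicit check is needed there but not in a find_all() loop.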

3. Specifying conditions is the same as find

Just like with find(), you can filter by class name or attributes.

# Retrieve all <li> tags that have class="active"
active_items = soup.find_all("li", class_="active")
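
As a runnable sketch of this kind of filtering, the example below uses a made-up task list in which some items carry an "active" class and a data-priority attribute. Neither appears in the sample HTML earlier in this article; they are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the "active" class and data-priority attribute are made up
html_doc = """
<ul>
    <li class="item active" data-priority="high">Check emails</li>
    <li class="item">Weekly meeting</li>
    <li class="item active" data-priority="low">Fix Python script</li>
</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# class_ matches when "active" is one of the element's classes
active_texts = [tag.text for tag in soup.find_all("li", class_="active")]

# attrs takes a dict, useful for names like data-* that are not valid keyword arguments
high_texts = [tag.text for tag in soup.find_all("li", attrs={"data-priority": "high"})]

print(active_texts)  # ['Check emails', 'Fix Python script']
print(high_texts)    # ['Check emails']
```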