In web scraping, there are many situations where you want to retrieve multiple elements with the same structure at once, such as a list of news articles or products.
While the find() method retrieves only the “first” element, the find_all() method allows you to retrieve all elements matching the conditions in a list format (ResultSet).
Here, using the HTML of a ToDo list as an example, I will explain how to retrieve multiple tags at once and extract their data.
Executable Sample Code
The following script retrieves every task item (<li> tag) in the HTML and prints the text of each one in order.
from bs4 import BeautifulSoup


def extract_all_tasks():
    # HTML to be analyzed (assuming a ToDo list)
    html_doc = """
    <html>
    <body>
    <h3>Today's Tasks</h3>
    <ul class="task-list">
    <li class="item">Check emails</li>
    <li class="item">Weekly meeting</li>
    <li class="item">Fix Python script</li>
    <li class="item">Create daily report</li>
    </ul>
    <div class="footer">Done: 0 / Remaining: 4</div>
    </body>
    </html>
    """

    # Create BeautifulSoup object
    # Note: "html5lib" must be installed separately (pip install html5lib);
    # the built-in "html.parser" also works for this example
    soup = BeautifulSoup(html_doc, "html5lib")

    print("=== Retrieve All Elements (find_all) ===")

    # 1. Retrieve all <li> tags
    # find_all returns a "list-like object" containing all matching tags
    task_tags = soup.find_all("li")

    # 2. Check the number of retrieved elements
    print(f"Number of tasks retrieved: {len(task_tags)}")

    # 3. Extract content using a loop
    print("\n[Task List]")
    for i, tag in enumerate(task_tags, 1):
        # Each element can be treated as an individual Tag object
        print(f"{i}. {tag.text}")

    print("\n=== (Application) Extraction using list comprehension ===")

    # Pythonic way to create a list of text only in one line
    task_texts = [tag.text for tag in soup.find_all("li")]
    print(task_texts)


if __name__ == "__main__":
    extract_all_tasks()
Explanation: Features and Usage of find_all
1. The return value can be handled like a “list”
The return value of soup.find_all("li") is a ResultSet object. ResultSet is a subclass of Python's list, so it can be handled almost exactly like a standard list.
- Loop processing with for is possible.
- Getting the count with len() is possible.
- Access by index (tags[0]) is possible.
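The three list-like operations above can be sketched as follows. This is a minimal illustration rather than part of the original script: the shortened HTML and the variable names are assumptions made for the example, and the built-in "html.parser" is used so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML, shortened from the ToDo list used earlier
html_doc = """
<ul class="task-list">
  <li class="item">Check emails</li>
  <li class="item">Weekly meeting</li>
  <li class="item">Fix Python script</li>
</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")
tags = soup.find_all("li")

# Loop processing with for
for tag in tags:
    print(tag.get_text())

# Getting the count with len()
count = len(tags)

# Access by index, and by slice like any other list
first_text = tags[0].get_text()
last_two = [t.get_text() for t in tags[-2:]]

# ResultSet is a subclass of list
is_list = isinstance(tags, list)

print(count, first_text, last_two, is_list)
```

Because each element is still a Tag object, anything you can do with the result of find() (reading attributes, calling get_text(), searching further inside it) also works on the individual items here.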
2. Returns an “empty list” if not found
While the find() method returns None when an element is not found, find_all() returns an empty ResultSet, which behaves like an empty list []. Therefore, a loop written as shown below will not raise an error; its body simply never runs.
# Even if tags are not found, it doesn't cause an error; the loop just doesn't run
for tag in soup.find_all("non-existent-tag"):
    print(tag)
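To make the difference between the two methods concrete, here is a small sketch that searches for a tag that does not exist. The one-line HTML and the names missing_one, missing_all, first_text, and message are assumptions chosen for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that contains no <table> tag
soup = BeautifulSoup("<ul><li>Check emails</li></ul>", "html.parser")

missing_one = soup.find("table")      # None when nothing matches
missing_all = soup.find_all("table")  # empty ResultSet when nothing matches

# With find(), check for None before calling methods on the result;
# otherwise missing_one.get_text() would raise AttributeError
if missing_one is not None:
    first_text = missing_one.get_text()
else:
    first_text = "(no table)"

# With find_all(), an empty result is falsy, so a plain truth test is enough
if not missing_all:
    message = "no tables found"
else:
    message = f"{len(missing_all)} tables found"

print(first_text)
print(message)
```

In practice this means find_all() is safe to loop over directly, while the result of find() usually needs a None check first.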
3. Specifying conditions is the same as find
Just like with find(), you can filter by class name or attributes.
# Retrieve all <li> tags that have class="active"
active_items = soup.find_all("li", class_="active")
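The sample HTML earlier in this article has no class="active" items, so the following sketch uses a modified ToDo list to show the filters in action. The "active" class and the data-priority attribute are hypothetical additions made for this example; class_, attrs, and limit are standard find_all() arguments.

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the ToDo list with extra classes and attributes
html_doc = """
<ul class="task-list">
  <li class="item active" data-priority="high">Check emails</li>
  <li class="item">Weekly meeting</li>
  <li class="item active" data-priority="low">Fix Python script</li>
  <li class="item" data-priority="high">Create daily report</li>
</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Filter by class name: matches tags whose class list contains "active"
active = [t.get_text() for t in soup.find_all("li", class_="active")]

# Filter by an arbitrary attribute using the attrs dictionary
high = [t.get_text() for t in soup.find_all("li", attrs={"data-priority": "high"})]

# Stop after the first two matches with limit
first_two = [t.get_text() for t in soup.find_all("li", limit=2)]

print(active)
print(high)
print(first_two)
```

The attrs dictionary is handy for attribute names such as data-priority, which contain a hyphen and therefore cannot be written as Python keyword arguments.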
