How to Scrape Web Pages with Python and Beautiful Soup and Generate Report Documents Automatically

July 28, 2025

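Collecting the information you need from websites and compiling it into documents such as Excel workbooks can be made much more efficient with Python. Below, I walk through the basic steps: extracting headings with requests and Beautiful Soup, tabulating the results, and saving them to an Excel file. The code uses renamed variables and file names, with copyright considerations in mind.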

Preparation (installing the libraries)

pip install requests beautifulsoup4 openpyxl pandas

1. Retrieve a heading from a web page

import requests
from bs4 import BeautifulSoup

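# Target site (a dummy URL is used as an example)
site_url = "https://forest-investor.com"

# Fetch the HTML
response = requests.get(site_url, timeout=10)
response.raise_for_status()        # Raise an exception on HTTP errors (4xx/5xx)

# Parse with Beautiful Soup
# Passing the raw bytes (response.content) lets Beautiful Soup detect the
# page's encoding, avoiding garbled Japanese text when the server does not
# declare a charset in its Content-Type header
html_soup = BeautifulSoup(response.content, "html.parser")

# Extract the first <h2> element
first_heading = html_soup.find("h2")
if first_heading:
    print(first_heading.text.strip())
else:
    print("No heading was found")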

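The variables are named response, html_soup, and first_heading to make the differences from the original code clear.
Adding timeout and raise_for_status() makes network and server errors surface immediately: with timeout=10 the request gives up if connecting, or waiting for data, takes longer than 10 seconds, and raise_for_status() raises requests.exceptions.HTTPError for 4xx/5xx responses instead of letting an error page be parsed as if it were content.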
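timeout and raise_for_status() only raise exceptions, so the script above still stops at the first failure. When many pages are processed, it can be convenient to catch these errors and move on. The following is a minimal sketch that assumes a failed page should simply be skipped; the helper name fetch_html is hypothetical.

```python
from typing import Optional

import requests


def fetch_html(url: str, timeout: float = 10) -> Optional[bytes]:
    """Return the raw HTML bytes of url, or None if the request fails.

    fetch_html is a hypothetical helper used for illustration.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # HTTP 4xx/5xx -> requests.exceptions.HTTPError
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class of HTTPError, Timeout,
        # ConnectionError, MissingSchema and the other errors requests raises
        print(f"Skipped {url}: {exc}")
        return None
    # Raw bytes, so Beautiful Soup can detect the encoding itself
    return response.content
```

For example, fetch_html("https://forest-investor.com") returns the page bytes on success and None (after printing the reason) on failure, so the caller only needs an `is None` check.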

2. Collect all headings into a list and convert it to a DataFrame

import pandas as pd

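# Text of every <h2> element, with surrounding whitespace stripped
headings = [h2.text.strip() for h2 in html_soup.find_all("h2")]
# One row per heading, in a column named SectionTitle
df = pd.DataFrame({"SectionTitle": headings})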
print(df.head())
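Because the list comprehension depends only on the parsed HTML, it can be checked without network access by parsing a short HTML string. The sample HTML below is made up for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up sample HTML standing in for a downloaded page
sample_html = """
<html><body>
  <h1>Site title</h1>
  <h2> Market overview </h2>
  <p>Body text</p>
  <h2>Dividend <span>stocks</span></h2>
</body></html>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Same extraction as above: text of every <h2>, whitespace stripped
sample_headings = [h2.text.strip() for h2 in sample_soup.find_all("h2")]
sample_df = pd.DataFrame({"SectionTitle": sample_headings})
print(sample_df)
```

Only the <h2> elements are picked up (the <h1> is ignored), and .text includes the text of nested tags such as <span>, so the second row reads "Dividend stocks".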

To crawl multiple pages, wrap this code in a function and pass each URL to it from a for loop or list comprehension; this makes the code much easier to reuse.
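Wrapping the code in functions as suggested above might look like the minimal sketch below. It assumes every page marks its section titles with <h2>. The function names and the SourceURL column are hypothetical additions, and the page-fetching function is passed in as a parameter so the loop can be tried without network access.

```python
from __future__ import annotations

from typing import Callable, Iterable

import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_page(url: str) -> bytes:
    """Download one page (hypothetical helper); raises on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.content


def extract_headings(html: bytes | str) -> list[str]:
    """Return the stripped text of every <h2> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.text.strip() for h2 in soup.find_all("h2")]


def collect_headings(
    urls: Iterable[str],
    fetch: Callable[[str], bytes | str] = fetch_page,
) -> pd.DataFrame:
    """Fetch each URL in turn and return one row per heading."""
    rows = []
    for url in urls:
        for title in extract_headings(fetch(url)):
            # Record which page each heading came from
            rows.append({"SourceURL": url, "SectionTitle": title})
    # Fixed column order; an empty input still yields both columns
    return pd.DataFrame(rows, columns=["SourceURL", "SectionTitle"])
```

For example, collect_headings(["https://forest-investor.com"]) fetches the site used above. When crawling many pages of the same site, consider pausing between requests (for example with time.sleep) so as not to overload the server.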

3. Export the collected data to Excel

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

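# Create a workbook and rename its default sheet
wb = Workbook()
ws = wb.active
ws.title = "Headings"

# dataframe_to_rows yields the header row first, then one row per heading
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Write the workbook to disk
wb.save("web_headings_report.xlsx")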
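If you do not need cell-level control over the workbook, pandas can write the same sheet in a single call with DataFrame.to_excel, using openpyxl (installed above) as the engine. A minimal sketch with made-up sample data; the file name headings_pandas.xlsx is hypothetical.

```python
import pandas as pd

# Made-up sample data standing in for the scraped headings
report_df = pd.DataFrame({"SectionTitle": ["Market overview", "Dividend stocks"]})

# index=False omits the DataFrame index, like dataframe_to_rows(..., index=False)
report_df.to_excel(
    "headings_pandas.xlsx", sheet_name="Headings", index=False, engine="openpyxl"
)

# Read the file back to confirm what was written
check = pd.read_excel("headings_pandas.xlsx", sheet_name="Headings")
print(check)
```

The openpyxl version above is still useful when you want to adjust formatting (for example column widths) before saving; to_excel is the shorter path when the DataFrame alone is enough.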