[Python] How to Read Excel Files with Pandas and Key Options Explained (read_excel)

In business settings, data is commonly managed not only in CSV files but also in Microsoft Excel format (.xlsx). By using the Pandas read_excel function, you can efficiently import specific sheets or necessary columns within an Excel workbook as a DataFrame.

This article explains the basic usage of the read_excel function and the main parameters (arguments) for finely controlling the scope of data importation.

目次

List of Key Parameters

While read_excel has many options, the four most commonly used parameters are summarized in the table below.

ParameterMeaning / RoleExample Specification
sheet_nameSpecifies the sheet to read. Can be specified by sheet name (string) or index (0-based integer). Multiple sheets can be specified as a list."Sales_Data", 0, ["Sheet1", "Sheet2"]
headerSpecifies the row number to use as the header (column names). Default is 0 (1st row). Specify None if there is no header.1 (use 2nd row as header), None
usecolsLimits the “columns” to be read. Can be specified by Excel column letters (“A:C”), column names, or column numbers."A:C", [0, 2], ["ID", "Name"]
index_colSpecifies the column to use as the DataFrame index (row label).0, "Date"

Implementation Sample Code

Below is the complete Python code that generates a sample Excel file and reads it using various options.

Note: To execute this code, the Excel operation library openpyxl must be installed (pip install openpyxl).

import pandas as pd
import os

def demonstrate_read_excel():
    """
    Function to demonstrate Excel file reading options using Pandas
    """
    
    # ---------------------------------------------------------
    # 1. Preparation: Create a test Excel file
    # ---------------------------------------------------------
    file_name = "monthly_sales_report.xlsx"
    sheet1_name = "April_2026"
    sheet2_name = "May_2026"

    # Sheet 1: April Data (Assuming 1st row contains title/extra info)
    # Col A: ID, Col B: Product, Col C: Price, Col D: Quantity
    df_april = pd.DataFrame({
        "Transaction_ID": [1001, 1002, 1003],
        "Product_Name": ["Laptop", "Mouse", "Monitor"],
        "Unit_Price": [1200, 50, 300],
        "Sold_Quantity": [5, 20, 10]
    })

    # Sheet 2: May Data (Standard format)
    df_may = pd.DataFrame({
        "Transaction_ID": [2001, 2002],
        "Product_Name": ["Keyboard", "HDMI Cable"],
        "Unit_Price": [80, 15],
        "Sold_Quantity": [15, 30]
    })

    # Save as Excel file
    # Intentionally insert an empty row in Sheet 1 to test the header option
    with pd.ExcelWriter(file_name, engine="openpyxl") as writer:
        # April Sheet: Put report title in 1st row, start data from 2nd row (index=1)
        df_april.to_excel(writer, sheet_name=sheet1_name, index=False, startrow=1)
        # May Sheet: Placed normally
        df_may.to_excel(writer, sheet_name=sheet2_name, index=False)
        
    print(f"--- Preparation Complete: Created {file_name} ---\n")


    # ---------------------------------------------------------
    # 2. Basic Reading
    # ---------------------------------------------------------
    print("=== 1. Basic Reading (Specifying sheet_name) ===")
    # Specify sheet name with sheet_name
    # Since it only reads the first sheet by default, we specify it explicitly
    df_basic = pd.read_excel(file_name, sheet_name=sheet2_name)
    print(f"Contents of sheet '{sheet2_name}':")
    print(df_basic)
    print("\n")


    # ---------------------------------------------------------
    # 3. Reading Multiple Sheets Simultaneously
    # ---------------------------------------------------------
    print("=== 2. Reading Multiple Sheets ===")
    # Passing a list to sheet_name returns a dictionary of {sheet_name: DataFrame}
    sheets_dict = pd.read_excel(file_name, sheet_name=[sheet1_name, sheet2_name], header=1)
    
    print("Keys of loaded sheets:", sheets_dict.keys())
    print(f"Top rows of sheet '{sheet1_name}':")
    print(sheets_dict[sheet1_name].head(2))
    print("\n")


    # ---------------------------------------------------------
    # 4. Specifying Header Position (header)
    # ---------------------------------------------------------
    print("=== 3. Specifying Header Position (header=1) ===")
    # Since we left the 1st row empty during creation (startrow=1),
    # we specify header=1 to read the 2nd row as the header
    df_header = pd.read_excel(file_name, sheet_name=sheet1_name, header=1)
    print(df_header.head(2))
    print("\n")


    # ---------------------------------------------------------
    # 5. Reading Only Specific Columns (usecols)
    # ---------------------------------------------------------
    print("=== 4. Reading Only Specific Columns (usecols) ===")
    
    # Pattern A: Range specification using Excel column letters ("A:C")
    # Get columns from A to C
    df_cols_range = pd.read_excel(
        file_name, 
        sheet_name=sheet2_name, 
        usecols="A:C"
    )
    print("--- usecols='A:C' ---")
    print(df_cols_range)

    # Pattern B: Specification by list of column names
    # Specifying by name is the most robust method
    df_cols_list = pd.read_excel(
        file_name, 
        sheet_name=sheet2_name, 
        usecols=["Product_Name", "Sold_Quantity"]
    )
    print("\n--- usecols=['Product_Name', 'Sold_Quantity'] ---")
    print(df_cols_list)
    
    # Pattern C: Specification by column numbers (indices)
    # Get 0th (Col A) and 2nd (Col C) columns
    df_cols_idx = pd.read_excel(
        file_name, 
        sheet_name=sheet2_name, 
        usecols=[0, 2]
    )
    print("\n--- usecols=[0, 2] ---")
    print(df_cols_idx)
    
    # (Reference) Delete file
    # os.remove(file_name)

if __name__ == "__main__":
    demonstrate_read_excel()

Execution Result

--- Preparation Complete: Created monthly_sales_report.xlsx ---

=== 1. Basic Reading (Specifying sheet_name) ===
Contents of sheet 'May_2026':
   Transaction_ID Product_Name  Unit_Price  Sold_Quantity
0            2001     Keyboard          80             15
1            2002   HDMI Cable          15             30


=== 2. Reading Multiple Sheets ===
Keys of loaded sheets: dict_keys(['April_2026', 'May_2026'])
Top rows of sheet 'April_2026':
   Transaction_ID Product_Name  Unit_Price  Sold_Quantity
0            1001       Laptop        1200              5
1            1002        Mouse          50             20


=== 3. Specifying Header Position (header=1) ===
   Transaction_ID Product_Name  Unit_Price  Sold_Quantity
0            1001       Laptop        1200              5
1            1002        Mouse          50             20


=== 4. Reading Only Specific Columns (usecols) ===
--- usecols='A:C' ---
   Transaction_ID Product_Name  Unit_Price
0            2001     Keyboard          80
1            2002   HDMI Cable          15

--- usecols=['Product_Name', 'Sold_Quantity'] ---
  Product_Name  Sold_Quantity
0     Keyboard             15
1   HDMI Cable             30

--- usecols=[0, 2] ---
   Transaction_ID  Unit_Price
0            2001          80
1            2002          15

Detailed Explanation of Each Option

How to Specify usecols

usecols allows for very flexible specifications.

  • Range specification by string: You can use formats familiar to Excel users, such as "A:D" or "A,C,E".
  • Specification by list of column names: Specifying column names like ["Item", "Price"] makes the code less likely to break even if the column order changes.
  • Integer list: Specify by column number (0-based) like [0, 1].

Adjusting header

If the top of the Excel file contains titles or explanatory notes, the row where data starts will be shifted. In that case, specifying the appropriate row number, such as header=2 (3rd row is the header), allows you to create a DataFrame with the correct structure.

By combining these options, you can import raw Excel data directly into Python without pre-processing.

よかったらシェアしてね!
  • URLをコピーしました!
  • URLをコピーしました!

この記事を書いた人

私が勉強したこと、実践したこと、してることを書いているブログです。
主に資産運用について書いていたのですが、
最近はプログラミングに興味があるので、今はそればっかりです。

目次