[C#] How to Traverse and Retrieve All Cell Data Using NPOI

目次

Overview

This implementation uses the NPOI library to sequentially traverse every row and cell in an Excel sheet and retrieve their values as strings. This approach is highly effective for handling variable-length data where the number of rows or columns is not fixed. By utilizing C# iterators (IEnumerable / yield return), we achieve a clean and memory-efficient reading process.

Specifications (Input/Output)

  • Input: Excel file path, target sheet index.
  • Output: Tab-separated cell values printed to the console.
  • Library: NPOI (NuGet package).

Implemented Methods

Method NameDescription
GetRowsIterates through the sheet from the first row to the last row containing data, returning IRow objects.
GetCellsIterates through a specified row from the first column to the last column, returning ICell objects.

Basic Usage

var book = MyExcelIterator.Open("data.xlsx");
book.SelectSheet(0);

// Loop through all rows
foreach (var row in book.GetRows())
{
    // Loop through all cells in the row
    foreach (var cell in book.GetCells(row))
    {
        // Display the value (NPOI's ToString provides a simple string representation)
        Console.Write($"{cell?.ToString()}\t");
    }
    Console.WriteLine();
}

Full Code Example

To execute this code, you need to add the package via dotnet add package NPOI.

using System;
using System.Collections.Generic;
using System.IO;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;

class Program
{
    static void Main()
    {
        string filePath = "example.xlsx";

        if (!File.Exists(filePath))
        {
            Console.WriteLine("File not found.");
            return;
        }

        try
        {
            // 1. Open the file
            var xls = MyExcelIterator.Open(filePath);
            
            // 2. Select the first sheet
            xls.SelectSheet(0);

            // 3. Traverse and output all rows and cells
            Console.WriteLine("--- Excel Data Dump ---");
            
            foreach (var row in xls.GetRows())
            {
                var lineValues = new List<string>();

                foreach (var cell in xls.GetCells(row))
                {
                    // If cell is null, use empty string; otherwise, use ToString()
                    // NPOI's ToString() returns an appropriate string based on the type
                    lineValues.Add(cell?.ToString() ?? "");
                }

                // Join with tabs and display
                Console.WriteLine(string.Join("\t", lineValues));
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
        }
    }
}

/// <summary>
/// Class for iterating through all Excel data using NPOI
/// </summary>
public sealed class MyExcelIterator
{
    private XSSFWorkbook _workbook;
    private ISheet _sheet;

    private MyExcelIterator() { }

    /// <summary>
    /// Opens a file and creates an instance
    /// </summary>
    public static MyExcelIterator Open(string filePath)
    {
        var obj = new MyExcelIterator();
        
        using var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
        obj._workbook = new XSSFWorkbook(stream);
        
        return obj;
    }

    /// <summary>
    /// Selects the target sheet
    /// </summary>
    public void SelectSheet(int index)
    {
        _sheet = _workbook.GetSheetAt(index);
    }

    /// <summary>
    /// Iterator that enumerates all valid rows in the sheet
    /// </summary>
    public IEnumerable<IRow> GetRows()
    {
        if (_sheet == null) yield break;

        // FirstRowNum: Index of the first row with data
        // LastRowNum: Index of the last row with data
        for (int i = _sheet.FirstRowNum; i <= _sheet.LastRowNum; i++)
        {
            var row = _sheet.GetRow(i);
            if (row != null)
            {
                yield return row;
            }
        }
    }

    /// <summary>
    /// Iterator that enumerates all cells within a row
    /// </summary>
    public IEnumerable<ICell> GetCells(IRow row)
    {
        if (row == null) yield break;

        // LastCellNum returns "Last column index + 1"
        // (e.g., if data goes up to column index 2, LastCellNum is 3)
        for (int i = 0; i < row.LastCellNum; i++)
        {
            var cell = row.GetCell(i);
            yield return cell;
        }
    }
}

Customization Points

  • Handling Empty Rows: The current GetRows only returns rows where row != null. If you want to process blank rows without skipping them, change the logic to yield return row; and handle the null check on the calling side.
  • Strict Value Retrieval: While cell.ToString() is convenient, it may not format dates or numbers exactly as intended. For stricter control, combine this with the CellType-based logic introduced in previous articles.

Important Notes

  • The LastCellNum Trap: The row.LastCellNum property in NPOI represents the “number of cells” (1-based) rather than the “index of the last column.” Therefore, the loop condition should be i < LastCellNum instead of i <= LastCellNum. Using the wrong condition can lead to out-of-range errors or unexpected behavior.
  • Existence of Null Cells: Cells with no value or cells that only have formatting but no data will return null via GetCell(i). Always use a null check or the ?. operator.
  • Performance: Using IEnumerable and yield return is memory-efficient because it avoids loading all data into a list at once. However, for extremely large files, you might still need to consider specialized streaming approaches.

Advanced Usage

Extracting only rows that match specific conditions

By combining this with LINQ, you can easily extract rows containing specific keywords.

using System.Linq;

// Get only the rows where column A (Index 0) contains the text "Target"
var targetRows = xls.GetRows()
    .Where(row => row.GetCell(0)?.ToString() == "Target");

foreach (var row in targetRows)
{
    // Process matching rows...
}

Conclusion

Traversing all cells in NPOI fundamentally involves a nested structure: a row loop from FirstRowNum to LastRowNum and a cell loop from 0 to LastCellNum. By hiding this repetition within an iterator class, the calling code remains readable using simple foreach statements, allowing you to focus on the data processing logic for each row.

よかったらシェアしてね!
  • URLをコピーしました!
  • URLをコピーしました!

この記事を書いた人

私が勉強したこと、実践したこと、してることを書いているブログです。
主に資産運用について書いていたのですが、
最近はプログラミングに興味があるので、今はそればっかりです。

目次