Overview
This implementation uses the NPOI library to sequentially traverse every row and cell in an Excel sheet and retrieve their values as strings. This approach is highly effective for handling variable-length data where the number of rows or columns is not fixed. By utilizing C# iterators (IEnumerable / yield return), we achieve a clean and memory-efficient reading process.
Specifications (Input/Output)
- Input: Excel file path, target sheet index.
- Output: Tab-separated cell values printed to the console.
- Library: NPOI (NuGet package).
Implemented Methods
| Method Name | Description |
| GetRows | Iterates through the sheet from the first row to the last row containing data, returning IRow objects. |
| GetCells | Iterates through a specified row from the first column to the last column, returning ICell objects. |
Basic Usage
var book = MyExcelIterator.Open("data.xlsx");
book.SelectSheet(0);
// Loop through all rows
foreach (var row in book.GetRows())
{
// Loop through all cells in the row
foreach (var cell in book.GetCells(row))
{
// Display the value (NPOI's ToString provides a simple string representation)
Console.Write($"{cell?.ToString()}\t");
}
Console.WriteLine();
}
Full Code Example
To execute this code, you need to add the package via dotnet add package NPOI.
using System;
using System.Collections.Generic;
using System.IO;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;
class Program
{
static void Main()
{
string filePath = "example.xlsx";
if (!File.Exists(filePath))
{
Console.WriteLine("File not found.");
return;
}
try
{
// 1. Open the file
var xls = MyExcelIterator.Open(filePath);
// 2. Select the first sheet
xls.SelectSheet(0);
// 3. Traverse and output all rows and cells
Console.WriteLine("--- Excel Data Dump ---");
foreach (var row in xls.GetRows())
{
var lineValues = new List<string>();
foreach (var cell in xls.GetCells(row))
{
// If cell is null, use empty string; otherwise, use ToString()
// NPOI's ToString() returns an appropriate string based on the type
lineValues.Add(cell?.ToString() ?? "");
}
// Join with tabs and display
Console.WriteLine(string.Join("\t", lineValues));
}
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
}
}
}
/// <summary>
/// Class for iterating through all Excel data using NPOI
/// </summary>
public sealed class MyExcelIterator
{
private XSSFWorkbook _workbook;
private ISheet _sheet;
private MyExcelIterator() { }
/// <summary>
/// Opens a file and creates an instance
/// </summary>
public static MyExcelIterator Open(string filePath)
{
var obj = new MyExcelIterator();
using var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
obj._workbook = new XSSFWorkbook(stream);
return obj;
}
/// <summary>
/// Selects the target sheet
/// </summary>
public void SelectSheet(int index)
{
_sheet = _workbook.GetSheetAt(index);
}
/// <summary>
/// Iterator that enumerates all valid rows in the sheet
/// </summary>
public IEnumerable<IRow> GetRows()
{
if (_sheet == null) yield break;
// FirstRowNum: Index of the first row with data
// LastRowNum: Index of the last row with data
for (int i = _sheet.FirstRowNum; i <= _sheet.LastRowNum; i++)
{
var row = _sheet.GetRow(i);
if (row != null)
{
yield return row;
}
}
}
/// <summary>
/// Iterator that enumerates all cells within a row
/// </summary>
public IEnumerable<ICell> GetCells(IRow row)
{
if (row == null) yield break;
// LastCellNum returns "Last column index + 1"
// (e.g., if data goes up to column index 2, LastCellNum is 3)
for (int i = 0; i < row.LastCellNum; i++)
{
var cell = row.GetCell(i);
yield return cell;
}
}
}
Customization Points
- Handling Empty Rows: The current
GetRowsonly returns rows whererow != null. If you want to process blank rows without skipping them, change the logic toyield return row;and handle the null check on the calling side. - Strict Value Retrieval: While
cell.ToString()is convenient, it may not format dates or numbers exactly as intended. For stricter control, combine this with theCellType-based logic introduced in previous articles.
Important Notes
- The LastCellNum Trap: The
row.LastCellNumproperty in NPOI represents the “number of cells” (1-based) rather than the “index of the last column.” Therefore, the loop condition should bei < LastCellNuminstead ofi <= LastCellNum. Using the wrong condition can lead to out-of-range errors or unexpected behavior. - Existence of Null Cells: Cells with no value or cells that only have formatting but no data will return
nullviaGetCell(i). Always use a null check or the?.operator. - Performance: Using
IEnumerableandyield returnis memory-efficient because it avoids loading all data into a list at once. However, for extremely large files, you might still need to consider specialized streaming approaches.
Advanced Usage
Extracting only rows that match specific conditions
By combining this with LINQ, you can easily extract rows containing specific keywords.
using System.Linq;
// Get only the rows where column A (Index 0) contains the text "Target"
var targetRows = xls.GetRows()
.Where(row => row.GetCell(0)?.ToString() == "Target");
foreach (var row in targetRows)
{
// Process matching rows...
}
Conclusion
Traversing all cells in NPOI fundamentally involves a nested structure: a row loop from FirstRowNum to LastRowNum and a cell loop from 0 to LastCellNum. By hiding this repetition within an iterator class, the calling code remains readable using simple foreach statements, allowing you to focus on the data processing logic for each row.
