[C#] Reading Binary Files in Chunks Using FileStream and yield return

When handling binary data such as images or executables (DLL/EXE), loading the whole file into memory with File.ReadAllBytes can throw an OutOfMemoryException if the file is very large. Combining FileStream with yield return lets you read the file one fixed-size buffer at a time, so you can process large files safely while keeping memory usage low.

Table of Contents

  • Implementation Sample: Chunked Binary Reading
  • Sample Code
  • Execution Result
  • Explanation and Technical Points

Implementation Sample: Chunked Binary Reading

The following code reads a file 1KB (1024 bytes) at a time and prints the size of each chunk to the console.

Sample Code

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class Program
{
    public static void Main()
    {
        // Target file path
        string filePath = "test_data.bin";

        // 1. Create a dummy binary file for testing (2500 bytes)
        if (!File.Exists(filePath))
        {
            byte[] dummyData = new byte[2500];
            new Random().NextBytes(dummyData); // Fill with random values
            File.WriteAllBytes(filePath, dummyData);
        }

        Console.WriteLine($"File Size: {new FileInfo(filePath).Length} bytes\n");

        // 2. Open the FileStream
        using (FileStream fs = File.OpenRead(filePath))
        {
            // 3. Use the custom method to extract 1024 bytes at a time
            int count = 1;
            foreach (byte[] chunk in ReadBinaryFile(fs, 1024))
            {
                Console.WriteLine($"[Run {count}] Read Data Length: {chunk.Length} bytes");
                // You can perform binary analysis on the chunk here
                
                count++;
            }
        }
    }

    // Iterator method that reads and returns data from the stream in specific sizes
    static IEnumerable<byte[]> ReadBinaryFile(Stream stream, int bufferSize)
    {
        // Buffer for reading
        byte[] buffer = new byte[bufferSize];
        int readSize;

        // stream.Read returns the number of bytes actually read (0 means end of stream)
        while ((readSize = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Read may return fewer bytes than requested (typically the last chunk of the file)
            if (readSize == bufferSize)
            {
                // Return a copy, because the buffer is overwritten by the next Read
                yield return buffer.ToArray();
            }
            else
            {
                // Return a new array holding only the bytes that were actually read
                yield return buffer.Take(readSize).ToArray();
            }
        }
    }
}

Execution Result

File Size: 2500 bytes

[Run 1] Read Data Length: 1024 bytes
[Run 2] Read Data Length: 1024 bytes
[Run 3] Read Data Length: 452 bytes
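
The comment inside the foreach loop leaves the per-chunk processing open. As one possible illustration, the sketch below feeds each chunk into an incremental SHA-256 hash, so the digest is computed without ever holding the whole file in memory. The names ChunkHashDemo, ReadChunks, and ComputeSha256InChunks are hypothetical and not part of the sample above; a MemoryStream stands in for the file so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class ChunkHashDemo
{
    // Same idea as ReadBinaryFile in the sample: yield copies of up to bufferSize bytes.
    public static IEnumerable<byte[]> ReadChunks(Stream stream, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        int readSize;
        while ((readSize = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] chunk = new byte[readSize];
            Array.Copy(buffer, chunk, readSize);
            yield return chunk;
        }
    }

    // Hypothetical helper: hashes a stream chunk by chunk instead of loading it whole.
    public static byte[] ComputeSha256InChunks(Stream stream, int bufferSize)
    {
        using (IncrementalHash hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
        {
            foreach (byte[] chunk in ReadChunks(stream, bufferSize))
            {
                hash.AppendData(chunk); // only the current chunk is needed here
            }
            return hash.GetHashAndReset();
        }
    }

    public static void Main()
    {
        // 2500 bytes of test data, matching the size used in the sample
        byte[] data = new byte[2500];
        new Random(1).NextBytes(data);

        using (var ms = new MemoryStream(data))
        {
            byte[] digest = ComputeSha256InChunks(ms, 1024);
            Console.WriteLine(BitConverter.ToString(digest).Replace("-", ""));
        }
    }
}
```

The digest should match what SHA256 produces for the same bytes in a single call, because the hash only sees the bytes in order and does not care how they were split.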

Explanation and Technical Points

1. FileStream.Read

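This is the basic method for reading binary data. Call it as stream.Read(buffer, offset, count): it reads up to count bytes into buffer, starting at index offset, and returns the number of bytes actually read. A return value of 0 means the end of the file (EOF) has been reached. A positive value smaller than count is also possible, so always use the return value rather than assuming the buffer was filled.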
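
Stream.Read is allowed to return fewer bytes than requested even before the end of the stream, which matters more for network or pipe streams than for local files. If your processing needs full-size chunks wherever possible, one option is to keep calling Read until the buffer is full. The sketch below shows that loop; ReadFullyDemo and ReadUpTo are hypothetical names, and a MemoryStream stands in for the file.

```csharp
using System;
using System.IO;

public static class ReadFullyDemo
{
    // Hypothetical helper: keeps calling Read until 'count' bytes have arrived
    // or the stream ends. Assumes count <= buffer.Length.
    // Returns the total number of bytes placed in the buffer.
    public static int ReadUpTo(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            // Continue filling the buffer where the previous Read stopped
            int n = stream.Read(buffer, total, count - total);
            if (n == 0)
            {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void Main()
    {
        using (var ms = new MemoryStream(new byte[2500]))
        {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = ReadUpTo(ms, buffer, buffer.Length)) > 0)
            {
                Console.WriteLine($"Filled {n} bytes");
            }
        }
    }
}
```

With the 2500-byte stream this prints 1024, 1024, and 452, the same chunk sizes as the execution result above.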

2. Utilizing yield return

With large files, collecting every chunk into a List<byte[]> before returning would hold the whole file in memory, which defeats the purpose of reading in chunks. With yield return, the method reads the next block only when the caller (the foreach loop in Main) asks for it, so only one chunk needs to be in memory at a time, provided the caller does not keep references to earlier chunks.
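
To see the on-demand behavior directly, the following sketch checks the stream position before and after taking a single chunk. LazyChunkDemo and ReadChunks are hypothetical names for a reader equivalent to ReadBinaryFile, and a MemoryStream replaces the file so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazyChunkDemo
{
    // Equivalent to the sample's ReadBinaryFile: yields copies of up to bufferSize bytes.
    public static IEnumerable<byte[]> ReadChunks(Stream stream, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        int readSize;
        while ((readSize = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] chunk = new byte[readSize];
            Array.Copy(buffer, chunk, readSize);
            yield return chunk;
        }
    }

    public static void Main()
    {
        using (var ms = new MemoryStream(new byte[2500]))
        {
            // Calling the iterator method does not run its body yet
            IEnumerable<byte[]> chunks = ReadChunks(ms, 1024);
            Console.WriteLine($"Position after creating the iterator: {ms.Position}"); // 0

            // Asking for one chunk triggers exactly one Read
            byte[] first = chunks.First();
            Console.WriteLine($"First chunk length: {first.Length}");                  // 1024
            Console.WriteLine($"Position after taking one chunk: {ms.Position}");      // 1024
        }
    }
}
```

The remaining 1476 bytes are never read, because nothing asked for them.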

3. Take(readSize).ToArray()

The last read of a file usually returns fewer bytes than the buffer holds (the 452-byte chunk in the execution result), and the rest of the buffer still contains stale data from the previous read. Instead of returning the whole buffer, return only the first readSize bytes. Take(readSize).ToArray() does this and also produces a new array, so the returned chunk is not overwritten by the next Read.
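
Take(readSize).ToArray() goes through LINQ and enumerates the bytes one by one. If that overhead matters, Array.Copy is a common alternative that gives the same result. The sketch below simulates a short final read to show why trimming is needed; TrimBufferDemo and CopyValidBytes are hypothetical names, and the 0xAA/0xBB fill values are made up for illustration.

```csharp
using System;

public static class TrimBufferDemo
{
    // Hypothetical helper: returns a new array holding only the first validCount bytes.
    public static byte[] CopyValidBytes(byte[] buffer, int validCount)
    {
        byte[] chunk = new byte[validCount];
        Array.Copy(buffer, chunk, validCount);
        return chunk;
    }

    public static void Main()
    {
        byte[] buffer = new byte[1024];

        // Pretend an earlier read filled the whole buffer with 0xAA ...
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = 0xAA;
        }

        // ... and the final read overwrote only the first 452 bytes with 0xBB
        int readSize = 452;
        for (int i = 0; i < readSize; i++)
        {
            buffer[i] = 0xBB;
        }

        byte[] chunk = CopyValidBytes(buffer, readSize);
        Console.WriteLine(chunk.Length);                  // 452
        Console.WriteLine($"{chunk[readSize - 1]:X2}");   // BB (last valid byte)
        Console.WriteLine($"{buffer[readSize]:X2}");      // AA (stale byte that must not be returned)
    }
}
```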
