[VBA] Detect File Encoding! How to Read BOM with ADODB.Stream

目次

Introduction

Have you ever experienced garbled text when importing external text files in VBA? One common cause is the file’s BOM (Byte Order Mark). A BOM is invisible identification data attached to the beginning of files saved in encodings like UTF-8.

By using the ADODB.Stream object in VBA, you can read a file directly as binary data and programmatically check if this BOM exists. This allows for more advanced file manipulation, such as preventing character encoding errors.

This article explains how to use ADODB.Stream to read the first few bytes of a file and verify if it contains a UTF-8 BOM.

Prerequisite

To use this code, you must enable a reference library. In the VBA Editor (VBE), go to Tools > References and check “Microsoft ActiveX Data Objects x.x Library” (version 6.1 is common).

VBA Sample Code to Detect BOM

This macro asks the user to select a file, reads the first 3 bytes, and determines if the byte data matches the UTF-8 BOM (hexadecimal EF BB BF).

' Select a file and check if its header is a UTF-8 BOM
Sub CheckForUtf8Bom()
    Dim fileStream As ADODB.Stream
    Dim targetFilePath As String
    Dim byteData() As Byte
    Dim hexCode As String
    
    '--- 1. Show file selection dialog ---
    targetFilePath = Application.GetOpenFilename("All Files,*.*")
    If targetFilePath = "False" Then Exit Sub ' Exit if cancelled
    
    '--- 2. Open file as binary using ADODB.Stream ---
    Set fileStream = New ADODB.Stream
    With fileStream
        .Type = adTypeBinary ' Binary Mode (Type=1)
        .Open
        .LoadFromFile targetFilePath
        
        '--- 3. Read the first 3 bytes ---
        byteData = .Read(3)
        
        .Close
    End With
    Set fileStream = Nothing
    
    '--- 4. Convert byte data to a hexadecimal string ---
    hexCode = Hex(byteData(0)) & " " & Hex(byteData(1)) & " " & Hex(byteData(2))
                   
    '--- 5. Check if it matches UTF-8 BOM ("EF BB BF") ---
    If hexCode = "EF BB BF" Then
        MsgBox "This file has a UTF-8 BOM attached.", vbInformation, "BOM Check Result"
    Else
        MsgBox "No UTF-8 BOM found in this file." & vbCrLf & _
               "First 3 bytes: " & hexCode, vbExclamation, "BOM Check Result"
    End If
End Sub

Code Explanation

ADODB.Stream Object

ADODB.Stream is a powerful object for handling data in files or memory as streams.

  • .Type = adTypeBinary: Sets the stream type to binary (value 1). This allows accurate reading of the file byte by byte.
  • .LoadFromFile targetFilePath: Loads the contents of the specified file into the stream.
  • .Read(3): Reads the specified number of bytes (3 in this case) from the current position in the stream and returns them as a Byte array.

Determining the BOM

  • hexCode = Hex(...): Converts each element of the byte array retrieved by .Read into a hexadecimal string using the Hex function and joins them with spaces.
  • If hexCode = "EF BB BF" Then: The UTF-8 BOM is defined as the 3-byte sequence EF BB BF. The code compares the converted string against this value to branch the logic.

Summary

In this article, we covered how to read file header information byte by byte using ADODB.Stream to detect a UTF-8 BOM.

  • Use ADODB.Stream to handle files as binary data.
  • Set .Type = adTypeBinary to enable binary mode.
  • Use .Read(number_of_bytes) to read a specific amount of data from the beginning of the file.

This technique is versatile. Beyond checking text file encoding, it can be applied to advanced file operations such as identifying image file formats by their signatures or parsing custom binary files.

よかったらシェアしてね!
  • URLをコピーしました!
  • URLをコピーしました!

この記事を書いた人

私が勉強したこと、実践したこと、してることを書いているブログです。
主に資産運用について書いていたのですが、
最近はプログラミングに興味があるので、今はそればっかりです。

目次