[Linux] Identify File Types and Encoding with the file Command


Overview

In Linux, the file command identifies a file's type by inspecting its contents, chiefly its “magic numbers” and other header information, rather than relying on the file extension. This makes it extremely useful for identifying “mysterious files” without extensions, spotting files with incorrect extensions, and checking the character encoding of text files.

Specifications (Arguments and Options)

Syntax

file [options] filename

Main Arguments and Options

Option      Description
-b          Brief mode; outputs only the result without the filename.
-i          Outputs MIME type strings (e.g., text/plain) instead of human-readable text.
-L          Follows symbolic links and identifies the target file.
-z          Attempts to look inside compressed files without decompressing them.
-s          Reads and identifies special files (e.g., block devices).
-F <char>   Changes the separator between the filename and the result (default is :).
-k          Displays all possible matches instead of just the first one.

Basic Usage

When you run the command with a filename, it displays the file type.

Command

The following examples show how to identify text files, binary executables, and encrypted data.

# Identify a text file
file document.txt

# Identify a system command (binary)
file /usr/bin/ls

# Identify encrypted data
file secret.gpg

Execution Result

document.txt: ASCII text
/usr/bin/ls:  ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked...
secret.gpg:   GPG symmetrically encrypted data (AES256 cipher)

Practical Commands

Display MIME Type and Encoding

MIME output is the format best suited to automated scripts and to checking web server configurations. The -i option prints the MIME type followed by the character set, for example text/html; charset=utf-8.

# Display MIME types for all files in a directory
file -i *

Example Output:

image.png:    image/png; charset=binary
index.html:   text/html; charset=utf-8
script.py:    text/x-python; charset=us-ascii
archive.zip:  application/zip; charset=binary
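
When a script needs the type and the character set as separate values, the --mime-type and --mime-encoding options print each part on its own. The following is a minimal sketch, assuming a reasonably recent version of file; the directory and the name sample.txt are hypothetical and exist only for this demonstration.

```shell
# Hypothetical sample file, created only for this demonstration.
workdir=$(mktemp -d)
printf 'plain ASCII line\n' > "$workdir/sample.txt"

# --mime-type prints only the type; --mime-encoding prints only the charset.
mime_type=$(file -b --mime-type "$workdir/sample.txt")
mime_encoding=$(file -b --mime-encoding "$workdir/sample.txt")

echo "type=$mime_type encoding=$mime_encoding"

# Clean up the temporary directory.
rm -rf "$workdir"
```

Splitting the two values this way avoids having to cut the combined -i output apart at the semicolon.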

Get Results Without the Filename

Use the -b option when you want to assign the output to a variable in a shell script without the filename prefix.

# Show only the type for data.bin
file -b data.bin
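
As a rough sketch of the variable assignment described above, the snippet below stores the description and branches on it. The file content and the variable names file_type and verdict are assumptions made for illustration.

```shell
# Hypothetical file: text content despite the .bin name.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/data.bin"

# -b drops the "filename:" prefix, so the variable holds only the description.
file_type=$(file -b "$workdir/data.bin")

# Branch on the description; matching on the word "text" is a simple heuristic.
case "$file_type" in
  *text*) verdict="looks like text" ;;
  *)      verdict="treat as binary" ;;
esac

echo "$file_type: $verdict"

# Clean up the temporary directory.
rm -rf "$workdir"
```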

Inspect Contents of Compressed Files

For files compressed with tools like gzip, this option attempts to identify the original file type without decompressing it first.

# Check the content of a compressed archive
file -z backup.tar.gz
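
To see what -z changes, it may help to run file on the same compressed file with and without the option. This is a small sketch using a hypothetical gzip file created on the spot; the exact wording of the descriptions can differ between versions of file.

```shell
# Hypothetical gzip-compressed text file for the comparison.
workdir=$(mktemp -d)
printf 'some text\n' | gzip -c > "$workdir/notes.txt.gz"

# Without -z, only the compression container is described.
without_z=$(file -b "$workdir/notes.txt.gz")

# With -z, file also describes the data inside the container.
with_z=$(file -b -z "$workdir/notes.txt.gz")

echo "without -z: $without_z"
echo "with -z:    $with_z"

# Clean up the temporary directory.
rm -rf "$workdir"
```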

Customization Points

  • Changing Separators: If your filenames contain colons, use -F "=>" to change the separator and make the output easier to parse.
  • Encoding Detection: The charset info in file -i helps investigate garbled text. However, for encodings other than ASCII and UTF-8, such as Shift_JIS or EUC-JP, a specialized tool like nkf may be more accurate.
  • Symbolic Links: By default, file only reports that a file is a symlink. Use -L to find out what the actual target file is.
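
The separator and symbolic link points above can be tried out with a few throwaway files. The sketch below assumes a default environment (POSIXLY_CORRECT not set); the filenames 10:30-log.txt and link.txt are hypothetical.

```shell
# Hypothetical files: a name containing a colon, and a symlink pointing to it.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/10:30-log.txt"
ln -s 10:30-log.txt "$workdir/link.txt"

# -F replaces the ":" separator, so the colon in the filename is unambiguous.
sep_out=$(cd "$workdir" && file -F '=>' 10:30-log.txt)

# Without -L the link itself is described; with -L the target is described.
link_out=$(file -b "$workdir/link.txt")
target_out=$(file -b -L "$workdir/link.txt")

echo "$sep_out"
echo "link:   $link_out"
echo "target: $target_out"

# Clean up the temporary directory.
rm -rf "$workdir"
```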

Important Notes

  • Detection is Not 100%: The command relies mainly on signatures (“magic numbers”), usually found near the start of the file. A file with a corrupted header, or binary data with no recognizable signature, may simply be reported as data.
  • Inconsistency with Extensions: In Linux, extensions do not dictate behavior. A file named image.jpg that contains only text will be correctly identified as ASCII text by the file command.
  • Empty Files: A file with a size of 0 has no content to inspect, so file reports it as empty (inode/x-empty with -i).
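
Two of these notes are easy to check by hand. The following sketch creates a text file with a misleading .jpg extension and a zero-byte file; both names are hypothetical, and a zero-byte file is reported with the word empty.

```shell
# Hypothetical files: text saved under a .jpg name, and a zero-byte file.
workdir=$(mktemp -d)
printf 'just text\n' > "$workdir/image.jpg"
: > "$workdir/empty.dat"

# The extension is ignored; only the content is inspected.
mislabeled=$(file -b "$workdir/image.jpg")

# A zero-byte file gets its own description and MIME type.
empty_desc=$(file -b "$workdir/empty.dat")
empty_mime=$(file -b --mime-type "$workdir/empty.dat")

echo "image.jpg: $mislabeled"
echo "empty.dat: $empty_desc ($empty_mime)"

# Clean up the temporary directory.
rm -rf "$workdir"
```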

Applications

Filtering and Processing Specific File Types

You can use the output of the file command to perform actions like moving only image files to a different directory.

# Display only files identified as "image"
for f in *; do
  if file -b --mime-type "$f" | grep -q "^image/"; then
    echo "$f is an image file."
  fi
done
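
The same check can drive the move mentioned above. This is only a sketch under stated assumptions: move_images is a hypothetical helper name, not part of file, and the demo files (a minimal GIF header saved without an extension, plus a text file) are created purely for illustration.

```shell
# move_images is a hypothetical helper: it moves every regular file in
# src_dir whose MIME type starts with "image/" into dest_dir.
move_images() {
  src_dir=$1
  dest_dir=$2
  mkdir -p "$dest_dir"
  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue            # skip directories and unmatched globs
    case "$(file -b --mime-type "$f")" in
      image/*) mv "$f" "$dest_dir"/ ;;
    esac
  done
}

# Demo with throwaway directories and files.
src=$(mktemp -d)
dest=$(mktemp -d)
printf 'GIF89a\001\000\001\000\200\000\000' > "$src/photo_no_ext"  # GIF signature
printf 'meeting notes\n' > "$src/notes.txt"

move_images "$src" "$dest"

moved=$(ls "$dest")
remaining=$(ls "$src")
echo "moved: $moved"
echo "remaining: $remaining"

# Clean up the temporary directories.
rm -rf "$src" "$dest"
```

Because the decision is based on content, the image is moved even though its name carries no extension.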

Summary

The file command is a fundamental tool for uncovering the true nature of files when extensions cannot be trusted. Its ability to verify MIME types is particularly valuable in web development and server management. It is a good habit to use file before running commands like cat on unknown files to avoid accidentally filling your terminal with binary gibberish.

