How to Get File Extensions in Python Using os.path.splitext

In file manipulation programs, it is often necessary to branch processing based on file extensions. Common scenarios include “processing only image files (.jpg, .png)” or “excluding anything other than text files (.txt)”.

The os.path module in Python's standard library provides the splitext() function, which separates the extension from the rest of a file path.

This article explains the basic usage of os.path.splitext() and its behavior in special cases, such as filenames with multiple dots or files without extensions.


Basics of os.path.splitext()

os.path.splitext() splits the path string passed as an argument into two parts: “the part without the extension (root)” and “the extension (ext)”. It returns these as a tuple.

Syntax:

import os

root, ext = os.path.splitext(path_string)

The important point is that the retrieved extension ext includes the dot (.).

Specific Usage Example

Here is an example of extracting the extension from an image file path to determine if it should be processed.

import os

# Target file path
image_path = "uploads/photos/sunset.jpg"

# Split the path
root_part, extension = os.path.splitext(image_path)

print(f"Original path: {image_path}")
print(f"Root part: {root_part}")
print(f"Extension part: {extension}")

# Check extension (usually converted to lowercase for comparison)
if extension.lower() == ".jpg":
    print("JPEG image detected.")

Execution Result:

Original path: uploads/photos/sunset.jpg
Root part: uploads/photos/sunset
Extension part: .jpg
JPEG image detected.

The root part keeps everything before the final dot: any drive letter, the directory path, and the filename without its extension. The extension part is the final dot and everything after it.
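The same check extends to several extensions at once, as in the "process only image files" scenario from the introduction. The following is a minimal sketch; the helper name is_image and the IMAGE_EXTENSIONS set are illustrative assumptions, not part of os.path.

```python
import os

# Hypothetical set of accepted extensions (each includes the leading dot).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def is_image(path):
    """Return True if the path's extension is in IMAGE_EXTENSIONS (case-insensitive)."""
    _, ext = os.path.splitext(path)
    return ext.lower() in IMAGE_EXTENSIONS

paths = ["uploads/photos/sunset.jpg", "uploads/photos/LOGO.PNG", "notes.txt"]
images = [p for p in paths if is_image(p)]
print(images)
```

Lowercasing the extension before the lookup means that LOGO.PNG is accepted alongside sunset.jpg, while notes.txt is filtered out.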

Behavior in Special Cases

Understanding behavior in cases where filenames contain multiple dots or no extension is important to prevent bugs.

1. Filenames with Multiple Dots (e.g., archive.tar.gz)

splitext() splits at the last dot in the final path component. Therefore, for a double extension like .tar.gz, only .gz is treated as the extension, and .tar stays in the root.

import os

# File with double extension
archive_path = "backup.tar.gz"

root, ext = os.path.splitext(archive_path)

print(f"File: {archive_path}")
print(f"Root: {root}")
print(f"Extension: {ext}")

Execution Result:

File: backup.tar.gz
Root: backup.tar
Extension: .gz
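If a program needs to treat .tar.gz as a single extension, one possible approach is to check a list of known compound suffixes before falling back to splitext(). This is only a sketch: the function name split_compound_ext and the KNOWN_COMPOUND tuple are assumptions made for illustration, and the list would need to match the file types you actually handle.

```python
import os

# Hypothetical list of compound suffixes to keep together.
KNOWN_COMPOUND = (".tar.gz", ".tar.bz2", ".tar.xz")

def split_compound_ext(path):
    """Split path into (root, ext), keeping known compound suffixes whole."""
    lowered = path.lower()
    for suffix in KNOWN_COMPOUND:
        if lowered.endswith(suffix):
            cut = len(path) - len(suffix)
            return path[:cut], path[cut:]
    # Anything else falls back to the standard behavior.
    return os.path.splitext(path)

print(split_compound_ext("backup.tar.gz"))  # ('backup', '.tar.gz')
print(split_compound_ext("photo.jpg"))      # ('photo', '.jpg')
```

Matching against an explicit list avoids misreading names such as report.v2.txt, where only .txt is an extension.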

2. Files Without Extensions (e.g., README)

If the filename contains no dot, the extension part is an empty string ('') and the root is the whole path.

text_file = "README"
root, ext = os.path.splitext(text_file)

print(f"Extension: '{ext}'") # Empty string

Execution Result:

Extension: ''

3. Files Starting with a Dot (e.g., .gitignore)

For “dotfiles” common in Linux and macOS configuration files, the leading dot is considered part of the filename, not an extension separator. The extension becomes an empty string (unless there is another dot elsewhere in the name).

config_file = ".gitignore"
root, ext = os.path.splitext(config_file)

print(f"Root: {root}")
print(f"Extension: '{ext}'")

Execution Result:

Root: .gitignore
Extension: ''
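A few related edge cases are worth checking in the same way. The file and directory names below are made up for illustration: a dotfile with a second dot, a directory name that contains a dot, and a filename that ends with a dot.

```python
import os

# Dotfile with another dot: the part after the last dot is the extension.
print(os.path.splitext(".bashrc.bak"))          # ('.bashrc', '.bak')

# A dot in a directory name is ignored; only the final component is examined.
print(os.path.splitext("releases/v1.2/notes"))  # ('releases/v1.2/notes', '')

# A trailing dot yields an extension consisting of the dot alone.
print(os.path.splitext("report."))              # ('report', '.')
```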

Summary

  • os.path.splitext(path): Splits the path into (root, ext).
  • Extension ext: Includes the dot (e.g., .jpg).
  • Caution when checking: Compare against the dotted form, that is, write if ext == ".jpg": rather than if ext == "jpg":.
  • Multiple extensions: Only the part after the last dot is treated as the extension.

This function is extremely useful for classifying file types or generating output filenames (such as saving a file with a different extension).
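As an illustration of generating an output filename, the sketch below swaps one extension for another. The helper name replace_ext is an assumption made for this example; it only builds the new path string and does not touch any files.

```python
import os

def replace_ext(path, new_ext):
    """Return path with its extension replaced by new_ext (dot included)."""
    root, _ = os.path.splitext(path)
    return root + new_ext

print(replace_ext("uploads/photos/sunset.jpg", ".png"))  # uploads/photos/sunset.png
```

Because splitext() only removes the part after the last dot, replace_ext("backup.tar.gz", ".zip") would return backup.tar.zip, not backup.zip.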

