In practical data analysis, text files like CSV (Comma-Separated Values) and TSV (Tab-Separated Values) are the most widely used formats for exchanging data. Pandas provides features to load these files into a DataFrame with very simple code, as well as to save DataFrames as files.
This article explains how to read data flexibly with the read_csv function and how to control file output with the to_csv method, using concrete code examples.
Reading CSV/TSV Files (read_csv)
Pandas’ pd.read_csv() is a function for loading data from local files or URLs. While it assumes comma-separated files by default, you can handle tab-separated files (TSV) or files without headers by changing the arguments.
Key Arguments
- filepath_or_buffer: The file path or URL.
- sep: The delimiter character. The default is a comma (,). Specify \t for TSV.
- header: The position of the header row. Specify None if there is no header.
- dtype: Specifies the data type for each column using a dictionary. This is effective for saving memory and preventing type misidentification.
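The arguments above can be tried without touching the disk. The following is a minimal sketch, assuming small in-memory strings (csv_text and tsv_text are hypothetical sample data, not part of the original article) wrapped in io.StringIO so that read_csv treats them like files.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for files on disk.
csv_text = "user_id,score\n0001,3.5\n0002,4.0\n"
tsv_text = "2001\tLogin_Success\n2002\tLogin_Failed\n"

# dtype as a dictionary: reading user_id as text keeps its leading zeros,
# which would be lost if pandas inferred an integer column.
df_csv = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"user_id": str, "score": "float64"},
)

# sep="\t" for tab-separated data; header=None because the first row is data.
# Without names=..., the columns are labeled 0, 1, ...
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=None)

print(df_csv)
print(df_tsv)
```

Reading an identifier column as text is one common case of "preventing type misidentification"; whether it applies depends on the data at hand.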
Writing to CSV/TSV Files (to_csv)
To save the contents of a DataFrame to a file, use the df.to_csv() method.
Key Arguments
- path_or_buf: The output file path.
- index: Whether to output row labels (index) to the file. Setting this to False removes the index column.
- sep: The delimiter character.
- index_label: Specifies the column name when outputting the index.
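To see what each argument changes, it can help to inspect the generated text directly. This is a minimal sketch, assuming a small hypothetical DataFrame; it relies on to_csv returning the output as a string when no path is given.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {
        "session_id": [2001, 2002],
        "action_type": ["Login_Success", "Logout"],
    }
)

# With no path_or_buf, to_csv returns the text instead of writing a file.
default_text = df.to_csv()                  # index=True by default
no_index_text = df.to_csv(index=False)      # index column omitted
labeled_tsv = df.to_csv(sep="\t", index_label="row_number")

# Compare the header line of each variant.
print(repr(default_text.splitlines()[0]))
print(repr(no_index_text.splitlines()[0]))
print(repr(labeled_tsv.splitlines()[0]))
```

With the default settings the index column has an empty header, which is why the first variant's header line starts with a delimiter.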
Implementation Sample Code
The following code demonstrates the procedure of creating a dummy CSV file for verification, reading it with various settings, and then outputting the processed data in a different format. Here, we use a system’s “user access log” as the subject matter.
Python
import pandas as pd
import numpy as np
import os


def process_log_data():
    """
    Demonstrate reading and writing (I/O) of CSV/TSV files.
    """
    # ---------------------------------------------------------
    # 1. Setup: create sample files for verification
    # ---------------------------------------------------------
    csv_filename = "sample_access_log.csv"
    tsv_filename = "sample_no_header.tsv"

    # Standard CSV data (with header).
    # The data rows start at column 0 so that no leading spaces end up in the file.
    csv_content = """user_id,access_time,response_time
1001,2026-05-01 10:00:00,0.45
1002,2026-05-01 10:05:00,1.20
1003,2026-05-01 10:10:00,0.08"""

    # TSV data (no header, tab-separated)
    tsv_content = "2001\tLogin_Success\n2002\tLogin_Failed\n2003\tLogout"

    with open(csv_filename, "w", encoding="utf-8") as f:
        f.write(csv_content)
    with open(tsv_filename, "w", encoding="utf-8") as f:
        f.write(tsv_content)
    print(f"--- Setup complete: created {csv_filename}, {tsv_filename} ---\n")

    # ---------------------------------------------------------
    # 2. Reading files (read_csv)
    # ---------------------------------------------------------
    print("=== Reading the CSV file ===")
    # Basic read with explicit types:
    # user_id is read as an integer, response_time as a float
    df_csv = pd.read_csv(
        csv_filename,
        dtype={
            "user_id": np.int64,
            "response_time": np.float64
        }
    )
    print(df_csv)
    print(f"dtypes:\n{df_csv.dtypes}\n")

    print("=== Reading the TSV file (no header) ===")
    # sep="\t" specifies tab as the delimiter
    # header=None states explicitly that there is no header row
    # names assigns column names after the fact
    df_tsv = pd.read_csv(
        tsv_filename,
        sep="\t",
        header=None,
        names=["session_id", "action_type"]
    )
    print(df_tsv)
    print("\n")

    # ---------------------------------------------------------
    # 3. Writing files (to_csv)
    # ---------------------------------------------------------
    print("=== Writing to a CSV file ===")
    output_csv = "processed_log.csv"
    # index=False keeps the pandas row numbers (0, 1, 2...) out of the file
    df_csv.to_csv(output_csv, index=False)
    print(f"Saved: {output_csv} (without index)")

    print("=== Writing to a TSV file ===")
    output_tsv = "processed_log.tsv"
    # sep="\t" saves the data in TSV format
    # index=True also saves the index; index_label sets its column name
    df_tsv.to_csv(
        output_tsv,
        sep="\t",
        index=True,
        index_label="row_number"
    )
    print(f"Saved: {output_tsv} (with index, label='row_number')")

    # (Optional) Uncomment the lines below to delete the created files
    # os.remove(csv_filename)
    # os.remove(tsv_filename)
    # os.remove(output_csv)
    # os.remove(output_tsv)


if __name__ == "__main__":
    process_log_data()
Execution Result
Plaintext
--- Setup complete: created sample_access_log.csv, sample_no_header.tsv ---

=== Reading the CSV file ===
   user_id          access_time  response_time
0     1001  2026-05-01 10:00:00           0.45
1     1002  2026-05-01 10:05:00           1.20
2     1003  2026-05-01 10:10:00           0.08
dtypes:
user_id            int64
access_time       object
response_time    float64
dtype: object

=== Reading the TSV file (no header) ===
   session_id    action_type
0        2001  Login_Success
1        2002   Login_Failed
2        2003         Logout


=== Writing to a CSV file ===
Saved: processed_log.csv (without index)
=== Writing to a TSV file ===
Saved: processed_log.tsv (with index, label='row_number')
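One way to check what index_label actually wrote is to read the TSV back in. The sketch below is an illustration under stated assumptions, not part of the sample above: it rebuilds an equivalent DataFrame, writes it to a temporary directory (roundtrip.tsv is a hypothetical file name), and uses the index_col argument of read_csv to restore the saved column as the index.

```python
import os
import tempfile

import pandas as pd

# Same shape of data as the TSV in the sample; rebuilt here so the
# snippet runs on its own.
df = pd.DataFrame(
    {
        "session_id": [2001, 2002, 2003],
        "action_type": ["Login_Success", "Login_Failed", "Logout"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip.tsv")  # hypothetical file name

    # Save with the index stored under the column name "row_number".
    df.to_csv(path, sep="\t", index=True, index_label="row_number")

    # Read it back, telling pandas which column holds the index.
    restored = pd.read_csv(path, sep="\t", index_col="row_number")

print(restored)
```

If index_col is omitted when reading, row_number comes back as an ordinary data column and pandas adds a fresh default index next to it.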
Notes and Supplements
- Delimiter: When handling formats other than CSV (such as tabs or pipes |), always specify the sep argument.
- Index Output: The default for to_csv is index=True. To prevent unnecessary row numbers from being saved, it is recommended to specify index=False for general data storage.
- Encoding: When handling files containing non-ASCII characters (e.g., Japanese), you may need to explicitly specify encoding="utf-8" or other encodings appropriate for your environment (e.g., pd.read_csv("file.csv", encoding="utf-8")).
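The notes on delimiters and encoding can be combined in one short round trip. This is a minimal sketch, assuming a hypothetical pipe-separated file (cities.psv) with Japanese text, written and read with the same sep and encoding.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data containing non-ASCII characters.
df = pd.DataFrame({"name": ["東京", "大阪"], "visits": [12, 7]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cities.psv")  # hypothetical file name

    # Write pipe-separated text without the index, stating the encoding explicitly.
    df.to_csv(path, sep="|", index=False, encoding="utf-8")

    # Peek at the raw header line to confirm the delimiter in the file.
    with open(path, encoding="utf-8") as f:
        header_line = f.readline().rstrip("\n")

    # Read back with the same delimiter and encoding used for writing.
    restored = pd.read_csv(path, sep="|", encoding="utf-8")

print(header_line)
print(restored)
```

Using the same sep and encoding on both sides is the important part; a file produced by another tool may use a different encoding, in which case that encoding is the one to pass to read_csv.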
