In Pandas DataFrame operations, extracting data row by row is just as common as retrieving column data. When you need to extract the record with a specific ID or access the row at the n-th position, you use the loc and iloc indexers.
This article explains the differences between these two accessors, how to use them, and how to update row data.
Differences Between loc and iloc
There are two main ways to specify row data in Pandas:
loc["label"]: Access by specifying the index name (row label).
iloc[integer]: Access by specifying the row number (integer starting from 0).
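The distinction is easiest to see when the index labels are themselves integers that do not match the row positions. The following is a minimal sketch; the DataFrame, its IDs, and the score column are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the index holds integer IDs that differ from row positions
df = pd.DataFrame({"score": [75, 82, 90]}, index=[10, 20, 30])

by_label = df.loc[20]      # the row whose index label is 20
by_position = df.iloc[0]   # the first row, whatever its label is

print(by_label["score"])     # 82
print(by_position["score"])  # 75

# Slices differ too: loc includes the end label, iloc excludes the end position
label_slice = df.loc[10:20]
position_slice = df.iloc[0:1]
print(len(label_slice), len(position_slice))  # 2 1
```

With the default RangeIndex the two accessors happen to return the same row for the same integer, which is why the difference often goes unnoticed until the index changes.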
Implementation Sample Code
Below is code that creates a DataFrame to manage student test scores, then retrieves and updates data for a specific student.
import pandas as pd


def manage_row_data():
    """
    Function demonstrating retrieval and updating of row data using loc and iloc
    """
    # 1. Preparation of Data
    # DataFrame with student names as index, and Math/English scores as columns
    student_names = ["Alice", "Bob", "Charlie", "David", "Ellen"]
    exam_data = {
        "Math": [75, 82, 90, 68, 95],
        "English": [88, 79, 85, 92, 70]
    }
    # Create DataFrame specifying index by name
    df = pd.DataFrame(exam_data, index=student_names)
    print("--- Initial Exam Data ---")
    print(df)
    print("\n")

    # 2. Retrieving row using loc (Label Specification)
    print("=== Row Retrieval via loc (Label) ===")
    # Get row data for "Charlie"
    # Return value is a Series
    charlie_data = df.loc["Charlie"]
    print("--- Charlie's Data ---")
    print(charlie_data)
    print(f"Type: {type(charlie_data)}\n")

    # 3. Retrieving row using iloc (Position Specification)
    print("=== Row Retrieval via iloc (Position) ===")
    # Get the 4th row from the top (Index 3) -> David
    row_at_3 = df.iloc[3]
    print("--- 4th Data (David) ---")
    print(row_at_3)
    # Extract value of specific column from retrieved Series
    # Dot notation or bracket notation can be used
    david_math = row_at_3.Math
    print(f">> David's Math Score: {david_math}\n")

    # 4. Updating Row Data
    print("=== Updating Row Data ===")
    # Create a new Series to correct Alice's scores
    # Index (column names) must match the DataFrame
    new_alice_scores = pd.Series([80, 95], index=["Math", "English"])
    # Overwrite Alice's entire row using loc
    df.loc["Alice"] = new_alice_scores
    print("--- Data After Update (Alice's scores changed) ---")
    print(df)


if __name__ == "__main__":
    manage_row_data()
Execution Result
--- Initial Exam Data ---
         Math  English
Alice      75       88
Bob        82       79
Charlie    90       85
David      68       92
Ellen      95       70


=== Row Retrieval via loc (Label) ===
--- Charlie's Data ---
Math       90
English    85
Name: Charlie, dtype: int64
Type: <class 'pandas.core.series.Series'>

=== Row Retrieval via iloc (Position) ===
--- 4th Data (David) ---
Math       68
English    92
Name: David, dtype: int64
>> David's Math Score: 68

=== Updating Row Data ===
--- Data After Update (Alice's scores changed) ---
         Math  English
Alice      80       95
Bob        82       79
Charlie    90       85
David      68       92
Ellen      95       70
Explanation
Return Value is a Series
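When you retrieve a single row with loc or iloc by passing one label or one integer, the return value is a Series whose index consists of the column names. You can therefore access individual values with row_data["ColumnName"], or with the shorthand row_data.ColumnName when the column name is a valid Python identifier and does not clash with an existing Series attribute.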
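As a small illustration of the point above, the sketch below uses a hypothetical frame with a column name containing a space, which dot notation cannot reach. It also shows that passing a list of labels, rather than a single label, returns a DataFrame instead of a Series.

```python
import pandas as pd

# Hypothetical frame; "Avg Score" contains a space, so dot notation cannot reach it
df = pd.DataFrame(
    {"Math": [75, 82], "Avg Score": [81.5, 80.5]},
    index=["Alice", "Bob"],
)

row = df.loc["Alice"]           # single label -> Series indexed by column name
math_score = row.Math           # dot notation works for identifier-like names
avg_score = row["Avg Score"]    # bracket notation works for any column name

one_row_frame = df.loc[["Alice"]]   # list of labels -> DataFrame with one row
print(type(row).__name__, type(one_row_frame).__name__)  # Series DataFrame
```

Bracket notation is the safer default in reusable code, because it keeps working when a column is renamed to something that is not a valid identifier.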
Updating Rows
When updating a row, the new data must match the DataFrame's columns: a list needs one value per column, in column order, whereas a Series is matched to the columns by its index labels. Assigning a list directly, as in df.loc["Label"] = [Val1, Val2], works, but a pd.Series makes the correspondence between values and column names explicit, which makes updates safer.
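The difference between the two update styles can be sketched as follows. The data is a cut-down, hypothetical version of the exam table, and the Series is deliberately built in reverse column order to show that assignment follows the labels rather than the positions.

```python
import pandas as pd

df = pd.DataFrame(
    {"Math": [75, 82], "English": [88, 79]},
    index=["Alice", "Bob"],
)

# A list is assigned by position, so it must follow the column order (Math, English)
df.loc["Alice"] = [80, 95]

# A Series is aligned by label, so the order of its index does not matter
df.loc["Bob"] = pd.Series([90, 85], index=["English", "Math"])

print(df)
```

After this runs, Bob's Math score is 85 and his English score is 90, even though the Series listed English first.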
