Extracting specific columns from a Pandas DataFrame for analysis, or updating column values based on calculation results, are fundamental operations in data processing.
This article explains how to retrieve column data using bracket notation [] and dot notation ., how to access specific elements, and how to overwrite and update data for an entire column.
Basics of Retrieving and Updating Column Data
There are two main ways to access columns in a DataFrame.
- Bracket Notation:
df["column_name"]This is the most standard method. It can be used even if the column name contains spaces or symbols. - Dot Notation:
df.column_nameThis is a concise way to write code, available only when the column name satisfies Python’s variable naming rules (no spaces, does not start with a number, etc.).
Additionally, by assigning a new list or Series to the extracted column, you can update the data for that column all at once.
Implementation Sample Code
Below is code using a fruit shop’s inventory list as an example. We will manipulate a DataFrame containing Price and Stock information.
import pandas as pd
def manage_column_data():
"""
Function demonstrating DataFrame column access, element reference, and column data updates
"""
# 1. Data Preparation
# Create a DataFrame with fruit names as the index, and Price/Stock as columns
fruit_names = ["Apple", "Banana", "Orange", "Grape", "Peach"]
inventory_data = {
"Price": [150, 100, 120, 400, 300],
"Stock": [50, 80, 60, 20, 15]
}
df = pd.DataFrame(inventory_data, index=fruit_names)
print("--- Initial DataFrame ---")
print(df)
print("\n")
# 2. Retrieving Column Data
print("=== Retrieving Column Data ===")
# Pattern A: Bracket Notation (Recommended)
# Retrieved as a Series
price_series = df["Price"]
print("--- Price Column (Bracket Notation) ---")
print(price_series)
# Pattern B: Dot Notation
# Code is shorter, but may not be usable depending on the column name
stock_series = df.Stock
print("\n--- Stock Column (Dot Notation) ---")
print(stock_series)
# 3. Accessing Specific Elements (Cells)
print("\n=== Accessing Specific Elements ===")
# Get the value for "Banana" in the Stock column
# Access the Series (df.Stock) using the index (.Banana)
banana_stock = df.Stock.Banana
print(f"Stock count for Banana: {banana_stock}")
# 4. Updating Column Data (Overwriting)
print("\n=== Updating Column Data ===")
# Update the 'Price' column with a new price list
# The length of the list must match the number of rows in the DataFrame
new_prices = [160, 110, 130, 420, 320]
# It is also possible to create a Pandas Series and assign it by explicitly specifying the index
# By matching indices, data is mapped correctly even if the order differs
new_price_series = pd.Series(new_prices, index=fruit_names)
df["Price"] = new_price_series
print("--- DataFrame after Price Revision ---")
print(df)
if __name__ == "__main__":
manage_column_data()
Execution Result
--- Initial DataFrame ---
Price Stock
Apple 150 50
Banana 100 80
Orange 120 60
Grape 400 20
Peach 300 15
=== Retrieving Column Data ===
--- Price Column (Bracket Notation) ---
Apple 150
Banana 100
Orange 120
Grape 400
Peach 300
Name: Price, dtype: int64
--- Stock Column (Dot Notation) ---
Apple 50
Banana 80
Orange 60
Grape 20
Peach 15
Name: Stock, dtype: int64
=== Accessing Specific Elements ===
Stock count for Banana: 80
=== Updating Column Data ===
--- DataFrame after Price Revision ---
Price Stock
Apple 160 50
Banana 110 80
Orange 130 60
Grape 420 20
Peach 320 15
Explanation and Important Points
Bracket Notation vs. Dot Notation
While dot notation (df.Price) is convenient, bracket notation (df["Price"]) is mandatory in the following cases:
- When the column name contains spaces: (e.g.,
"Unit Price") - When the column name is a Python reserved word or method name: (e.g.,
"class","sum","count") - When specifying the column name dynamically using a variable: (e.g.,
col = "Price"; df[col])
Benefits of Updating with Series
When updating a column, you can assign a simple list (array). However, assigning a pd.Series has the advantage of automatic adjustment via the index.
If the index of the Series matches the index of the DataFrame, the correct values will be assigned to the correct rows without worrying about the order of the data.
