Pandas is the de facto standard library for data analysis in Python. It lets you work with tabular data, much like a spreadsheet in Excel, flexibly and efficiently from within your programs. It is an essential tool for data preprocessing, aggregation, and preparation for visualization.
This article explains how to create the two core data structures of Pandas, “Series” and “DataFrame,” and how to extract data using position-based indexing with iloc.
Major Data Structures and Basic Operations in Pandas
The basic concepts you need to understand when working with Pandas are as follows:
- Series: A 1-dimensional array-like data structure. It can be thought of as a list with an index (labels).
- DataFrame: A 2-dimensional tabular data structure. It has rows and columns and can be interpreted as a bundle of multiple Series.
- Index / Columns: Row headers (Index) and column headers (Columns).
- iloc: An indexer that extracts data by the integer position of rows and columns (iloc stands for "integer location").
Below is the implementation code covering these concepts.
# Please execute `pip install pandas` in advance
import pandas as pd
def demonstrate_pandas_basics():
    """
    Function to demonstrate basic data structures and referencing in Pandas
    """
    print("=== 1. Creating a Series ===")
    # Create a Series from a list
    # Explicitly specify labels with the index argument
    sales_list = [150, 200, 120]
    dates = ["2023-01-01", "2023-01-02", "2023-01-03"]
    series_data = pd.Series(sales_list, index=dates, name="Sales")
    print("--- Created Series ---")
    print(series_data)
    print(f"Type: {type(series_data)}\n")

    print("=== 2. Creating a DataFrame ===")
    # Create a DataFrame from a dictionary
    # Keys become column names (columns), values become data
    shop_data = {
        "Product_A": [100, 120, 150, 130],
        "Product_B": [90, 80, 110, 95],
        "Product_C": [200, 210, 205, 190]
    }
    # Create with specified row labels (index)
    shop_index = ["Store_Tokyo", "Store_Osaka", "Store_Nagoya", "Store_Fukuoka"]
    df = pd.DataFrame(shop_data, index=shop_index)
    print("--- Created DataFrame ---")
    print(df)
    print("\n--- Checking Components ---")
    print(f"Columns (Column Names): {df.columns.values}")
    print(f"Index (Row Names): {df.index.values}\n")

    print("=== 3. Data Extraction using iloc ===")
    # iloc is accessed via [row_index, col_index] (0-based)
    # Example: Get all data for the row at position 1 (Osaka, the second row)
    row_1 = df.iloc[1]
    print(f"--- Data for 1st row (Store_Osaka) ---\n{row_1}\n")
    # Example: Get the value at row position 2 (Nagoya), column position 0 (Product_A)
    val_2_0 = df.iloc[2, 0]
    print("--- Value at 2nd row, 0th column (Store_Nagoya, Product_A) ---")
    print(f"Value: {val_2_0}")
    # Example: Range specification using slicing
    # 0:2 selects rows 0 and 1 (the end position 2 is excluded); 1: selects columns 1 to the end
    subset = df.iloc[0:2, 1:]
    print("\n--- Extraction by range (Rows 0:2, Cols 1:) ---")
    print(subset)


if __name__ == "__main__":
    demonstrate_pandas_basics()
Execution Result
=== 1. Creating a Series ===
--- Created Series ---
2023-01-01    150
2023-01-02    200
2023-01-03    120
Name: Sales, dtype: int64
Type: <class 'pandas.core.series.Series'>

=== 2. Creating a DataFrame ===
--- Created DataFrame ---
               Product_A  Product_B  Product_C
Store_Tokyo          100         90        200
Store_Osaka          120         80        210
Store_Nagoya         150        110        205
Store_Fukuoka        130         95        190

--- Checking Components ---
Columns (Column Names): ['Product_A' 'Product_B' 'Product_C']
Index (Row Names): ['Store_Tokyo' 'Store_Osaka' 'Store_Nagoya' 'Store_Fukuoka']

=== 3. Data Extraction using iloc ===
--- Data for 1st row (Store_Osaka) ---
Product_A    120
Product_B     80
Product_C    210
Name: Store_Osaka, dtype: int64

--- Value at 2nd row, 0th column (Store_Nagoya, Product_A) ---
Value: 150

--- Extraction by range (Rows 0:2, Cols 1:) ---
             Product_B  Product_C
Store_Tokyo         90        200
Store_Osaka         80        210
Explanation: Relationship between Series and DataFrame
The data structure of Pandas is easier to understand if organized as follows:
- Series: Data aligned in a single column. It possesses an index.
- DataFrame: Multiple Series concatenated horizontally, sharing a common index (similar to a spreadsheet or a SQL table).
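The relationship described above can be checked directly. The following is a minimal sketch, not part of the original program: it reuses the values for the first two stores from the example, and the variable names (product_a, product_b, column, row) are chosen here for illustration only. It builds a DataFrame by placing two Series side by side with pd.concat, then shows that both a single column and a single row come back as a Series.

```python
import pandas as pd

# Two Series that share the same index labels
# (values reused from the first two stores in the example above).
stores = ["Store_Tokyo", "Store_Osaka"]
product_a = pd.Series([100, 120], index=stores, name="Product_A")
product_b = pd.Series([90, 80], index=stores, name="Product_B")

# Placing the Series side by side (axis=1) yields a DataFrame;
# each Series name becomes a column name.
df = pd.concat([product_a, product_b], axis=1)
print(df)

# Selecting one column gives back a Series that carries the shared index.
column = df["Product_A"]
print(type(column).__name__)
print(column.index.equals(df.index))

# A single row taken with iloc is also a Series, indexed by the column names.
row = df.iloc[0]
print(list(row.index))
```

Using pd.concat with axis=1 is only one way to make the "bundle of Series" idea visible; building the DataFrame from a dictionary, as in the main example, produces the same structure.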
Integer Index Reference with iloc
In data analysis, there are many situations where you want to extract “data at a specific position.” The iloc property is used for this purpose.
- It is used in the format df.iloc[row_index, col_index].
- Like standard Python lists, positions are specified using 0-based integers.
- By using slicing (:), you can select ranges of rows or columns to create a partial DataFrame (subset).
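A few more iloc patterns follow from the list-like behavior described above. This is a small sketch that rebuilds the same store DataFrame used earlier; the variable names (first_two, last_row, picked) are illustrative and not part of the original program. It shows that slices exclude the end position, that negative positions count from the end, that a list of positions selects arbitrary rows and columns, and that a single out-of-range position raises IndexError.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Product_A": [100, 120, 150, 130],
        "Product_B": [90, 80, 110, 95],
        "Product_C": [200, 210, 205, 190],
    },
    index=["Store_Tokyo", "Store_Osaka", "Store_Nagoya", "Store_Fukuoka"],
)

# Slices exclude the end position, as with Python lists:
# 0:2 selects positions 0 and 1 only.
first_two = df.iloc[0:2]
print(list(first_two.index))

# Negative positions count from the end.
last_row = df.iloc[-1]
print(last_row.name)

# Lists of positions pick arbitrary rows and columns, in the given order.
picked = df.iloc[[0, 3], [0, 2]]
print(picked)

# A single integer position past the end raises IndexError.
try:
    df.iloc[10]
except IndexError as exc:
    print(f"IndexError: {exc}")
```

Because iloc ignores the row and column labels entirely, the same expressions keep working if the stores or products are renamed, as long as their order stays the same.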
Once you have mastered these basic operations, you will be able to load data and inspect specific parts of it.
