[Python] 4 Essential Libraries for Data Analysis: Roles and Integration of IPython, NumPy, pandas, and Matplotlib

Python has become the de facto standard language for data analysis and machine learning largely because of its powerful ecosystem of libraries.

The four tools covered here (IPython, NumPy, pandas, and Matplotlib) do not exist in isolation; they form a coordinated workflow: “trial and error with IPython, calculation with NumPy, data organization with pandas, and visualization with Matplotlib.”

Here, I will introduce the relationship between these four tools and provide code that actually combines them to perform simple data analysis.


Library Installation

When performing data analysis, it is common to use the “Anaconda” distribution, which includes all of these, or to install them all at once using the following command.

pip install numpy pandas matplotlib ipython jupyter

Note: IPython is rarely used on its own nowadays; it is mostly used as the kernel (execution engine) running behind Jupyter Notebook (or Jupyter Lab).
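After installation, it can be handy to confirm what was actually installed. The following is a minimal sketch that uses the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_versions is my own and is not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in the install command above
PACKAGES = ["numpy", "pandas", "matplotlib", "ipython", "jupyter"]


def installed_versions(packages):
    """Return {name: version string, or None if the package is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name:<12} {ver or 'not installed'}")
```

Any package reported as "not installed" can be added by rerunning the pip command.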

Executable Sample Code

The following code reproduces a basic data analysis workflow: generating random number data with NumPy, processing it into a table with dates using pandas, and drawing a graph with Matplotlib.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def data_analysis_demo():
    print("=== 1. NumPy: Numerical Calculation ===")
    # Fix the random seed (to reproduce the same result every time)
    np.random.seed(42)
    
    # Generate 100 random numbers following a standard normal distribution and take the cumulative sum
    # This creates random walk data resembling "stock price movements"
    random_data = np.random.randn(100).cumsum()
    print(f"Generated Data (first 5): {random_data[:5]}")

    print("\n=== 2. pandas: Data Structuring ===")
    # Create a date index (100 days starting from 2025-01-01)
    dates = pd.date_range("2025-01-01", periods=100)
    
    # Create a "DataFrame (table)" by combining the NumPy array and dates
    df = pd.DataFrame(random_data, index=dates, columns=["Value"])
    
    # Display the first few rows of the data
    print("DataFrame Head:")
    print(df.head())
    
    # Check basic statistics (mean, max, min, etc.)
    print("\nStatistics:")
    print(df.describe())

    print("\n=== 3. Matplotlib: Visualization ===")
    # Set the size of the graph
    plt.figure(figsize=(10, 5))
    
    # Plot the data
    # (pandas provides wrapper methods around Matplotlib, but here we call Matplotlib directly for demonstration)
    plt.plot(df.index, df["Value"], label="Random Trend", color="blue")
    
    # Set title and labels
    plt.title("Sample Data Analysis Workflow")
    plt.xlabel("Date")
    plt.ylabel("Value")
    
    # Display grid lines and legend
    plt.grid(True)
    plt.legend()
    
    # Save the graph (or use plt.show() to display it)
    output_file = "analysis_result.png"
    plt.savefig(output_file)
    print(f"Graph saved as: {output_file}")

    # Uncomment the next line to open a window when a display is available
    # plt.show()

if __name__ == "__main__":
    data_analysis_demo()
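A comment in the plotting step mentions that pandas wraps Matplotlib. As a rough sketch of that shorter route, the same kind of chart can be drawn with DataFrame.plot, which returns a Matplotlib Axes object. The output filename and the "Agg" backend (chosen on the assumption that no display is available) are my own choices for illustration.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: headless environment, so use a non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the same random-walk DataFrame as in the sample code
np.random.seed(42)
dates = pd.date_range("2025-01-01", periods=100)
df = pd.DataFrame(np.random.randn(100).cumsum(), index=dates, columns=["Value"])

# DataFrame.plot creates the figure, draws the line, and adds the legend in one call
ax = df.plot(figsize=(10, 5), grid=True, title="Sample Data Analysis Workflow (pandas wrapper)")
ax.set_xlabel("Date")
ax.set_ylabel("Value")

# Hypothetical output filename, chosen for this sketch
output_path = Path("analysis_result_pandas.png")
ax.figure.savefig(output_path)
plt.close(ax.figure)
print(f"Graph saved as: {output_path}")
```

The wrapper is convenient for quick looks at the data; calling Matplotlib directly, as in the sample code, gives finer control over each element.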

Roles and Integration of Each Library

1. IPython (Interactive Python)

A powerful interactive shell that serves as an enhanced alternative to the standard Python shell.

  • Role: Accelerates the “trial and error” process of writing code, running it immediately, and seeing the results.
  • Features: Tab completion, history functions, and execution of OS commands. It is currently indispensable as the engine running behind Jupyter Notebook.
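To give a feel for these features, the sketch below lists a few IPython-only commands as comments (they are not valid in a plain Python script) and then shows a plain-Python counterpart of the %timeit magic using the standard library's timeit module. The measured statement is an arbitrary example of my own.

```python
import timeit

# In an IPython session or Jupyter cell, these shortcuts are available:
#   %timeit sum(range(1000))   measure execution time of a statement
#   !ls                        run an OS command
#   len?                       show help for an object
#   %history                   list previously entered commands

# Plain-Python counterpart of %timeit, using the standard library
runs = 1000
total_seconds = timeit.timeit("sum(range(1000))", number=runs)
print(f"{total_seconds / runs * 1e6:.2f} microseconds per loop")
```

The printed time depends on the machine, so no particular value should be expected.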

2. NumPy (Numerical Python)

The foundational library for performing numerical calculations in Python.

  • Role: High-speed vector and matrix arithmetic.
  • Features: Because element-by-element calculation over Python’s standard lists is slow, NumPy arrays (ndarray) are the usual choice when handling large amounts of numerical data. Both pandas and Matplotlib use NumPy data formats internally.
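As a small illustration of vector arithmetic, the sketch below computes day-over-day rates of change twice: once with a list comprehension and once with NumPy array slicing. The price values are made up for this example.

```python
import numpy as np

# Made-up sample values for illustration
prices = [100.0, 102.0, 101.0, 105.0]

# Plain Python list: an explicit loop over neighboring pairs
list_returns = [(b - a) / a for a, b in zip(prices, prices[1:])]

# NumPy array: the same calculation written as whole-array arithmetic
arr = np.array(prices)
array_returns = (arr[1:] - arr[:-1]) / arr[:-1]

print(array_returns)
# Both approaches give the same numbers
print(np.allclose(list_returns, array_returns))
```

With only four values the difference in speed is negligible; the array form pays off as the data grows, and it also reads closer to the underlying formula.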

3. pandas

A library to support data analysis.

  • Role: Data loading (CSV, Excel), processing, and aggregation.
  • Features: Provides a data structure called DataFrame, which has “rows and columns” like Excel. It can easily handle “labeled data (dates and item names)” and “missing values,” which are difficult to handle with NumPy alone.
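The following is a minimal sketch of those two strengths, labeled data and missing values, on a tiny made-up table. The column names Item and Sales and all values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up sample table: a date index, an item label, and one missing sales value
dates = pd.date_range("2025-01-01", periods=4)
df = pd.DataFrame(
    {
        "Item": ["apple", "banana", "apple", "banana"],
        "Sales": [10.0, np.nan, 30.0, 20.0],
    },
    index=dates,
)

# Missing values: count them, then fill them with 0
missing_count = int(df["Sales"].isna().sum())
filled = df["Sales"].fillna(0.0)

# Labeled data: look up a cell by date label and column name
jan_2 = df.loc[pd.Timestamp("2025-01-02"), "Item"]

# Aggregation: total sales per item (missing values are skipped by default)
by_item = df.groupby("Item")["Sales"].sum()

print(f"Missing values: {missing_count}")
print(f"Item on 2025-01-02: {jan_2}")
print(by_item)
```

Doing the same with a bare NumPy array would require tracking the dates and item names in separate structures by hand.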

4. Matplotlib

A graph plotting library.

  • Role: Visualization of data.
  • Features: Can draw many kinds of graphs, including line graphs, histograms, and scatter plots. Extremely detailed customization is possible, but the code tends to become long, so in recent years it is often combined with Seaborn, a higher-level library built on top of Matplotlib.
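Since the sample code only draws a line graph, here is a short sketch of the other two chart types mentioned above, a histogram and a scatter plot, drawn side by side. The data is random, and the output filename and the "Agg" backend (assuming no display is available) are my own choices.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: headless environment, so use a non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Random sample data: y is roughly 2 * x plus noise
rng = np.random.default_rng(42)
x = rng.standard_normal(200)
y = 2.0 * x + rng.standard_normal(200)

# One figure with two side-by-side axes
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Left: histogram showing the distribution of x
ax_hist.hist(x, bins=20, color="steelblue")
ax_hist.set_title("Histogram of x")

# Right: scatter plot showing the relationship between x and y
ax_scatter.scatter(x, y, s=10, alpha=0.7)
ax_scatter.set_title("Scatter of y against x")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")

fig.tight_layout()

# Hypothetical output filename, chosen for this sketch
chart_path = Path("chart_types_demo.png")
fig.savefig(chart_path)
plt.close(fig)
print(f"Graph saved as: {chart_path}")
```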

