[Python] Implementing Parallel Processing with multiprocessing.Process


Overview

When you need to parallelize CPU-intensive tasks (such as heavy calculations) in Python, multiprocessing is usually more effective than threading.

Each process has its own memory space and its own Python interpreter, so it is not held back by another process's Global Interpreter Lock (GIL). This lets CPU-bound work run on several cores at the same time.

In this article, we will explain the basics of process creation, passing arguments, waiting for completion (join), and daemon processes.

Specifications (Input/Output)

Parameters for multiprocessing.Process

| Parameter | Type | Meaning |
|---|---|---|
| target | callable | The function object to be executed when the process starts. |
| args | tuple | Positional arguments to pass to the target function. A comma is required for a single element (e.g., (1,)). |
| kwargs | dict | A dictionary of keyword arguments to pass to the target function. |
| daemon | bool | If set to True, the process becomes a daemon process and is forced to exit when the main process ends. |

Main Methods

| Method | Description |
|---|---|
| start() | Spawns the process and begins execution of the target function. |
| join(timeout=None) | Waits for the process to exit, or for at most timeout seconds if a timeout is given. Without it, the main process's code carries on without waiting for the child to finish. |
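
As a rough sketch of how join(timeout=...) behaves, the example below starts a worker and waits for it with a time limit. join() returns None whether or not the process has ended, so is_alive() is checked afterwards. The names slow_task and run_with_timeout and the durations are made up for illustration.

```python
import multiprocessing
import time


def slow_task(seconds: float) -> None:
    """Hypothetical worker that just sleeps for the given time."""
    time.sleep(seconds)


def run_with_timeout(task_seconds: float, timeout: float) -> bool:
    """Start slow_task and report whether it finished within the timeout."""
    p = multiprocessing.Process(target=slow_task, args=(task_seconds,))
    p.start()
    p.join(timeout=timeout)      # returns None whether or not the process ended
    finished = not p.is_alive()  # so check is_alive() afterwards
    if not finished:
        p.terminate()            # stop the straggler for this demo
        p.join()                 # reap it so it does not linger
    return finished


if __name__ == "__main__":
    print(run_with_timeout(task_seconds=0.2, timeout=2.0))
    print(run_with_timeout(task_seconds=5.0, timeout=0.5))
```

Run as a script, this should print True for the short task and False for the one that outlives its timeout.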

Basic Usage

Define a function and specify it as the target when creating a Process instance. Use start() to begin and join() to wait for completion.

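import multiprocessing

# Basic form passing arguments as a tuple
# (my_func is a placeholder for your own function)
p = multiprocessing.Process(target=my_func, args=("value1",))
p.start()  # spawn the child process
p.join()   # block until it exits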

Full Code Example

In this example, two separate processes execute different functions: “counting numbers” and “printing characters.” This code demonstrates how to pass arguments using both args (positional) and kwargs (keyword).

import multiprocessing
import time
import os

def print_numbers(process_name: str, limit: int):
    """
    Function that prints numbers a specified number of times.
    """
    print(f"[{process_name}] PID: {os.getpid()} Started")
    for i in range(limit):
        print(f"  {process_name}: {i}")
        time.sleep(0.5)
    print(f"[{process_name}] Finished")

def print_letters(process_name: str, letters: list):
    """
    Function that prints letters from a list sequentially.
    """
    print(f"[{process_name}] PID: {os.getpid()} Started")
    for char in letters:
        print(f"  {process_name}: {char}")
        time.sleep(0.7)
    print(f"[{process_name}] Finished")

def main():
    print(f"Main Process PID: {os.getpid()} Started")

    # Process 1: Using args (positional argument tuple)
    # Note: a one-element tuple needs a trailing comma, e.g. (val,)
    p1 = multiprocessing.Process(
        target=print_numbers,
        args=("NumProc", 3)
    )

    # Process 2: Using kwargs (keyword argument dictionary)
    p2 = multiprocessing.Process(
        target=print_letters,
        kwargs={"process_name": "CharProc", "letters": ["A", "B", "C"]}
    )

    # Start processes
    p1.start()
    p2.start()

    print("--- Processes are running ---")

    # Wait for processes to finish
    p1.join()
    p2.join()

    print("All processes have finished.")

if __name__ == "__main__":
    # This guard is required with the spawn start method
    # (the default on Windows and macOS)
    main()

Example Output

Since the two processes run concurrently, their output is interleaved, and the exact order can differ from run to run. Notice that each process has a different PID (process ID); the actual numbers will vary.

Main Process PID: 12345 Started
--- Processes are running ---
[NumProc] PID: 12346 Started
[CharProc] PID: 12347 Started
  NumProc: 0
  CharProc: A
  NumProc: 1
  CharProc: B
  NumProc: 2
[NumProc] Finished
  CharProc: C
[CharProc] Finished
All processes have finished.

Customization Points

  • Passing Arguments:
    • args=("hoge",): When passing a single argument, remember the trailing comma to ensure it is treated as a tuple.
    • kwargs={"key": "value"}: This is recommended for better readability when you have many arguments or want to override default values.
  • Managing Process Lists: When launching many processes, it is common to store them in a list (processes = []) and use loops to call start() and join().
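
The list-based pattern mentioned above might look like the following minimal sketch. The names worker and run_workers and the count of four are assumptions chosen for illustration. All processes are started first and joined afterwards, so they run side by side rather than one after another.

```python
import multiprocessing
import os


def worker(index: int) -> None:
    """Hypothetical task; prints which process it is running in."""
    print(f"worker {index} running in PID {os.getpid()}")


def run_workers(count: int) -> list:
    """Start `count` processes, wait for all of them, return their exit codes."""
    processes = []
    for i in range(count):
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()

    # Join in a second loop; joining inside the first loop would
    # make each process wait for the previous one to finish.
    for p in processes:
        p.join()

    return [p.exitcode for p in processes]


if __name__ == "__main__":
    print(run_workers(4))
```

An exit code of 0 means the worker returned normally.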

Important Notes

  • Necessity of if __name__ == "__main__":: With the spawn start method (the default on Windows and macOS), each child process re-imports the main module. Without this guard, the process-creation code would run again inside every child; Python detects this and typically aborts with a RuntimeError.
  • Independent Memory Space: Modifying a global variable only affects the “copy” inside that specific process. To share data between processes, you must use specific features like Queue or Value.
  • Zombie Processes: On POSIX systems, a child process that has finished but has not been joined remains in the system as a “zombie” process. If the parent keeps running for a long time, make sure you call join() on every process you start. Note that Process itself is not a context manager; the with statement applies to Pool.
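
To make the point about independent memory concrete, here is a small sketch under a few assumptions: the function names and the message text are invented for illustration. One child changes an ordinary global, one increments a multiprocessing.Value, and one sends a message through a multiprocessing.Queue.

```python
import multiprocessing

counter = 0  # ordinary module-level global; every process has its own copy


def bump_global() -> None:
    """Change the global; only the calling process sees the new value."""
    global counter
    counter += 1


def bump_shared(shared) -> None:
    """Increment a multiprocessing.Value, which lives in shared memory."""
    with shared.get_lock():  # += on a Value is not atomic, so take its lock
        shared.value += 1


def send_result(queue) -> None:
    """Hand a Python object back to the parent through a Queue."""
    queue.put("hello from the child")


def run_demo():
    shared = multiprocessing.Value("i", 0)  # "i" means a C signed int
    queue = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=bump_global),
        multiprocessing.Process(target=bump_shared, args=(shared,)),
        multiprocessing.Process(target=send_result, args=(queue,)),
    ]
    for p in procs:
        p.start()
    message = queue.get(timeout=10)  # read the queue before joining
    for p in procs:
        p.join()
    return counter, shared.value, message


if __name__ == "__main__":
    print(run_demo())
```

In the returned tuple, the parent's counter should still be 0 because the child only changed its own copy, while the Value reads 1 and the Queue delivers the child's message.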

Advanced Application

Here is an example using a daemon process (daemon=True). A daemon process is forced to terminate when the main process ends, even if it is still working. This is useful for background tasks like “log monitoring” or “health checks.”

import multiprocessing
import time

def background_task():
    print("Background task started")
    while True:
        print("  ...working")
        time.sleep(0.5)

def main_daemon():
    # Set daemon=True
    p = multiprocessing.Process(target=background_task, daemon=True)
    p.start()

    print("Main process running (for 2 seconds)...")
    time.sleep(2.0)
    
    print("Main process ending. The daemon will also be terminated.")
    # Exits without calling p.join()

if __name__ == "__main__":
    main_daemon()
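
Because a daemon process is stopped abruptly, it gets no chance to clean up. If the background task needs to finish its current step first, one alternative is a cooperative shutdown with multiprocessing.Event. The sketch below is only an illustration; polite_background_task, main_graceful, and the timings are hypothetical names and values, not part of the example above.

```python
import multiprocessing
import time


def polite_background_task(stop_event, interval: float = 0.2) -> int:
    """Loop until stop_event is set, then clean up and return the tick count."""
    ticks = 0
    while not stop_event.is_set():
        ticks += 1
        stop_event.wait(interval)  # sleeps, but wakes early once the event is set
    print(f"Background task cleaned up after {ticks} ticks")
    return ticks


def main_graceful() -> None:
    stop_event = multiprocessing.Event()
    p = multiprocessing.Process(target=polite_background_task, args=(stop_event,))
    p.start()

    time.sleep(1.0)   # the main process does its own work here
    stop_event.set()  # ask the child to finish its loop
    p.join()          # wait for it to exit normally


if __name__ == "__main__":
    main_graceful()
```

The trade-off is that the main process must remember to set the event and join, whereas daemon=True needs no extra code.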

Conclusion

multiprocessing.Process is the most basic building block for process-based parallelism in Python.

Understanding how it differs from threading (memory independence) and using it correctly will help improve the performance of your Python programs.

Caution: Creating a process is more expensive than creating a thread, so Process is a poor fit for launching thousands of lightweight tasks that each finish in milliseconds. In that case, consider using Pool.
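
For the many-small-tasks case, a Pool reuses a fixed set of worker processes instead of creating one process per task. The following is a minimal sketch; square, square_all, and the worker count of 4 are assumptions made for illustration.

```python
import multiprocessing


def square(n: int) -> int:
    """Hypothetical lightweight task; each call finishes almost instantly."""
    return n * n


def square_all(numbers, workers: int = 4) -> list:
    """Run square() over numbers using a fixed set of reusable worker processes."""
    # map() blocks until every result is back, so it is safe for the
    # with-block to shut the pool down as soon as map() returns.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    results = square_all(range(1000))
    print(results[:5])  # [0, 1, 4, 9, 16]
```

Here 1000 tasks are handled by only four processes, and map() returns the results in the same order as the inputs.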

Best for: CPU-intensive calculations with high independence and background tasks with different lifecycles than the main process.

Key Points: Pay attention to the trailing comma in args and always include the if __name__ == "__main__": block.


