[Python] Introduction to Requests: Basics of Website Retrieval and REST API Integration

For web scraping or REST API integration in Python, the third-party library requests is the de facto standard. It is far easier to work with than the standard library's urllib.

Its hallmark is human-readable, intuitive syntax for controlling HTTP communication. This article covers everything from installation to how the library's functions map onto HTTP methods, along with a practical example of retrieving data from an API.


Library Installation

Since it is an external library, install it using the pip command.

pip install requests

Correspondence Table of HTTP Methods and Requests Functions

The main methods defined in the HTTP protocol are intuitively mapped as functions in the requests library.

| HTTP Method | Requests Function | Main Use |
| --- | --- | --- |
| GET | requests.get() | Retrieve resources (data/HTML) from the server. |
| POST | requests.post() | Send data to the server to create a new resource. |
| PUT | requests.put() | Completely replace (update) an existing resource with the sent data. |
| DELETE | requests.delete() | Delete the specified resource. |
| PATCH | requests.patch() | Modify (partially update) an existing resource. |
| HEAD | requests.head() | Retrieve only the header information, without the response body. |
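To see this mapping in action without hitting a real server, the sketch below prepares (but does not send) a POST request and inspects what requests would put on the wire. The URL is a placeholder, not a real endpoint:

```python
import requests

# Prepare a POST request without sending it, to inspect how requests
# maps the method and encodes a JSON body. The URL is a placeholder.
req = requests.Request(
    "POST",
    "https://example.com/api/posts",
    json={"title": "hello"},
)
prepared = req.prepare()

print(prepared.method)                   # POST
print(prepared.headers["Content-Type"])  # application/json
print(prepared.body)                     # b'{"title": "hello"}'
```

Passing json= automatically serializes the dictionary and sets the Content-Type header; sending the prepared request would be done through a requests.Session.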

Practical Code: Data Retrieval from REST API and Exception Handling

In modern development, retrieving data from JSON-based APIs is at least as common as fetching HTML from websites.

The following code is a complete script that sends a GET request to a test API service to retrieve and display post data. It covers not only the retrieval itself but also timeout settings and status code verification (error handling), both of which are essential in practice.

import requests
from requests.exceptions import RequestException, Timeout

def fetch_latest_posts():
    """
    Function to retrieve article data from a test API and display titles
    """
    # Test API endpoint (using JSONPlaceholder)
    target_url = "https://jsonplaceholder.typicode.com/posts"
    
    # Tell the server that this is a script (recommended as good etiquette)
    headers = {
        "User-Agent": "Python-Requests-Sample/1.0"
    }

    print(f"Requesting: {target_url} ...")

    try:
        # Sending GET request
        # Timeout setting is mandatory (in seconds). 
        # Without it, there is a risk of waiting indefinitely if there is no response.
        response = requests.get(target_url, headers=headers, timeout=10)

        # Raise an exception if the status code is 400s or 500s
        # This saves the trouble of writing branches like 'if response.status_code == 200:'
        response.raise_for_status()

        # Check response format
        # For APIs, it is common to parse as JSON instead of HTML (text)
        print(f"Status Code: {response.status_code}")
        print(f"Encoding: {response.encoding}")
        
        posts = response.json()
        
        # Display retrieved data (first 3 items only)
        print("-" * 40)
        for post in posts[:3]:
            print(f"ID: {post['id']}")
            print(f"Title: {post['title']}")
            print("-" * 40)

    except Timeout:
        print("Error: The response from the server timed out.")
    except RequestException as e:
        # Catch network errors, 404/500 errors, etc. collectively
        print(f"Communication error occurred: {e}")

if __name__ == "__main__":
    fetch_latest_posts()

Key Points of the Code

1. response.text and response.json()

If you want to retrieve the HTML of a website, use response.text to handle it as a string. However, when retrieving results from a REST API, use the response.json() method. This automatically converts the data into a Python dictionary (dict) or list (list), allowing for immediate data manipulation.
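As a minimal illustration of the difference, the snippet below builds a Response object by hand (setting its private _content attribute, which the library normally fills in from the network) and reads it both ways. This is for demonstration only; in real code the object comes back from requests.get():

```python
import requests

# Construct a Response manually, purely for illustration.
# _content is a private attribute that requests normally populates itself.
resp = requests.models.Response()
resp.status_code = 200
resp.encoding = "utf-8"
resp._content = b'{"id": 1, "title": "hello"}'

print(resp.text)   # the raw body as a string: '{"id": 1, "title": "hello"}'

data = resp.json() # parsed into a Python dict
print(data["title"])  # hello
```

With response.text you would still have to parse the string yourself; response.json() hands you a dict (or list) ready for indexing.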

2. raise_for_status()

Even if the communication itself is successful, the server may return “Page not found (404)” or “Server error (500).” By calling response.raise_for_status(), you can force an exception (HTTPError) if the HTTP status code indicates an error. This clearly separates normal processing from error handling.
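The behavior can be demonstrated offline by building a Response by hand with an error status, again just for illustration (the URL is a placeholder):

```python
import requests
from requests.exceptions import HTTPError

# Build a Response manually to illustrate raise_for_status();
# in real code the object comes back from requests.get() etc.
resp = requests.models.Response()
resp.status_code = 404
resp.reason = "Not Found"
resp.url = "https://example.com/missing"

try:
    resp.raise_for_status()
    outcome = "ok"
except HTTPError as e:
    outcome = f"HTTPError: {e}"

print(outcome)  # HTTPError: 404 Client Error: Not Found for url: ...
```

Because HTTPError is a subclass of RequestException, the single `except RequestException` branch in the script above catches it together with network-level failures.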

3. Setting a Timeout

Specifying timeout=10 (seconds) in the arguments of requests.get() is very important for building robust applications. This prevents the program from freezing in the event of a network failure.
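To see a timeout fire without depending on an external network, the sketch below spins up a deliberately slow local server and requests it with a short timeout. The server and port are local test scaffolding, not part of the requests API:

```python
import http.server
import threading
import time

import requests
from requests.exceptions import Timeout

# A tiny local server that responds slowly on purpose,
# so we can trigger a read timeout without leaving the machine.
class SlowHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)  # respond slower than the client is willing to wait
        try:
            self.send_response(200)
            self.end_headers()
        except ConnectionError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # silence request logging

server = http.server.HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
try:
    requests.get(url, timeout=0.5)  # read timeout shorter than the 2 s delay
    result = "responded in time"
except Timeout:
    result = "timed out"

server.shutdown()
print(result)  # timed out
```

Note that timeout also accepts a (connect, read) tuple, e.g. `timeout=(3.05, 10)`, if you want to limit the connection phase and the response phase separately.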
