For web scraping or working with REST APIs in Python, the third-party library requests is the de facto standard. It is far easier to use than the standard library urllib.
Its hallmark is human-readable, intuitive syntax for controlling HTTP communication. Here, I will explain everything from basic installation to how HTTP methods correspond to requests functions, and walk through a practical implementation example of retrieving data from an API.
Library Installation
Since it is an external library, install it using the pip command.
```bash
pip install requests
```
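To verify that the installation succeeded, you can import the library and print its version (requests exposes this as requests.__version__); the version shown in the comment is only an example:

```python
# Quick sanity check after installation
import requests

print(requests.__version__)  # e.g. 2.31.0 — the exact version will differ
```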
Correspondence Table of HTTP Methods and Requests Functions
The main methods defined in the HTTP protocol map directly onto functions in the requests library, as the table below shows; a short usage sketch follows the table.
| HTTP Method | Requests Function | Main Use |
| --- | --- | --- |
| GET | requests.get() | Retrieve resources (data/HTML) from the server. |
| POST | requests.post() | Send data to the server to create a new resource. |
| PUT | requests.put() | Completely replace (update) an existing resource with the sent data. |
| DELETE | requests.delete() | Delete the specified resource. |
| PATCH | requests.patch() | Partially update (modify) an existing resource. |
| HEAD | requests.head() | Retrieve only the header information, without the response body. |
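As a quick sketch of how these functions are called, the snippet below exercises each of them against JSONPlaceholder, the same test API used later in this article. Note that JSONPlaceholder accepts write requests (POST/PUT/PATCH/DELETE) but only simulates them; nothing is actually persisted.

```python
import requests

BASE = "https://jsonplaceholder.typicode.com"

# GET: retrieve a single resource
r = requests.get(f"{BASE}/posts/1", timeout=10)
print(r.json()["title"])

# POST: create a new resource (the test API echoes it back with an id)
r = requests.post(f"{BASE}/posts",
                  json={"title": "hello", "body": "test", "userId": 1},
                  timeout=10)
print(r.status_code)  # 201 (Created)

# PUT: completely replace an existing resource
r = requests.put(f"{BASE}/posts/1",
                 json={"id": 1, "title": "replaced", "body": "test", "userId": 1},
                 timeout=10)

# PATCH: partially update an existing resource
r = requests.patch(f"{BASE}/posts/1", json={"title": "patched"}, timeout=10)

# DELETE: remove the specified resource
r = requests.delete(f"{BASE}/posts/1", timeout=10)

# HEAD: fetch only the headers, with no response body
r = requests.head(f"{BASE}/posts/1", timeout=10)
print(r.headers.get("Content-Type"))
```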
Practical Code: Data Retrieval from REST API and Exception Handling
In modern development, it is common to retrieve data from JSON-based APIs rather than simply fetching HTML from websites.
The following code is a complete script that sends a GET request to a test API service, then retrieves and displays post data. It covers not only simple retrieval but also timeout settings and status code verification (error handling), both of which are essential in practice.
```python
import requests
from requests.exceptions import RequestException, Timeout


def fetch_latest_posts():
    """
    Retrieve article data from a test API and display the titles.
    """
    # Test API endpoint (using JSONPlaceholder)
    target_url = "https://jsonplaceholder.typicode.com/posts"

    # Tell the server that this is a script (recommended as good etiquette)
    headers = {
        "User-Agent": "Python-Requests-Sample/1.0"
    }

    print(f"Requesting: {target_url} ...")

    try:
        # Send the GET request.
        # A timeout setting (in seconds) is mandatory: without it, there is
        # a risk of waiting indefinitely if there is no response.
        response = requests.get(target_url, headers=headers, timeout=10)

        # Raise an exception if the status code is in the 400s or 500s.
        # This saves the trouble of writing branches like
        # 'if response.status_code == 200:'.
        response.raise_for_status()

        # Check the response format.
        # For APIs, it is common to parse JSON instead of HTML (text).
        print(f"Status Code: {response.status_code}")
        print(f"Encoding: {response.encoding}")

        posts = response.json()

        # Display the retrieved data (first 3 items only)
        print("-" * 40)
        for post in posts[:3]:
            print(f"ID: {post['id']}")
            print(f"Title: {post['title']}")
            print("-" * 40)

    except Timeout:
        print("Error: The response from the server timed out.")
    except RequestException as e:
        # Catch network errors, 404/500 errors, etc. collectively
        print(f"Communication error occurred: {e}")


if __name__ == "__main__":
    fetch_latest_posts()
```
Key Points of the Code
1. response.text and response.json()
If you want to retrieve the HTML of a website, use response.text to handle it as a string. However, when retrieving results from a REST API, use the response.json() method. This automatically converts the data into a Python dictionary (dict) or list (list), allowing for immediate data manipulation.
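As a minimal illustration of the difference (again using JSONPlaceholder):

```python
import requests

response = requests.get("https://jsonplaceholder.typicode.com/posts/1", timeout=10)

raw = response.text     # the body as a plain string (useful for HTML)
data = response.json()  # the body parsed into Python objects

print(type(raw))      # <class 'str'>
print(type(data))     # <class 'dict'>
print(data["title"])  # fields can be accessed immediately
```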
2. raise_for_status()
Even if the communication itself is successful, the server may return “Page not found (404)” or “Server error (500).” By calling response.raise_for_status(), you can force an exception (HTTPError) if the HTTP status code indicates an error. This clearly separates normal processing from error handling.
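A minimal sketch of this behavior, using httpbin.org, a test service whose /status/&lt;code&gt; endpoints return whatever status code you request:

```python
import requests
from requests.exceptions import HTTPError

try:
    # This endpoint always responds with HTTP 404
    response = requests.get("https://httpbin.org/status/404", timeout=10)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    print("Success:", response.status_code)  # not reached in this example
except HTTPError as e:
    print(f"HTTP error: {e}")
```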
3. Setting a Timeout
Specifying timeout=10 (seconds) in the arguments of requests.get() is essential for building robust applications: it prevents the program from hanging indefinitely in the event of a network failure or an unresponsive server.
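One refinement: instead of a single number, timeout also accepts a (connect, read) tuple, so you can bound connection establishment and response reading separately. A sketch:

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

try:
    # Allow 3.05 s to establish the connection and 27 s to receive a response
    response = requests.get(
        "https://jsonplaceholder.typicode.com/posts",
        timeout=(3.05, 27),
    )
    print(response.status_code)
except ConnectTimeout:
    print("Error: could not connect to the server in time.")
except ReadTimeout:
    print("Error: the server was too slow to send a response.")
```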
