[Python] How to Parse URLs to Extract Components (Domain, Path, Query, etc.)

December 22, 2025

When scraping websites or integrating with APIs, you often need to extract one specific part of a long URL string, such as only the domain name or only the query parameters.

The urlparse() function in the urllib.parse module of Python's standard library splits (parses) a URL into six components: scheme, netloc, path, params, query, and fragment.

Main Attributes of the ParseResult Object

The object returned by the urlparse() function (ParseResult) has the following main attributes:

Attribute Name	Description	Example Value
`scheme`	Protocol (scheme)	`https`, `http`
`netloc`	Network location (host name, plus the port if one is given)	`www.example.com`, `api.server:8080`
`path`	Hierarchical path that follows the network location	`/articles/search`, `/index.html`
`query`	Query string (parameters after the `?`)	`q=python&page=1`
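
The table covers four attributes; the remaining two, params and fragment, are usually empty but are parsed the same way. The following is a minimal sketch using a made-up URL (the host, port, and path are assumptions chosen only so that every component is non-empty). It also shows the hostname and port attributes, which split netloc further.

```python
from urllib.parse import urlparse

# Hypothetical URL chosen so that all six components are populated
url = "https://api.example.com:8080/docs/index.html;v=2?lang=en#install"
result = urlparse(url)

print(result.scheme)    # https
print(result.netloc)    # api.example.com:8080
print(result.path)      # /docs/index.html
print(result.params)    # v=2   (text after ';' in the last path segment)
print(result.query)     # lang=en
print(result.fragment)  # install

# netloc can be broken down further
print(result.hostname)  # api.example.com
print(result.port)      # 8080  (an int, or None when no port is given)
```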

Implementation Example: Parsing a Search URL

In this example, we will decompose the components of a URL from a fictional real estate search site and display them individually.

Source Code

from urllib import parse

# URL to parse (Fictional property search URL)
# Structure: Protocol://Domain/Path?QueryParameters
property_search_url = "https://realestate.example.com/rent/tokyo/search?min_price=50000&max_price=80000&layout=1K"

# 1. Parse the URL
# urlparse(url_string) returns a ParseResult object
parsed_data = parse.urlparse(property_search_url)

# 2. Check the entire parsed result
print(f"Parsed Object: {parsed_data}")
print("-" * 40)

# 3. Access each attribute to retrieve values
print(f"Protocol (scheme) : {parsed_data.scheme}")
print(f"Domain (netloc)   : {parsed_data.netloc}")
print(f"Path (path)       : {parsed_data.path}")
print(f"Query (query)     : {parsed_data.query}")

# Bonus: Converting query parameters into a dictionary
# parse_qs converts the query string into a format like {'min_price': ['50000'], ...}
query_dict = parse.parse_qs(parsed_data.query)
print("-" * 40)
print(f"Query Dictionary  : {query_dict}")

Execution Result

Parsed Object: ParseResult(scheme='https', netloc='realestate.example.com', path='/rent/tokyo/search', params='', query='min_price=50000&max_price=80000&layout=1K', fragment='')
----------------------------------------
Protocol (scheme) : https
Domain (netloc)   : realestate.example.com
Path (path)       : /rent/tokyo/search
Query (query)     : min_price=50000&max_price=80000&layout=1K
----------------------------------------
Query Dictionary  : {'min_price': ['50000'], 'max_price': ['80000'], 'layout': ['1K']}
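
Every value in the dictionary above is a list because the same key may appear more than once in a query string. As a rough sketch, the snippet below adds a second layout value to the example query (an assumption made for illustration) and shows two common ways to handle the result: keeping only the first value per key, or using parse_qsl() to get a flat list of pairs.

```python
from urllib.parse import parse_qs, parse_qsl

# Same query as above, with a repeated "layout" key added for illustration
query = "min_price=50000&max_price=80000&layout=1K&layout=1DK"

multi = parse_qs(query)
print(multi["layout"])  # ['1K', '1DK']

# Keep only the first value of each key when repeats are not expected
first_only = {key: values[0] for key, values in multi.items()}
print(first_only)  # {'min_price': '50000', 'max_price': '80000', 'layout': '1K'}

# parse_qsl returns (key, value) pairs in order, preserving repeats
pairs = parse_qsl(query)
print(pairs[-1])  # ('layout', '1DK')
```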

Explanation

urlparse()

This function breaks a URL string down into its structural parts. The return value is a ParseResult instance, a named tuple with six items, so each value can be read either by attribute name (such as .scheme) or by index (such as [0]).
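
The tuple behavior can be illustrated with a short sketch. The URL is a placeholder; _replace() and geturl() are standard methods of the returned named tuple, used here to build a modified copy because tuples cannot be changed in place.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.example.com/articles/search?q=python&page=1")

# Attribute access and index access return the same value
assert parsed.scheme == parsed[0] == "https"
assert parsed.netloc == parsed[1] == "www.example.com"

# Being a tuple, the result can be unpacked into six variables
scheme, netloc, path, params, query, fragment = parsed
print(path)   # /articles/search
print(query)  # q=python&page=1

# _replace() returns a modified copy; geturl() rebuilds the URL string
page_two = parsed._replace(query="q=python&page=2")
print(page_two.geturl())  # https://www.example.com/articles/search?q=python&page=2
```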

Use Cases

It is frequently used for tasks such as checking the domain of a redirect destination, extracting a filename from an image URL, or modifying parameters for an API request.
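
The three use cases above might be sketched as follows. The helper names (is_allowed_redirect, filename_from_url, with_query_param), the ALLOWED_HOSTS set, and all URLs are hypothetical and exist only for this illustration; a real application would need its own validation rules.

```python
import posixpath
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

# Hypothetical allow-list of trusted redirect hosts
ALLOWED_HOSTS = {"www.example.com", "cdn.example.com"}

def is_allowed_redirect(url):
    """Check whether the host of a redirect destination is on the allow-list."""
    return urlparse(url).hostname in ALLOWED_HOSTS

def filename_from_url(url):
    """Return the last path segment, ignoring the query string."""
    return posixpath.basename(urlparse(url).path)

def with_query_param(url, key, value):
    """Return a copy of the URL with one query parameter set to a new value."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    params[key] = [value]
    new_query = urlencode(params, doseq=True)  # doseq expands the list values
    return urlunparse(parsed._replace(query=new_query))

print(is_allowed_redirect("https://www.example.com/login"))   # True
print(is_allowed_redirect("https://evil.example.net/login"))  # False
print(filename_from_url("https://cdn.example.com/images/photo.jpg?size=large"))  # photo.jpg
print(with_query_param("https://api.example.com/v1/items?page=1&sort=asc", "page", "2"))
# https://api.example.com/v1/items?page=2&sort=asc
```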

Important Note

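If you pass a URL string without a scheme (e.g., example.com/foo instead of https://example.com/foo), urlparse() does not raise an error, but the result is probably not what you expect. The function recognizes a network location only when it is introduced by //, so scheme and netloc come back empty and the entire string is treated as the path.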
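
This behavior is easy to confirm, and one possible workaround is to add a default scheme before parsing. In the sketch below, parse_lenient and its default_scheme parameter are hypothetical names introduced for illustration, and the helper assumes the input is always meant to be a web address.

```python
from urllib.parse import urlparse

# Without a scheme or leading "//", everything lands in path
no_scheme = urlparse("example.com/foo")
print(repr(no_scheme.scheme))  # ''
print(repr(no_scheme.netloc))  # ''
print(no_scheme.path)          # example.com/foo

# A leading "//" is enough for the host to be recognized
with_slashes = urlparse("//example.com/foo")
print(with_slashes.netloc)  # example.com
print(with_slashes.path)    # /foo

def parse_lenient(url, default_scheme="https"):
    """Hypothetical helper: prepend a scheme when the input has none."""
    if "://" not in url and not url.startswith("//"):
        url = f"{default_scheme}://{url}"
    return urlparse(url)

fixed = parse_lenient("example.com/foo")
print(fixed.scheme, fixed.netloc, fixed.path)  # https example.com /foo
```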
