In web scraping or API integration processes, you often need to extract specific parts from a long URL string, such as just the “domain name” or only the “query parameters.”
By using the urlparse() function from the urllib.parse module in Python’s standard library, you can split (parse) a URL into its six components: scheme, netloc, path, params, query, and fragment.
Main Attributes of the ParseResult Object
The object returned by the urlparse() function (ParseResult) has the following main attributes:
| Attribute Name | Description | Example of Extracted Part |
|---|---|---|
| scheme | Protocol (scheme) | https, http |
| netloc | Network location (host name, plus the port if one is specified) | www.example.com, api.server:8080 |
| path | Path portion that follows the network location | /articles/search, /index.html |
| query | Query string (parameters) | q=python&page=1 |
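As a quick illustration of the table, the sketch below parses a made-up URL (the host, port, path, and parameters are assumptions chosen only to exercise each component) and prints all six parts. It also shows the hostname and port attributes, which split netloc further.

```python
from urllib.parse import urlparse

# Hypothetical URL chosen so that every component is non-empty.
result = urlparse("https://api.server:8080/articles/search;v=1?q=python&page=1#results")

print(result.scheme)    # https
print(result.netloc)    # api.server:8080
print(result.path)      # /articles/search
print(result.params)    # v=1 (parameters attached to the last path segment)
print(result.query)     # q=python&page=1
print(result.fragment)  # results

# netloc keeps the port attached; hostname and port separate the two.
print(result.hostname)  # api.server
print(result.port)      # 8080 (an int, or None when no port is given)
```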
Implementation Example: Parsing a Search URL
In this example, we will decompose the components of a URL from a fictional real estate search site and display them individually.
Source Code
from urllib import parse
# URL to parse (Fictional property search URL)
# Structure: Protocol://Domain/Path?QueryParameters
property_search_url = "https://realestate.example.com/rent/tokyo/search?min_price=50000&max_price=80000&layout=1K"
# 1. Parse the URL
# urlparse(url_string) returns a ParseResult object
parsed_data = parse.urlparse(property_search_url)
# 2. Check the entire parsed result
print(f"Parsed Object: {parsed_data}")
print("-" * 40)
# 3. Access each attribute to retrieve values
print(f"Protocol (scheme) : {parsed_data.scheme}")
print(f"Domain (netloc) : {parsed_data.netloc}")
print(f"Path (path) : {parsed_data.path}")
print(f"Query (query) : {parsed_data.query}")
# Bonus: Converting query parameters into a dictionary
# parse_qs converts the query string into a format like {'min_price': ['50000'], ...}
query_dict = parse.parse_qs(parsed_data.query)
print("-" * 40)
print(f"Query Dictionary : {query_dict}")
Execution Result
Parsed Object: ParseResult(scheme='https', netloc='realestate.example.com', path='/rent/tokyo/search', params='', query='min_price=50000&max_price=80000&layout=1K', fragment='')
----------------------------------------
Protocol (scheme) : https
Domain (netloc) : realestate.example.com
Path (path) : /rent/tokyo/search
Query (query) : min_price=50000&max_price=80000&layout=1K
----------------------------------------
Query Dictionary : {'min_price': ['50000'], 'max_price': ['80000'], 'layout': ['1K']}
Explanation
urlparse()
This function breaks a URL down into its structural components. The return value is an instance of the ParseResult class, which is a named tuple, so you can access each value either by attribute name (like .scheme) or by index (like [0]).
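The tuple-like behavior can be sketched as follows. The URL is just a placeholder; the point is that attribute access, index access, and unpacking all read the same six fields.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.example.com/index.html")

# Attribute access and index access return the same value.
assert parsed.scheme == parsed[0] == "https"
assert parsed.netloc == parsed[1] == "www.example.com"

# Unpacking yields the six components in a fixed order.
scheme, netloc, path, params, query, fragment = parsed
print(len(parsed))  # 6

# Like other named tuples, the result is immutable; _replace() returns
# a modified copy, and geturl() reassembles it into a string.
swapped = parsed._replace(scheme="http")
print(swapped.geturl())  # http://www.example.com/index.html
```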
Use Cases
It is frequently used for tasks such as checking the domain of a redirect destination, extracting a filename from an image URL, or modifying parameters for an API request.
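The three tasks mentioned above might look roughly like this. The helper names, the allow-list, and the sample URLs are all hypothetical, and the redirect check is deliberately minimal (a production check would likely also validate the scheme).

```python
import posixpath
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

ALLOWED_HOSTS = {"www.example.com"}  # hypothetical allow-list


def is_allowed_redirect(url):
    """Return True if the URL's host name is in the allow-list."""
    return urlparse(url).hostname in ALLOWED_HOSTS


def filename_from_url(url):
    """Return the last path segment, ignoring any query string."""
    return posixpath.basename(urlparse(url).path)


def with_query_param(url, key, value):
    """Return a copy of the URL with one query parameter set or replaced."""
    parts = urlparse(url)
    params = parse_qs(parts.query)   # e.g. {'q': ['python'], 'page': ['1']}
    params[key] = [value]
    new_query = urlencode(params, doseq=True)
    return urlunparse(parts._replace(query=new_query))


print(is_allowed_redirect("https://www.example.com/login"))
print(filename_from_url("https://www.example.com/images/photo.jpg?size=large"))
print(with_query_param("https://www.example.com/articles/search?q=python&page=1", "page", "2"))
```

Note that parse_qs() drops parameters with blank values by default, so this approach assumes every parameter in the original URL has a value.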
Important Note
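If you pass a URL string without a scheme (e.g., example.com/foo without https://), it will not be split the way you might expect. urlparse() only recognizes the network location when it is introduced by //, so for example.com/foo both scheme and netloc are empty and the entire string is interpreted as the path.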
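The behavior is easy to confirm, and one possible workaround is to add a default scheme before parsing. The helper parse_with_default_scheme below is a hypothetical name, and defaulting to https is an assumption that may not suit every input.

```python
from urllib.parse import urlparse

# Without a scheme or a leading //, everything lands in path.
bare = urlparse("example.com/foo")
print(repr(bare.scheme), repr(bare.netloc), repr(bare.path))
# '' '' 'example.com/foo'

# A leading // is enough for the host to be recognized as netloc.
relative = urlparse("//example.com/foo")
print(repr(relative.netloc), repr(relative.path))
# 'example.com' '/foo'


def parse_with_default_scheme(url, default_scheme="https"):
    """Hypothetical helper: prepend a scheme when the input has none."""
    if "://" not in url:
        url = f"{default_scheme}://{url}"
    return urlparse(url)


fixed = parse_with_default_scheme("example.com/foo")
print(fixed.scheme, fixed.netloc, fixed.path)
# https example.com /foo
```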
