[Python] How to Parse URLs to Extract Components (Domain, Path, Query, etc.)

In web scraping or API integration processes, you often need to extract specific parts from a long URL string, such as just the “domain name” or only the “query parameters.”

By using the urlparse() function from the urllib.parse module in Python’s standard library, you can easily split (parse) a URL into its six components: scheme, netloc, path, params, query, and fragment.

Main Attributes of the ParseResult Object

The object returned by the urlparse() function (ParseResult) has the following main attributes:

Attribute Name   Description                                       Example of Extracted Part
scheme           Protocol (scheme)                                 https, http
netloc           Network location (host name, plus port if given)  www.example.com, api.server:8080
path             Path under the domain                             /articles/search, /index.html
query            Query string (parameters)                         q=python&page=1
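
Besides the four attributes listed here, the result also carries a fragment attribute, and convenience attributes such as hostname and port that split netloc further. The following is a minimal sketch; the URL is a made-up example built from the sample values in the table.

```python
from urllib.parse import urlparse

# Hypothetical URL assembled from the sample values in the table
sample_url = "https://api.server:8080/index.html?q=python&page=1#results"
result = urlparse(sample_url)

print(result.netloc)    # api.server:8080  (host and port together)
print(result.hostname)  # api.server       (host only, lowercased)
print(result.port)      # 8080             (an int, or None if no port is given)
print(result.query)     # q=python&page=1
print(result.fragment)  # results          (the part after '#')
```

If you only need the host name, hostname is usually more convenient than netloc because it leaves out the port.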

Implementation Example: Parsing a Search URL

In this example, we will decompose the components of a URL from a fictional real estate search site and display them individually.

Source Code

from urllib import parse

# URL to parse (Fictional property search URL)
# Structure: Protocol://Domain/Path?QueryParameters
property_search_url = "https://realestate.example.com/rent/tokyo/search?min_price=50000&max_price=80000&layout=1K"

# 1. Parse the URL
# urlparse(url_string) returns a ParseResult object
parsed_data = parse.urlparse(property_search_url)

# 2. Check the entire parsed result
print(f"Parsed Object: {parsed_data}")
print("-" * 40)

# 3. Access each attribute to retrieve values
print(f"Protocol (scheme) : {parsed_data.scheme}")
print(f"Domain (netloc)   : {parsed_data.netloc}")
print(f"Path (path)       : {parsed_data.path}")
print(f"Query (query)     : {parsed_data.query}")

# Bonus: Converting query parameters into a dictionary
# parse_qs converts the query string into a format like {'min_price': ['50000'], ...}
query_dict = parse.parse_qs(parsed_data.query)
print("-" * 40)
print(f"Query Dictionary  : {query_dict}")

Execution Result

Parsed Object: ParseResult(scheme='https', netloc='realestate.example.com', path='/rent/tokyo/search', params='', query='min_price=50000&max_price=80000&layout=1K', fragment='')
----------------------------------------
Protocol (scheme) : https
Domain (netloc)   : realestate.example.com
Path (path)       : /rent/tokyo/search
Query (query)     : min_price=50000&max_price=80000&layout=1K
----------------------------------------
Query Dictionary  : {'min_price': ['50000'], 'max_price': ['80000'], 'layout': ['1K']}

Explanation

urlparse()

This is a function for structurally decomposing a URL. The return value is an instance of the ParseResult class, which behaves like a tuple. You can access values using attribute names (like .scheme) or indices (like [0]).
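
As a small illustration of this tuple-like behavior, the sketch below reads the same value by attribute name and by index, and then unpacks all six fields at once. The URL is an arbitrary example.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.example.com/articles/search?q=python&page=1")

# Attribute access and index access return the same value
print(parsed.scheme)  # https
print(parsed[0])      # https
print(parsed[1])      # www.example.com

# Because the result is a 6-item tuple, it can be unpacked in one line
scheme, netloc, path, params, query, fragment = parsed
print(path)           # /articles/search
print(len(parsed))    # 6
```

Attribute names are generally easier to read than indices, so index access is mostly useful when unpacking.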

Use Cases

It is frequently used for tasks such as checking the domain of a redirect destination, extracting a filename from an image URL, or modifying parameters for an API request.
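
The three tasks above could be sketched roughly as follows. The helper names (is_same_host, filename_from_url, with_query_param) and the URLs are hypothetical and exist only for this illustration; they are not part of urllib.parse.

```python
import posixpath
from urllib.parse import urlparse, parse_qs, urlencode


def is_same_host(url, expected_host):
    """Check whether a URL (e.g. a redirect destination) points at the expected host."""
    return urlparse(url).hostname == expected_host


def filename_from_url(url):
    """Return the last path segment, ignoring the query string."""
    return posixpath.basename(urlparse(url).path)


def with_query_param(url, key, value):
    """Return a copy of the URL with one query parameter set or replaced."""
    parts = urlparse(url)
    params = parse_qs(parts.query)  # note: blank values are dropped by default
    params[key] = [value]
    new_query = urlencode(params, doseq=True)
    # _replace() returns a new ParseResult; geturl() joins it back into a string
    return parts._replace(query=new_query).geturl()


print(is_same_host("https://www.example.com/login", "www.example.com"))
print(filename_from_url("https://cdn.example.com/images/photo.jpg?size=large"))
print(with_query_param("https://api.example.com/items?page=1&sort=asc", "page", "2"))
```

Comparing hostname rather than netloc keeps the check from failing just because a port number is present.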

Important Note

If you pass a URL string without a scheme (e.g., example.com/foo without https://), urlparse() does not recognize the host name. It treats a netloc as present only when it is introduced by //, so for example.com/foo both scheme and netloc come back empty and the entire string ends up in path.
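
The sketch below shows this behavior and one possible workaround: adding a default scheme before parsing. The helper parse_with_default_scheme is a hypothetical name, and the check it uses is a simple heuristic that assumes the input is a web address rather than something like a mailto: link.

```python
from urllib.parse import urlparse

# Without a scheme or a leading "//", everything lands in path
bare = urlparse("example.com/foo")
print(bare.scheme)  # (empty string)
print(bare.netloc)  # (empty string)
print(bare.path)    # example.com/foo


def parse_with_default_scheme(url, default_scheme="https"):
    """Hypothetical helper: prepend a scheme when the input has none."""
    if url.startswith("//"):
        # Scheme-relative URL such as //example.com/foo
        url = f"{default_scheme}:{url}"
    elif "://" not in url:
        url = f"{default_scheme}://{url}"
    return urlparse(url)


fixed = parse_with_default_scheme("example.com/foo")
print(fixed.scheme)  # https
print(fixed.netloc)  # example.com
print(fixed.path)    # /foo
```

Whether https is the right default depends on where the URLs come from, so treat the default_scheme value as an assumption to adjust.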

