[Python] Opening a Web Page with a Specified URL using Selenium


Overview

This article explains how to use Selenium to navigate a browser to a specified URL. The get() method is a crucial function because it doesn’t just open the page; it also waits for the browser to finish loading the HTML (until the onload event is fired).

Specifications

  • Input: A string representing the URL of the website you want to access.
  • Output:
    • The browser displays the page.
    • Retrieval of the page title and the current URL.
  • Prerequisites: Selenium and WebDriver must be correctly configured.

Basic Usage

This basic code launches the Chrome browser, accesses a specified blog (morinokabu.com), and displays the title.

from selenium import webdriver
import time

driver = webdriver.Chrome()

# Navigate to the specified URL (waits for the page to finish loading)
driver.get("https://morinokabu.com")

# Display the page title to the standard output
print(f"Page Title: {driver.title}")

time.sleep(2) # Wait briefly for confirmation
driver.quit()

Full Code Example

This is a practical implementation that adds error handling and, after access, confirms the result by printing the page title and the final URL (which may differ from the requested URL if a redirect occurred).

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
import time

def open_website_demo():
    """
    Demo function to access a specific URL and retrieve information using Selenium.
    """
    driver = None
    target_url = "https://morinokabu.com"

    try:
        # Launch browser
        print("Launching the browser...")
        driver = webdriver.Chrome()

        # 1. Execute access
        print(f"Accessing: {target_url}")
        driver.get(target_url)

        # 2. Retrieve page information
        # Since the get() method blocks until the page load is complete,
        # the page is already displayed by the time it reaches here.
        page_title = driver.title
        current_url = driver.current_url

        print("-" * 30)
        print("Access Complete")
        print(f"Title : {page_title}")
        print(f"URL   : {current_url}")
        print("-" * 30)

        # Wait for 3 seconds for confirmation
        time.sleep(3)

    except WebDriverException as e:
        # Error handling for invalid URLs or lack of internet connection
        print(f"An access error occurred: {e}")
    
    except Exception as e:
        print(f"Unexpected error: {e}")

    finally:
        # Close the browser
        if driver:
            print("Closing the browser.")
            driver.quit()

if __name__ == "__main__":
    open_website_demo()

Customization Points

Page Navigation Methods

In addition to get(), there are other methods that utilize the browser’s navigation features:

  • driver.back(): Performs the same action as clicking the “Back” button.
  • driver.forward(): Performs the same action as clicking the “Forward” button.
  • driver.refresh(): Reloads the page (F5).
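As a sketch, the history methods above can be combined like this (the helper name walk_history and the URLs are illustrative, not a Selenium API):

```python
def walk_history(driver, first_url, second_url):
    """Visit two pages with get(), then exercise back/forward/refresh."""
    driver.get(first_url)        # load the first page
    driver.get(second_url)       # load the second page
    driver.back()                # same as the "Back" button -> first page
    driver.forward()             # same as the "Forward" button -> second page
    driver.refresh()             # reload the current page (F5)
    return driver.current_url    # URL after the round trip
```

Pass in any WebDriver instance (e.g. webdriver.Chrome()); like get(), each of these calls blocks until the resulting page has loaded.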

Specifying the URL

Always specify a complete URL starting with http:// or https://. If the scheme is omitted (e.g. passing morinokabu.com alone), most drivers raise an InvalidArgumentException.
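One defensive option is to validate the scheme before calling get(). This is a sketch; ensure_absolute_url is a hypothetical helper, not part of Selenium:

```python
from urllib.parse import urlparse

def ensure_absolute_url(url: str) -> str:
    """Raise early if the URL lacks an http/https (or file) scheme."""
    if urlparse(url).scheme not in ("http", "https", "file"):
        raise ValueError(f"URL must start with http:// or https://: {url!r}")
    return url
```

Usage: driver.get(ensure_absolute_url("https://morinokabu.com")) fails fast with a clear message instead of an error from the driver.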


Important Notes

  • Behavior of Loading Wait: driver.get() stops (blocks) processing until the page’s HTML structure (DOM) is loaded. However, for modern websites (like SPAs) where content is displayed later via JavaScript, you may need additional waiting processes using WebDriverWait.
  • Timeout: If a page loads extremely slowly, the script may hang for a long time. You can manage timeout periods by setting driver.set_page_load_timeout(seconds) if necessary.
  • Local Files: In addition to URLs on the web, you can open local HTML files by specifying a path like file:///C:/path/to/file.html.

Advanced Application

This code serves as the basis for a crawler that visits multiple URLs stored in a list.

from selenium import webdriver
import time

def crawl_pages():
    urls = [
        "https://morinokabu.com",
        "https://www.python.org",
        "https://www.google.com"
    ]
    
    driver = webdriver.Chrome()
    
    try:
        for url in urls:
            print(f"Navigating to: {url}")
            driver.get(url)
            print(f" - Title: {driver.title}")
            time.sleep(1) # Wait to reduce server load
            
    finally:
        driver.quit()

if __name__ == "__main__":
    crawl_pages()

Conclusion

driver.get("URL") is the most fundamental operation in Selenium. Understanding that it “waits for the load to complete” allows for smoother subsequent element retrieval and operations. For dynamic pages, consider adding further wait logic after this method.

