【Python】Getting HTML Elements with Selenium in Python: ID, Class, XPath, and CSS Selector


Overview

This article explains how to use the find_element method and the By class, which are standard in Selenium 4.x and later, to get specific HTML elements (such as buttons, input fields, and headings) on a web page. We cover various methods including ID, class name, and tag name, as well as flexible methods like CSS selectors and XPath.

Specifications (Input/Output)

  • Input:
    • Search criteria (By.ID, By.XPATH, etc.)
    • Search value (ID name, XPath string, etc.)
  • Output:
    • WebElement object (if found)
    • NoSuchElementException is raised (if not found)
  • Requirement: Import the By class with from selenium.webdriver.common.by import By.

Basic Usage

This is the basic pattern for getting a single element by its id attribute. It is usually the most reliable locator, because an id should be unique within the page.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # Placeholder URL; use a page that contains id="my-button"

# Get the element with id="my-button"
element = driver.find_element(By.ID, "my-button")

# Display the text
print(element.text)

driver.quit()

Full Code

This is a complete implementation example that uses different locators (search methods) to get elements and display their information.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

def extract_elements_demo():
    """
    Demo function to get HTML elements using ID, Class, Tag, CSS Selector, and XPath.
    """
    driver = webdriver.Chrome()
    
    # Target URL for demonstration
    target_url = "https://example.com" 
    
    try:
        driver.get(target_url)

        print(f"Page Title: {driver.title}\n")

        # 1. Get by ID (By.ID)
        # Best when there is a unique ID in the page
        try:
            el_id = driver.find_element(By.ID, "header-id")
            print(f"[ID] Text: {el_id.text}")
        except NoSuchElementException:
            print("[ID] Element with the specified ID was not found.")

        # 2. Get by Class Name (By.CLASS_NAME)
        # If multiple elements have the same class, it returns the first one found
        try:
            el_class = driver.find_element(By.CLASS_NAME, "content-class")
            print(f"[Class] Text: {el_class.text}")
        except NoSuchElementException:
            print("[Class] Element with the specified Class was not found.")

        # 3. Get by Tag Name (By.TAG_NAME)
        # Search by tag names like <h1>, <p>, or <div>
        el_tag = driver.find_element(By.TAG_NAME, "h1")
        print(f"[Tag] H1 Text: {el_tag.text}")

        # 4. Get by CSS Selector (By.CSS_SELECTOR)
        # Flexible selection by combining IDs, Classes, and attributes
        # Example: a p tag that is a direct child of a div tag
        el_css = driver.find_element(By.CSS_SELECTOR, "div > p")
        print(f"[CSS] Text: {el_css.text}")

        # 5. Get by XPath (By.XPATH)
        # Can search with complex conditions like hierarchical structures or text content
        # Example: an absolute path, /html/body/div/h1 (brittle if the page layout changes)
        el_xpath = driver.find_element(By.XPATH, "/html/body/div/h1")
        print(f"[XPath] Text: {el_xpath.text}")

    except Exception as e:
        print(f"Unexpected error: {e}")

    finally:
        driver.quit()

if __name__ == "__main__":
    extract_elements_demo()

Customization Points

List of By Class Attributes

These are the main attributes of the By class used as the first argument in find_element.

| Attribute (By.XXX) | Meaning | HTML / Value Example | Characteristics |
| --- | --- | --- | --- |
| By.ID | Search by id attribute | <div id="login"> / By.ID, "login" | Recommended first choice. Easy to identify because an id should be unique within the page. |
| By.CLASS_NAME | Search by class attribute | <div class="btn"> / By.CLASS_NAME, "btn" | Accepts a single class name. Passing a space-separated compound value (like "btn primary") causes errors. |
| By.TAG_NAME | Search by tag name | <h1>Title</h1> / By.TAG_NAME, "h1" | Good for getting a list (find_elements) when many elements match. |
| By.NAME | Search by name attribute | <input name="user"> / By.NAME, "user" | Often used to identify form elements (input, select, etc.). |
| By.CSS_SELECTOR | Search by CSS selector | <div id="a"><p>... / By.CSS_SELECTOR, "#a > p" | Allows concise and flexible selection combining IDs, classes, and attributes. |
| By.XPATH | Search by XPath | <div><span>Text</span> / By.XPATH, "//div/span" | The most powerful method; can search by structure or text content. |
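
As a small illustration of searching by text content with XPath, the sketch below builds an expression that matches an element by its visible text. The helper name xpath_by_text is hypothetical (it is not part of Selenium), and it assumes the text does not contain both single and double quotes at once.

```python
def xpath_by_text(tag, text):
    """Build an XPath that matches <tag> elements whose trimmed text equals `text`.

    Hypothetical helper for illustration; not part of Selenium.
    """
    # XPath 1.0 has no escape character, so pick a quote style the text does not use.
    if '"' not in text:
        literal = f'"{text}"'
    elif "'" not in text:
        literal = f"'{text}'"
    else:
        raise ValueError("text containing both quote types is not supported here")
    # normalize-space() trims leading/trailing whitespace before comparing.
    return f"//{tag}[normalize-space()={literal}]"


print(xpath_by_text("span", "Text"))
```

The resulting string can be passed as the value for By.XPATH, for example driver.find_element(By.XPATH, xpath_by_text("span", "Text")).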

Important Notes

If the element is not found

If no element matches the condition, find_element raises NoSuchElementException, and the program stops unless the exception is caught. To check whether an element exists, wrap the call in a try-except block or use find_elements (plural), which returns an empty list instead of raising.

Difference between singular and plural

  • find_element (singular): Returns the first element found. Raises NoSuchElementException if nothing matches.
  • find_elements (plural): Returns a list of all elements found. Returns an empty list [] if none are found, so it does not raise an error.
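
Because find_elements returns an empty list rather than raising, it can be used for an existence check without try-except. The following is a minimal sketch; find_or_none is a hypothetical helper name, and driver is assumed to be any object that provides Selenium's find_elements(by, value) method.

```python
def find_or_none(driver, by, value):
    """Return the first matching element, or None when nothing matches.

    Hypothetical helper for illustration; `driver` is assumed to expose
    find_elements(by, value) as a Selenium WebDriver does.
    """
    matches = driver.find_elements(by, value)  # empty list if nothing matches
    return matches[0] if matches else None


# Usage with a real driver (requires selenium and a browser driver):
#   from selenium.webdriver.common.by import By
#   header = find_or_none(driver, By.ID, "header-id")
#   if header is None:
#       print("Element not found")
```

Returning None keeps the calling code free of exception handling, at the cost of having to check for None before using the result.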

Compound Class Trap

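For an element with several space-separated classes, such as <div class="btn btn-primary">, passing "btn btn-primary" to By.CLASS_NAME does not work, because By.CLASS_NAME accepts a single class name; the lookup typically fails with an error. Either pass one class (By.CLASS_NAME, "btn-primary") or chain the classes in a CSS selector: By.CSS_SELECTOR, ".btn.btn-primary".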
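
The conversion from a class attribute value to a chained CSS selector is mechanical, so it can be sketched as a small function. The name class_selector is hypothetical, and the sketch assumes the class names contain no characters that need CSS escaping.

```python
def class_selector(class_attr):
    """Convert a class attribute value such as "btn btn-primary"
    into a chained CSS class selector such as ".btn.btn-primary".

    Hypothetical helper; assumes class names need no CSS escaping.
    """
    names = class_attr.split()  # split on any whitespace, dropping empties
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    return "".join(f".{name}" for name in names)


print(class_selector("btn btn-primary"))
```

The returned string is meant to be used with By.CSS_SELECTOR, for example driver.find_element(By.CSS_SELECTOR, class_selector("btn btn-primary")).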

Advanced Usage

This example shows how to get an “attribute value” (like a link URL) rather than the element itself.

from selenium import webdriver
from selenium.webdriver.common.by import By

def get_attribute_value():
    driver = webdriver.Chrome()
    driver.get("https://www.google.com")

    # Locate the search input using the name attribute
    try:
        search_input = driver.find_element(By.NAME, "q")
        
        # To get the value entered in an input tag, look at the 'value' attribute
        input_value = search_input.get_attribute("value")
        print(f"Current input value: {input_value}")
        
        # Example of getting a URL (href) from a link (a tag)
        # link = driver.find_element(By.TAG_NAME, "a")
        # print(link.get_attribute("href"))

    finally:
        driver.quit()

if __name__ == "__main__":
    get_attribute_value()
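
The commented-out link example above reads the href of a single a tag. To gather the URLs of every link, the same get_attribute call can be applied to each element in a list, as in this rough sketch. collect_hrefs is a hypothetical helper, and elements is assumed to be the list returned by driver.find_elements(By.TAG_NAME, "a").

```python
def collect_hrefs(elements):
    """Return the href of each element that has a non-empty one.

    Hypothetical helper; `elements` is assumed to be a list of objects
    with Selenium's WebElement.get_attribute(name) method.
    """
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")  # None when the attribute is absent
        if href:
            hrefs.append(href)
    return hrefs


# Usage with a real driver (requires selenium and a browser driver):
#   from selenium.webdriver.common.by import By
#   links = driver.find_elements(By.TAG_NAME, "a")
#   for url in collect_hrefs(links):
#       print(url)
```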

Summary

The basic syntax for getting elements in Selenium is driver.find_element(By.xxx, "value"). Prefer By.ID when the target has one, because an id should be unique within the page. If no ID is available, use By.CSS_SELECTOR or By.XPATH to narrow down your target.

