Python Set Comprehensions: Processing Data and Removing Duplicates

Just as “List Comprehensions” create lists, Python provides “Set Comprehensions” for creating sets (set) concisely. Where a list comprehension uses [] (square brackets), a set comprehension uses {} (curly braces).
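As a minimal sketch of the bracket difference, the snippet below builds a list and a set from the same expression. The sample values are made up for illustration. It also shows one caveat worth knowing: empty braces create a dict, so an empty set is written as set().

```python
# Illustrative values only (not taken from the article's examples)
values = [1, -1, 2]

squares_list = [n * n for n in values]   # square brackets -> list
squares_set = {n * n for n in values}    # curly braces -> set

print(squares_list)                 # [1, 1, 4]
print(type(squares_set).__name__)   # set

# Caveat: empty braces create a dict, not a set
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set
```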

The defining feature of a set comprehension is that its result is a set, so duplicate elements are discarded automatically as the set is built. This lets you write “data processing” and “deduplication (removing duplicates)” together in a single expression.

This article explains the basic syntax of set comprehensions and how to filter data using conditions.



Basic Syntax of Set Comprehension

A set comprehension is written by placing an expression followed by a for clause inside {}.

Syntax:

{expression for variable in iterable}
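To make the syntax concrete, here is a small sketch showing that a set comprehension behaves like an explicit loop that calls add() on an initially empty set. The sample numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data, chosen only for illustration
numbers = [1, 2, 2, 3, 3, 3]

# Set comprehension: square each number; repeated results collapse into one
squares = {n * n for n in numbers}

# Roughly equivalent explicit loop
squares_loop = set()
for n in numbers:
    squares_loop.add(n * n)

print(squares == squares_loop)  # True
```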

Specific Example: String Normalization

Consider a case where you process a list of tags (keywords) submitted from an input form. If users input inconsistent casing (e.g., “Python”, “python”), you might want to convert them all to lowercase and create a set with duplicates removed.

# List of input tags (contains duplicates and inconsistent casing)
raw_tags = ["Python", "django", "python", "API", "Django", "WEB"]

# Using Set Comprehension
# 1. Convert to lowercase with tag.lower()
# 2. Since it is a set, duplicates are automatically removed
unique_tags = {tag.lower() for tag in raw_tags}

print(f"Original List: {raw_tags}")
print(f"Processed Set: {unique_tags}")

Output:

Original List: ['Python', 'django', 'python', 'API', 'Django', 'WEB']
Processed Set: {'python', 'api', 'django', 'web'}

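If you used a list comprehension [...] instead, the duplicates would remain, as in ['python', 'django', 'python', ...]. With a set comprehension {...}, you get a collection of unique elements only. Note that a set is unordered, so the order of elements in the printed output may differ from the one shown above and can vary between runs.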
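The difference can be checked directly by running both forms on the same input. This sketch reuses raw_tags from the example; the names as_list and as_set are introduced here only for illustration.

```python
raw_tags = ["Python", "django", "python", "API", "Django", "WEB"]

# Same expression, different brackets
as_list = [tag.lower() for tag in raw_tags]  # keeps every element
as_set = {tag.lower() for tag in raw_tags}   # keeps unique elements only

print(as_list)                    # ['python', 'django', 'python', 'api', 'django', 'web']
print(len(as_list), len(as_set))  # 6 4
```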


Conditional Set Comprehension

Just like with list comprehensions, you can add an if clause to keep only the elements that meet a specific condition.

Syntax:

{expression for variable in iterable if condition}

Specific Example: Extracting Valid Data

Here is an example of taking a list of sensor data, removing “error values (negative numbers),” and creating a set of only normal data.

# Data acquired from sensor (contains duplicates and noise)
sensor_data = [15.5, 16.2, -1.0, 15.5, 14.8, -99.9, 16.2]

# Extract only data >= 0 and remove duplicates
valid_data_set = {data for data in sensor_data if data >= 0}

print(f"Original Data: {sensor_data}")
print(f"Valid Data Set: {valid_data_set}")

Output:

Original Data: [15.5, 16.2, -1.0, 15.5, 14.8, -99.9, 16.2]
Valid Data Set: {16.2, 15.5, 14.8}
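The condition and the expression can also be combined. As a sketch, the snippet below filters out negative values and rounds the remaining readings to whole numbers. The rounding step is an assumption added for illustration and is not part of the original example. An equivalent explicit loop is included for comparison.

```python
sensor_data = [15.5, 16.2, -1.0, 15.5, 14.8, -99.9, 16.2]

# The if clause filters first (data >= 0); round() is then applied to each kept value
rounded_valid = {round(data) for data in sensor_data if data >= 0}

# Roughly equivalent explicit loop
rounded_loop = set()
for data in sensor_data:
    if data >= 0:
        rounded_loop.add(round(data))

print(sorted(rounded_valid))          # [15, 16]
print(rounded_valid == rounded_loop)  # True
```

Here 15.5 and 16.2 both round to 16, so the set keeps a single 16. Duplicates are removed based on the value produced by the expression, not on the original input value.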

Summary

  • A set comprehension is written in the format {expression for variable in iterable}, optionally followed by if condition.
  • Syntactically, it is a list comprehension with the [] replaced by {}.
  • Since the generated object is a set, duplicate elements are automatically removed.
  • The order of elements is not preserved.
  • It is a concise choice when you want to perform “data conversion” and “duplicate removal” at the same time.
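Because the order of a set is not preserved, one common approach when a stable order is needed for display is to pass the set to sorted(), which returns a new list. The following is a minimal sketch that reuses the tag example; ordered_tags is a name introduced here for illustration.

```python
raw_tags = ["Python", "django", "python", "API", "Django", "WEB"]
unique_tags = {tag.lower() for tag in raw_tags}

# A set has no defined order, so sort it when a stable order is needed
ordered_tags = sorted(unique_tags)
print(ordered_tags)  # ['api', 'django', 'python', 'web']

# The result is a regular set, so membership tests work directly
print("python" in unique_tags)  # True
print("Python" in unique_tags)  # False (all tags were lowercased)
```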

