Just like “List Comprehensions” for creating lists, Python has “Set Comprehensions” for creating sets (set) concisely. While list comprehensions use [] (square brackets), set comprehensions use {} (curly braces).
The defining feature of a set comprehension is that duplicate elements are dropped automatically as the set is built. This lets you write “data processing” and “deduplication (removing duplicates)” together in a single expression.
This article explains the basic syntax of set comprehensions and how to filter data using conditions.
Basic Syntax of Set Comprehension
A set comprehension is written by placing an expression followed by a for clause inside {}.
Syntax:
{expression for variable in iterable}
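As a rough sketch of what this syntax does, the comprehension below is shown next to an equivalent explicit loop. The names numbers, squares_loop, and squares_comp are made up for illustration and do not appear elsewhere in this article.

```python
# Hypothetical sample data, used only for illustration
numbers = [1, 2, 2, 3, 3, 3]

# Explicit loop: start from an empty set and add each result.
# Note that set() is required here; a bare {} creates an empty dict.
squares_loop = set()
for n in numbers:
    squares_loop.add(n * n)

# Set comprehension: the same result in a single expression
squares_comp = {n * n for n in numbers}

print(squares_loop == squares_comp)  # True
```

Both forms build the same set; the comprehension just removes the boilerplate of creating the set and calling add() yourself.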
Specific Example: String Normalization
Consider a case where you process a list of tags (keywords) submitted from an input form. If users input inconsistent casing (e.g., “Python”, “python”), you might want to convert them all to lowercase and create a set with duplicates removed.
# List of input tags (contains duplicates and inconsistent casing)
raw_tags = ["Python", "django", "python", "API", "Django", "WEB"]
# Using Set Comprehension
# 1. Convert to lowercase with tag.lower()
# 2. Since it is a set, duplicates are automatically removed
unique_tags = {tag.lower() for tag in raw_tags}
print(f"Original List: {raw_tags}")
print(f"Processed Set: {unique_tags}")
Output (the order of elements in the set may differ from run to run):
Original List: ['Python', 'django', 'python', 'API', 'Django', 'WEB']
Processed Set: {'python', 'api', 'django', 'web'}
If you used list comprehension [...], duplicates like ['python', 'django', 'python', ...] would remain. However, by using set comprehension {...}, you get a collection of unique elements only.
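The difference can be checked directly. The following sketch reuses raw_tags from the example above and runs both forms side by side; the names tags_as_list and tags_as_set are illustrative.

```python
raw_tags = ["Python", "django", "python", "API", "Django", "WEB"]

# List comprehension: every converted element is kept, duplicates included
tags_as_list = [tag.lower() for tag in raw_tags]

# Set comprehension: the same conversion, but duplicates collapse
tags_as_set = {tag.lower() for tag in raw_tags}

print(tags_as_list)       # ['python', 'django', 'python', 'api', 'django', 'web']
print(len(tags_as_list))  # 6
print(len(tags_as_set))   # 4
```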
Conditional Set Comprehension
Just like with list comprehensions, you can add an if clause to keep only the elements that meet a specific condition.
Syntax:
{expression for variable in iterable if condition}
Specific Example: Extracting Valid Data
Here is an example of taking a list of sensor data, removing “error values (negative numbers),” and creating a set of only normal data.
# Data acquired from sensor (contains duplicates and noise)
sensor_data = [15.5, 16.2, -1.0, 15.5, 14.8, -99.9, 16.2]
# Extract only data >= 0 and remove duplicates
valid_data_set = {data for data in sensor_data if data >= 0}
print(f"Original Data: {sensor_data}")
print(f"Valid Data Set: {valid_data_set}")
Output:
Original Data: [15.5, 16.2, -1.0, 15.5, 14.8, -99.9, 16.2]
Valid Data Set: {16.2, 15.5, 14.8}
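A condition and a conversion can also be combined in the same comprehension. As a minimal sketch, assuming you only care about the whole-number part of each reading, the example below filters out negative values, truncates the rest with int(), and sorts the result for display. The name whole_readings is hypothetical.

```python
sensor_data = [15.5, 16.2, -1.0, 15.5, 14.8, -99.9, 16.2]

# Keep readings >= 0, then truncate each one to a whole number.
# Values that become equal after truncation are merged by the set.
whole_readings = {int(data) for data in sensor_data if data >= 0}

# A set has no defined order, so sort it when a stable order is needed
print(sorted(whole_readings))  # [14, 15, 16]
```

Note that the if clause is evaluated on the original value (data), while the expression on the left decides what is actually stored in the set.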
Summary
- Set comprehension is written in the format {expression for variable in iterable}.
- You can use it simply by changing the [] of list comprehension to {}.
- Since the generated object is a set, duplicate elements are automatically removed.
- The order of elements is not preserved.
- It is a concise fit when you want to perform “data conversion” and “duplicate removal” at the same time.
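Because a set does not preserve order, a set comprehension is not the right tool when the original order of the input matters. One common alternative, shown here only as a sketch and not as part of the examples above, is dict.fromkeys, which relies on dictionaries keeping insertion order (guaranteed since Python 3.7). The name ordered_unique is illustrative.

```python
raw_tags = ["Python", "django", "python", "API", "Django", "WEB"]

# Dictionary keys are unique and keep insertion order,
# so the first occurrence of each lowercase tag is retained in place.
ordered_unique = list(dict.fromkeys(tag.lower() for tag in raw_tags))

print(ordered_unique)  # ['python', 'django', 'api', 'web']
```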
