When you want to remove duplicate data from a list, converting it to a set with set() is a common approach. However, a set does not preserve the order of its elements.
If you want to remove duplicates while keeping the order in which items first appeared, dict.fromkeys() is a simple, idiomatic approach. It relies on the fact that dictionaries preserve insertion order.
This article explains how to create a unique list while preserving order, using search history data as an example.
The Problem to Solve
In cases like “recently viewed items” in a web browser or app, the order (newest first or oldest first) carries meaning. Removing duplicates in a way that loses that order therefore causes problems.
Implementation Example: Deduplicating Search History
In this example, we remove duplicates from a user’s search keyword history to create a unique list while keeping the original search order.
Source Code
# User search keyword history (chronological order)
# "python" and "tutorial" appear as duplicates
search_history = [
    "python",
    "tutorial",
    "django",
    "python",  # Duplicate
    "machine-learning",
    "tutorial",  # Duplicate
    "web-design",
]
print(f"Original List: {search_history}")
# Use dict.fromkeys() to remove duplicates while preserving order
# 1. Build a dictionary that uses each element as a key (duplicate keys collapse into one)
# 2. Convert it back to a list; calling list() on a dict returns its keys
unique_history = list(dict.fromkeys(search_history))
print(f"Processed List: {unique_history}")
Execution Result
Original List: ['python', 'tutorial', 'django', 'python', 'machine-learning', 'tutorial', 'web-design']
Processed List: ['python', 'tutorial', 'django', 'machine-learning', 'web-design']
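For a “recently viewed” list, you often want the newest entries first, with each keyword placed at its most recent search. A minimal sketch, assuming the history is stored oldest-to-newest as in the example above, is to reverse the list before deduplicating. The helper name recent_unique is hypothetical.

```python
def recent_unique(history):
    # Walk the history from newest to oldest. dict.fromkeys() keeps the
    # first key it sees, which is now the most recent occurrence.
    return list(dict.fromkeys(reversed(history)))


# Same sample data as above (oldest search first)
search_history = [
    "python",
    "tutorial",
    "django",
    "python",
    "machine-learning",
    "tutorial",
    "web-design",
]

print(recent_unique(search_history))
# ['web-design', 'tutorial', 'machine-learning', 'python', 'django']
```

Because reversed() returns an iterator, no reversed copy of the list is built before dict.fromkeys() consumes it.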
Explanation
Why use dict.fromkeys()?
- Elimination of Duplicates: Since dictionary keys must be unique, duplicates are automatically removed when converting the list into dictionary keys.
- Preservation of Order: Since Python 3.7, the language guarantees that dictionaries preserve insertion order. Keys are registered in the order they appear in the list, and a repeated key keeps the position of its first occurrence, so the original sequence stays intact.
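The small sketch below shows both properties directly: dict.fromkeys() fills every value with None, and re-assigning an existing key does not move it. For comparison it also includes an explicit loop with a set of already-seen items, a common hand-written equivalent. The helper name unique_in_order is hypothetical and not part of the original example.

```python
# dict.fromkeys() builds a dict whose keys come from the iterable;
# every value defaults to None.
d = dict.fromkeys(["python", "tutorial", "python"])
print(d)  # {'python': None, 'tutorial': None}

# Re-assigning an existing key updates its value but keeps its position,
# so each key stays where it first appeared.
d["python"] = "seen again"
print(list(d))  # ['python', 'tutorial']


# Hypothetical helper: an explicit loop that does the same deduplication.
def unique_in_order(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # first occurrence only
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order(["python", "tutorial", "django", "python"]))
# ['python', 'tutorial', 'django']
```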
Difference from set()
For reference, here is the behavior when using set().
# Using set (order is not guaranteed)
print(list(set(search_history)))
# Output example (the order can differ from run to run):
# ['web-design', 'django', 'python', 'machine-learning', 'tutorial']
# -> The order is arbitrary; string hashes are randomized per process by default
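Use set() when order doesn’t matter (it is the more common idiom and can be slightly faster), and use dict.fromkeys() when order is important. Note that both approaches require the list elements to be hashable: strings and numbers work, but lists and dicts do not.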
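As a quick check of the difference, the sketch below confirms that both approaches keep the same unique elements, while only dict.fromkeys() keeps the original order without an extra step. Recovering the order after set() by sorting on each element’s first index works, but it calls list.index() once per unique element, which becomes roughly quadratic on long lists.

```python
search_history = [
    "python",
    "tutorial",
    "django",
    "python",
    "machine-learning",
    "tutorial",
    "web-design",
]

ordered = list(dict.fromkeys(search_history))
unordered = list(set(search_history))

# Both contain exactly the same unique elements...
print(sorted(ordered) == sorted(unordered))  # True

# ...but restoring the original order after set() needs an extra sort
# keyed on each element's first position in the original list.
restored = sorted(set(search_history), key=search_history.index)
print(restored == ordered)  # True
```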
