Delete Duplicate Emails
Problem Description​
Write a SQL query to delete all the duplicate emails, keeping only one unique email with the smallest id.
Assume the table Person has the following structure:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Example​
Example 1:
Assume that the `Person` table has the following content:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Your query should delete the duplicate emails and the result table should look like:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
+----+------------------+
Constraints​
- The
Persontable will have at most 10^4 rows. - Each email in the
Persontable will have at most 100 characters.
Solution Approach​
Approach Overview​
To delete duplicate emails while keeping the record with the smallest id, we can use a subquery to identify the duplicates and then delete them based on their id.
Detailed Steps​
- Identify Duplicates:
- Use a subquery to find the smallest
idfor each duplicate email.
- Use a subquery to find the smallest
- Delete Duplicates:
- Delete from the
Persontable where theidis not in the list of smallestids for each email.
- Delete from the
Code Examples​
SQL Query​
DELETE p1 FROM Person p1
JOIN Person p2
ON p1.email = p2.email
AND p1.id > p2.id;
Explanation​
- The
JOINoperation matches each row in thePersontable with every other row having the same email. - The
DELETEstatement removes rows from thePersontable where theidis greater than the smallestidfor each email, effectively keeping only one unique email with the smallestid.
Complexity​
-
Time Complexity:
O(n^2)in the worst case, wherenis the number of rows in thePersontable, due to theJOINoperation. However, for practical purposes with indexing and optimizations, the performance may be better. -
Space Complexity:
O(1), as no additional space is required beyond the query execution.