Delete Duplicate Emails
Problem Description​
Write a SQL query to delete all the duplicate emails, keeping only one unique email with the smallest id
.
Assume the table Person
has the following structure:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Example​
Example 1:
Assume that the `Person` table has the following content:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Your query should delete the duplicate emails and the result table should look like:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
+----+------------------+
Constraints​
- The
Person
table will have at most 10^4 rows. - Each email in the
Person
table will have at most 100 characters.
Solution Approach​
Approach Overview​
To delete duplicate emails while keeping the record with the smallest id
, we can use a subquery to identify the duplicates and then delete them based on their id
.
Detailed Steps​
- Identify Duplicates:
- Use a subquery to find the smallest
id
for each duplicate email.
- Use a subquery to find the smallest
- Delete Duplicates:
- Delete from the
Person
table where theid
is not in the list of smallestid
s for each email.
- Delete from the
Code Examples​
SQL Query​
DELETE p1 FROM Person p1
JOIN Person p2
ON p1.email = p2.email
AND p1.id > p2.id;
Explanation​
- The
JOIN
operation matches each row in thePerson
table with every other row having the same email. - The
DELETE
statement removes rows from thePerson
table where theid
is greater than the smallestid
for each email, effectively keeping only one unique email with the smallestid
.
Complexity​
-
Time Complexity:
O(n^2)
in the worst case, wheren
is the number of rows in thePerson
table, due to theJOIN
operation. However, for practical purposes with indexing and optimizations, the performance may be better. -
Space Complexity:
O(1)
, as no additional space is required beyond the query execution.