Deleting similar records in MySQL

If you have a table in MySQL with duplicate records, you can easily get a list without the duplicates by using the DISTINCT keyword in the SELECT-statement:

SELECT DISTINCT * FROM table

If you want to delete the duplicate records, you can create a new table without the duplicates, by combining the CREATE TABLE statement with SELECT DISTINCT:

CREATE TABLE newtable SELECT DISTINCT * FROM table

However, if you have similar records, where some, but not all fields are the same, the DISTINCT approach does not work. This is an excerpt of a table with songs:

[TABLE=2]

I’d like to delete the similar records, keeping just one entry for each artist/title combination. The SELECT DISTINCT statement will not work, as none of the records are exact duplicates. In other words,

SELECT DISTINCT * FROM table

will return the same table.

I could do a

SELECT DISTINCT title,artist FROM table

which results in

[TABLE=3]

but I want to keep the information in the other columns.

The solution in MySQL is to use the multiple-table syntax for the DELETE statement using the same table twice:

DELETE T1
FROM table T1, table T2
WHERE T1.duplicateField = T2.duplicateField
AND M1.uniqueField > M2.uniqueField

For the example above, I could use

DELETE M1
FROM music M1, music M2
WHERE M1.title = M2.title AND M1.artist = M2.artist
AND M1.cd*100+M1.track > M2.cd*100+M2.track

which results in

[TABLE=4]

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.