Deleting similar records in MySQL

If you have a table in MySQL with duplicate records, you can easily get a list without the duplicates by using the DISTINCT keyword in the SELECT-statement:

SELECT DISTINCT * FROM table

If you want to delete the duplicate records, you can create a new table without the duplicates, by combining the CREATE TABLE statement with SELECT DISTINCT:

CREATE TABLE newtable SELECT DISTINCT * FROM table

However, if you have similar records, where some, but not all fields are the same, the DISTINCT approach does not work. This is an excerpt of a table with songs:

[TABLE=2]

I’d like to delete the similar records, keeping just one entry for each artist/title combination. The SELECT DISTINCT statement will not work, as none of the records are exact duplicates. In other words,

SELECT DISTINCT * FROM table

will return the same table.

I could do a

SELECT DISTINCT title,artist FROM table

which results in

[TABLE=3]

but I want to keep the information in the other columns.

The solution in MySQL is to use the multiple-table syntax for the DELETE statement using the same table twice:

DELETE T1
FROM table T1, table T2
WHERE T1.duplicateField = T2.duplicateField
AND M1.uniqueField > M2.uniqueField

For the example above, I could use

DELETE M1
FROM music M1, music M2
WHERE M1.title = M2.title AND M1.artist = M2.artist
AND M1.cd*100+M1.track > M2.cd*100+M2.track

which results in

[TABLE=4]

UFT-8 on a website based on mySQL and PHP

I spent quite some time today trying to get the text on a website with Bulgarian characters to show properly. The text is coming out of a MYSQL 5.0 database, configured to use utf8_general_ci and the website is generated by PHP 5.2.5.
I knew I had to use
meta equiv=”Content-Type” content=”text/html; charset=utf-8″
to display the Bulgarian characters properly (like Велико Търново) and that worked fine for static text, but I got ???????? when using the text from the database.
I tried all sorts of things, and visited various websites with proposed solution, but finally I got it working by one simple setting, explained by this website. I needed to include
mysql_query(‘SET NAMES utf8’);
when connecting to the database. That solved the issue.