Dealing with UTF-8 text using the PHP mail()-function

On one of my PHP-based websites, users can send mails via a simple text form. I had a problem with special characters in the subject or message body. The webpage has a <meta charset="utf-8" />-tag, so the form submits its text as UTF-8. When people enter special characters, like é or € or “smart quotes”, and the text is sent via the mail()-function without any charset information, the receiving mail client reads the UTF-8 bytes in the wrong encoding and they show up like Ã³ or Ã© or â€™.
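To make the garbling concrete, here is a minimal sketch that reproduces it for é. It assumes the mbstring extension is available, and it only simulates what a mail client does when it reads UTF-8 bytes as ISO-8859-1. The variable names are mine, purely for illustration.

```php
<?php
// "é" written as its two UTF-8 bytes, so the example does not depend
// on the encoding this file is saved in.
$typed = "\xC3\xA9";

// Simulate a client that wrongly treats those bytes as ISO-8859-1:
// each byte is taken as a separate character and converted to UTF-8 for display.
$misread = mb_convert_encoding($typed, 'UTF-8', 'ISO-8859-1');

echo bin2hex($typed), "\n";    // c3a9
echo bin2hex($misread), "\n";  // c383c2a9, which displays as "Ã©"
```

The € and smart-quote cases work the same way, except that the client is then typically guessing Windows-1252 rather than ISO-8859-1.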

Deep down in the user comments in the documentation on the mail()-function, I found a small function that solved this easily:

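function mail_utf8($to, $subject = '(No subject)', $message = '', $header = '') {
 // Declare the body as UTF-8 plain text so mail clients decode it correctly.
 $header_ = 'MIME-Version: 1.0' . "\r\n" . 'Content-Type: text/plain; charset=UTF-8' . "\r\n";
 // Header values must be ASCII, so wrap the UTF-8 subject as a Base64 encoded-word (RFC 2047).
 $subject_ = '=?UTF-8?B?' . base64_encode($subject) . '?=';
 // mail() returns true if the message was accepted for delivery, false otherwise.
 return mail($to, $subject_, $message, $header_ . $header);
}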

Just use mail_utf8() instead of mail() on a UTF-8 page and a proper UTF-8 mail will be sent out.
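To see what the '=?UTF-8?B?...?=' part of mail_utf8() actually puts in the Subject header, here is a small sketch that builds the encoded subject without sending anything. The helper name encode_subject_utf8() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: wraps a UTF-8 subject as an RFC 2047 "encoded-word",
// the same way mail_utf8() does before calling mail().
function encode_subject_utf8(string $subject): string {
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}

// "Soirée" written with explicit UTF-8 bytes for the é.
$encoded = encode_subject_utf8("Soir\xC3\xA9e");
echo $encoded, "\n";

// A mail client reverses the wrapping: strip the 10-character prefix
// and the trailing "?=", then Base64-decode the rest.
$decoded = base64_decode(substr($encoded, 10, -2));
echo bin2hex($decoded), "\n";
```

One caveat: RFC 2047 limits a single encoded-word to 75 characters, so very long subjects should be split over several encoded-words. If the mbstring extension is available, mb_encode_mimeheader() is probably the more robust choice for that case.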

JavaScript uses url-encoded UTF-8 strings to perform Ajax POSTs

Today, I had an encoding issue with a website I was working on. I had a

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

tag in the <head>-section, to ensure the page would correctly display and interpret all characters, including accented ones, like à and é.

The page was an administration page, to update texts on a website. Everything worked fine when I used standard <form> and <input>-tags, but when I wanted to use Ajax to save any changes, to avoid a full page reload when something as simple as a title had to be modified, things started to go wrong.

I used the spectacular jQuery library to do that, using a $.post-statement. At first sight, this seemed to be working fine, until I used some accented characters. When I entered “Soirée Théâtre” as title, the characters “SoirÃ©e ThÃ©Ã¢tre” were stored in the database.

I first thought it was an issue similar to the Bulgarian character set issue in MySQL I encountered a few months ago. But I was just using French in this case, so iso-8859-1 on the page and the MySQL character set latin1 (with its default collation “latin1_swedish_ci”) should suffice.

After some googling, I found this website, which explained that in the case of Ajax POSTs, “JavaScript serializes all the fields and it always uses url-encoded UTF-8 strings for this”. So the data arrives as UTF-8 no matter which charset the page declares. A simple utf8_decode() in the PHP program that received the Ajax post-statement solved my issue.
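As a rough sketch of that fix on the receiving side: the function name below is hypothetical, and the posted value is simulated with explicit bytes rather than read from $_POST. Note that utf8_decode() is deprecated as of PHP 8.2, so this version uses mb_convert_encoding() instead, which assumes the mbstring extension is installed.

```php
<?php
// Hypothetical helper: converts a value received from an Ajax POST (UTF-8)
// to the ISO-8859-1 encoding the page and database expect.
function ajax_value_to_latin1(string $value): string {
    return mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
}

// Simulated POST value: "Soirée Théâtre" as jQuery would send it, in UTF-8.
$posted = "Soir\xC3\xA9e Th\xC3\xA9\xC3\xA2tre";
$title  = ajax_value_to_latin1($posted);

// Each accented character shrinks from two bytes to one.
echo strlen($posted), " bytes in, ", strlen($title), " bytes out\n";  // 17 bytes in, 14 bytes out
```

In a real handler the input would come from something like $_POST['title']. Characters that do not exist in ISO-8859-1 (for example €) cannot be represented after the conversion, so in the long run switching the page and the database to UTF-8 is likely the cleaner solution.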