February 23, 2006

charsets suXors Curious George: I have posted far and wide - yet have found no solution. I have a php and charset encoding dilemma. So as not to cause mass hysteria - details listed inside....

The scene:

- WordPress Blog 2.0.1
- Postie plugin 0.9.9.2
- Blog is encoded utf-8.

Blog entries in Japanese post and display no problem. Also they show up correctly in the database. Utf-8 encoded emails in Japanese show up on the blog fine. Mails posted with English text via a Japanese mobile phone - no problem. Mails posted with Japanese text via a Japanese mobile phone - big problem. Japanese mobile phone mails are encoded ISO-2022-JP (this can't be changed).

So I've tried to track down where the stuff-up is occurring:

- mail sends from mobile phone - OK
- mail goes to mail server - OK
- Postie gets mail from mail server, and displays on test page - OK
- Postie sends mail to database - NOT ok (so obviously the steps after that are all buggered up too)

So I think it's an issue with the way Postie handles the encoding of the mail.

Solutions tried (and failed):

- editing the convert function in the postie-function.php
- trying to insert a mb_convert function in the file
- swearing profusely at it

If you have some advice or this sounds like a fascinating puzzle the gomi-modded Postie files are here. Unfortunately I am unable to throw wads of cash or nubile young Japanese school girls at anyone who can help... but I do have an intriguing collection of unusual Japanese snacks....
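
For anyone following along, here is a minimal sketch of the general idea: take the ISO-2022-JP bytes from the mail and re-encode them as utf-8 before they go anywhere near the database. It is written in Python purely as an illustration and is not Postie's code (Postie is PHP); the function name `to_utf8` and the sample text are made up for the example.

```python
# Illustrative sketch only: not taken from Postie or WordPress.
# Assumes the mail body arrives as raw bytes in ISO-2022-JP.

def to_utf8(raw: bytes, source_charset: str = "iso-2022-jp") -> bytes:
    """Decode bytes from the mail's charset and re-encode them as UTF-8."""
    text = raw.decode(source_charset)   # bytes -> Unicode string
    return text.encode("utf-8")         # Unicode string -> UTF-8 bytes


# Hypothetical sample standing in for a mobile phone mail body.
sample = "こんにちは".encode("iso-2022-jp")
converted = to_utf8(sample)

# ISO-2022-JP uses escape sequences (the ESC byte) to switch character sets,
# which is why the raw bytes look like garbage when treated as UTF-8.
print(sample)
print(converted.decode("utf-8"))
```

If the stored text comes out mangled, one thing worth checking is whether a step like this runs at all on the path into the database, as opposed to only on the test page.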

  • Oh I forgot to mention that I have tried alternative solutions such as using the wp-mail.php thingie but that resulted in the same mess.
  • Oh I should also add that knowing about the Japanese aspect isn't essential - just the encoding thingie.
  • You just made my eyeballs explode. Hope you're happy!
  • So the output from DisplayEmailPost is exactly as you'd expect? Does that include the underlying HTML for the test page? Is the return from wp_insert_post() valid? Does anything end up in the DB? My single thought is that if it is displaying properly, then the content of the elements in $details is not compatible with the DB, i.e. 1 - 2 bytes rather than up to 4. I haven't looked at the source for a while but I'd be interested to know what's going on with this. I'll have a poke around in the morning.
  • hey Lara - this post did have a warning (^^) randomaction: I tested again, and it seems that the test page changes the encoding to ISO-2022-JP. hmmmm. In the DB the message is posted - but it's all mangled characters. All other Japanese posts (such as those made via the WordPress interface, or sent with Japanese text and the mail encoded utf-8) show up fine in the DB, and fine on the page. Food for thought?
  • My only thought is that the database does not support the character set correctly. Searching for MySQL problems with Japanese characters shows a number of bug reports. That might be the place to look.
  • Hmmm, cheers loto, will check that too.
  • OK! Database was not the problem. It was the Postie code. Have Japanese mails posting correctly! But not the subject if there is an attachment on mail. Will post link to complete solution when that bit is worked out.
  • You could use mbstring to convert to the internal utf-8 encoding; I think that should fix it. That's if I've got any kind of handle on the issue at all. I hope that this helps.
  • Cool! I didn't above, I hold posts in preview for hours sometimes.
  • See, I didn't see above.
  • and randomaction lives up to username (^^) *kiss*
  • I like your photos, gomichild.
  • oh cheers the quidnunc kid
  • I'm so bold sometimes.
  • Alrighty! If you ever need to do this (or are just a keen geek) here is THE SOLUTION [self-link]. *phew*
  • Well done you clever monkey.
  • (^^) really though it was due to other people being clever and me just being good at whining loudly....
  • Especially my new Coding Hero - zengargoyle.
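
A footnote on the subject line mentioned above: the thread doesn't say what was wrong with it, but mail subjects that contain Japanese normally travel as MIME encoded-words (the `=?ISO-2022-JP?B?...?=` form), so they need their own decoding step separate from the body. Below is a small Python sketch of that step, assuming that was the sticking point. It is not the linked solution and not Postie code; `decode_subject` and the sample subject are hypothetical.

```python
# Illustrative sketch only: decoding a MIME encoded-word subject line.
# Uses Python's standard library; none of this comes from Postie.
import base64
from email.header import decode_header, make_header


def decode_subject(raw_subject: str) -> str:
    """Turn an encoded-word subject into a plain Unicode string."""
    return str(make_header(decode_header(raw_subject)))


# Build a hypothetical ISO-2022-JP subject the way a mail client might send it.
payload = base64.b64encode("テスト".encode("iso-2022-jp")).decode("ascii")
raw_subject = f"=?ISO-2022-JP?B?{payload}?="

subject = decode_subject(raw_subject)
print(subject)                   # Unicode text, ready to store
print(subject.encode("utf-8"))   # the bytes a utf-8 blog would keep
```

Plain ASCII subjects pass through unchanged, so the same call can be used for every mail.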