From: Zvi Har'El (rl@math.technion.ac.il)
Date: Mon Apr 07 2003 - 11:39:27 CDT
Dear Jose,
On Mon, 07 Apr 2003 18:09:47 +0200, Jose Kahan wrote about "Re: [hypermail] Latin1 subject with UTF-8 body":
> I did something similar in my XHTML hypermail work for converting
> the winlatin1 characters that are inserted inside messages coded with
> ISO-8859-1. That is, I'm converting them into the equivalent Unicode
> entities. I guess that in your case, the rule would be "if the
> message's charset is UTF-8, then convert the ISO-8859-1 set into
> the equivalent Unicode one."
This conversion has nothing to do with the message charset. My suggestion is
to use character entities, which are ASCII representations of Unicode
characters and are usable with any message charset (assuming, of course, that
ASCII is a subset of the document charset, which is the basic HTML
assumption). So I believe such a conversion should be done for any non-ASCII
part of the subject, i.e., any part expressed using the RFC 2047 rule
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
where 'encoding' is "Q" (quoted-printable) or "B" (base64).
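A minimal sketch of the idea, assuming Python's standard `email.header.decode_header` for splitting the subject into encoded-words; the function name `subject_to_entities` is hypothetical and not part of hypermail. Every character that came from an encoded-word is emitted as a decimal character reference, so the output is pure ASCII and displays the same under any ASCII-compatible page charset.

```python
import html
from email.header import decode_header


def subject_to_entities(raw_subject: str) -> str:
    """Render an RFC 2047 Subject header as ASCII-only HTML (hypothetical helper).

    Every character that arrived inside an encoded-word
    (=?charset?Q?...?= or =?charset?B?...?=) becomes a &#N; reference.
    """
    pieces = []
    for chunk, charset in decode_header(raw_subject):
        if charset is None:
            # Plain text outside any encoded-word. decode_header returns str
            # when the header has no encoded-words at all, bytes otherwise.
            if isinstance(chunk, bytes):
                chunk = chunk.decode("ascii", errors="replace")
            pieces.append(html.escape(chunk))
        else:
            # Encoded-word: decode with its declared charset, then map every
            # code point to &#N;, whether or not it is ASCII.
            text = chunk.decode(charset, errors="replace")
            pieces.append("".join(f"&#{ord(ch)};" for ch in text))
    return "".join(pieces)


print(subject_to_entities("=?iso-8859-1?q?caf=E9?="))  # &#99;&#97;&#102;&#233;
```

The result no longer depends on the message body's charset, which is the point made above: the entities carry the Unicode code points themselves.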
It is true that after the conversion to Unicode is done, the printable ASCII
range can still be represented as is, without recourse to character entities,
to optimize the representation in case the MUA (as in my case) uses the
encoding even where it is not really needed (here several words of the subject
were encoded, although only two letters in one of the words needed encoding).
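The optimization can be sketched in a few lines, again assuming Python's standard library rather than anything in hypermail (the name `subject_to_html` is hypothetical): decode the whole subject to a Unicode string first, then let the `xmlcharrefreplace` error handler turn only the non-ASCII code points into entities.

```python
import html
from email.header import decode_header, make_header


def subject_to_html(raw_subject: str) -> str:
    """Decode an RFC 2047 subject, then emit only non-ASCII as entities.

    Printable ASCII that the MUA needlessly wrapped in an encoded-word
    comes out as plain text; anything outside ASCII becomes &#N;.
    """
    # str(make_header(...)) joins the decoded chunks into one Unicode
    # string, following RFC 2047's whitespace rules between encoded-words.
    text = str(make_header(decode_header(raw_subject)))
    # Escape HTML metacharacters first, then replace each non-ASCII code
    # point with a decimal character reference while encoding to ASCII.
    return html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


# A whole word encoded although only one letter needed it:
print(subject_to_html("=?iso-8859-1?q?caf=E9?="))  # caf&#233;
```

Compared with entity-encoding every character of an encoded-word, this keeps the generated HTML short and readable while producing the same rendered text.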
>
> You can see how I did it and then expand on that work.
Thanks. I'll look into it.
Best,
Zvi.
--
Dr. Zvi Har'El mailto:rl@math.technion.ac.il Department of Mathematics
tel:+972-54-227607 icq:179294841 Technion - Israel Institute of Technology
fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/ Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
Monday, 6 Nisan 5763, 7 April 2003, 7:11PM