From: Tom von Alten (tom_vonalten@boi.hp.com)
Date: Fri May 07 1999 - 11:41:11 CDT
In the "Duplicate message ids" thread, Daniel Stenberg wrote:
> Yes (hypermail v1.02) did (try to thread messages with certain
> modified patterns of the subject). Although it did mess around
> too much in the original strings (i.e it wrote zero-bytes etc
> in them) for me to be able to keep them. When I cleaned up the
> thread and hash mess I never got around to add code in my
> newly written threadprint.c that checks for replies in other ways than
> In-Reply-To headers.
The threading of messages modified in predictable ways by MUAs was one of
the first things I undertook with our hypermail v1+ implementation,
although I did it via a wrapper script, rather than within hypermail.
We're still using the v1+ code and a wrapper script, but it has problems
with proper handling of multiple messages arriving close together. I'm
hoping to move to a v2.x and do away with the external bits soon, to take
advantage of the many improvements that have been made.
However, the threading we have is not something we want to lose, and there
are too many ways for the In-Reply-To approach not to work. (To name a
few: broken MUAs; replying to a message sent to multiple archives, or to
an archive and cc's; the sender choosing to start a fresh message, copy
the subject and quote as needed; recomposition of a "reply" by the sender.)
Our conceptual approach may be of interest. It's done with a shell script
and a variety of unix tools, so I don't think the particular code is worth
posting.
The process is:
1. Remove any combination of defined subject prefixes, regardless of
   case and nesting. We do "re:" and "betr.:", with any bracketed
   number. "fw:" should be in there, too, but I decided early on to
   skip that, and never went back and added it in. As Daniel pointed
   out (and our inclusion of "Betr." shows), there's an element of
   localization involved.
2. This leaves a string I called the "thread" (as opposed to "subject").
3. To speed processing, I save all the thread strings in a file. The
   new candidate is compared, independent of case, against that file
   to see if there are any matches. If one is found, the subject is
   changed to "Re: $thread" where "$thread" is the canonical version
   from the file.
4. If no match is found, and there were some prefixes stripped, change
   the subject to the de-prefixed version (preserving whatever case
   was used in the source message). This new thread string is
   added to the thread file.
5. Pass the possibly modified message into hypermail, where it will
   be threaded based on an exact match, or Re: + match.
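
For anyone who prefers code to prose, steps 1 through 4 can be sketched
roughly as below. This is Python rather than our shell script, and it is an
illustration only: the names strip_prefixes and canonical_subject, the
in-memory dict standing in for the thread file, and the exact prefix pattern
(including reading "bracketed number" as forms like "Re[2]:") are all
assumptions, not what the wrapper actually contains.

```python
import re

# Assumed prefix set: "re:" and "betr.:" in any case, with an optional
# bracketed number such as "Re[2]:". The real wrapper's patterns may differ.
PREFIX = re.compile(r"^\s*(?:re|betr\.?)\s*(?:\[\d+\])?\s*:+\s*", re.IGNORECASE)


def strip_prefixes(subject):
    """Step 1-2: peel off nested prefixes; return (thread, stripped_any)."""
    thread = subject.strip()
    stripped = False
    while True:
        m = PREFIX.match(thread)
        if not m:
            return thread, stripped
        thread = thread[m.end():]
        stripped = True


def canonical_subject(subject, threads):
    """Steps 3-4: `threads` maps lowercased thread -> canonical spelling.

    It stands in for the thread file and is updated in place.
    """
    thread, stripped = strip_prefixes(subject)
    key = thread.lower()
    if key in threads:
        # Known thread: force the subject to "Re: " + canonical version.
        return "Re: " + threads[key]
    # New thread: remember it, keeping the case used in this message.
    threads[key] = thread
    return thread if stripped else subject


threads = {}
print(canonical_subject("Duplicate message ids", threads))
print(canonical_subject("RE[2]: Betr.: duplicate Message IDs", threads))
```

The second call prints "Re: Duplicate message ids", which is the form that
step 5 relies on for the exact or Re: + match.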
Obviously, the approach within hypermail has to be different, but I think
hypermail already takes the trouble to read all the messages (headers?) from
the archive, so there shouldn't be a significant performance penalty.
Cheers,
_____________ Hewlett-Packard Computer Peripherals Bristol
Tom von Alten mailto:Tom_vonAlten@boi.hp.com
This posting is for informational purposes only.
It is not a statement of the Hewlett-Packard Co.