From: Scott Rose (srose@direct.ca)
Date: Wed Mar 14 2001 - 01:26:31 CST
"Peter C. McCluskey" wrote:
> srose@direct.ca (Scott Rose) writes:
> >On an entirely different note, I have code that improves the performance
> >of hypermail, particularly in the case of large archives- hypermail
> >opens each file in an archive to build new indices each time a message
> >arrives when you run in message-at-a-time mode, and it's desperately
> >expensive. My approach uses a GDBM index so that a whole lot less I/O
> >has to take place. I've been waiting for 2.0 to ship before bringing
> >this up again... I mentioned it to Kent a year or so ago. Any interest?
> >It should be generalized beyond GDBM to be most widely useful...
>
> Could we see this code?
No, but I wanted you to know that I had it.
Just kidding! I have a version of a late 2.0 beta that has this stuff in place,
but not a version of 2.0. I could either point you to a source tarball of that,
or build it into 2.0 and point you to that, but it would happen a little later in
the latter case.
I also found that there was room for dramatic improvements in 1.02 that were
unrelated to I/O. I *think* I fixed those only in my own local version; either
the most egregious case was already repaired in the 2.0 beta, or I failed to look
hard enough. There was one function that was called N^2 times that only needed to
be called once. But I digress. I think that the I/O proportion is a strong
function of how the message store is accessed: if, God forbid, it's over NFS,
there is room for a big win. Less so if it's on a local disk, which is where the
message store ought to be, as I think we can agree.
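To make the index idea concrete, here is a minimal sketch of a per-message header index kept in a DBM file, so that adding one message reads only that message instead of every file in the archive. It is not the GDBM code described in this message: the function names, the record fields, and the JSON encoding are all assumptions made for illustration, and it uses Python's portable `dbm.dumb` backend rather than GDBM.

```python
import dbm.dumb
import email
import json
import os

def add_message(index_path, msg_path):
    """Record one message's headers in the index, reading only that file.

    Hypothetical helper: the record layout is an assumption.
    """
    with open(msg_path, "rb") as fh:
        msg = email.message_from_binary_file(fh)
    record = {
        "subject": msg.get("Subject", ""),
        "from": msg.get("From", ""),
        "date": msg.get("Date", ""),
    }
    # "c" opens the index for read/write and creates it if it is missing.
    with dbm.dumb.open(index_path, "c") as db:
        db[os.path.basename(msg_path)] = json.dumps(record)

def load_index(index_path):
    """Return every indexed header record without opening any message file."""
    with dbm.dumb.open(index_path, "r") as db:
        return {key.decode(): json.loads(db[key]) for key in db.keys()}
```

With an index like this, rebuilding the thread, date, and author pages after each new arrival needs one message read plus one index read, rather than one read per archived message.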
Checking just now, I found my notes from my most recent tests of the performance
of my DBM hack. To do the test, I created a tool (called "hyperfeed") that passes
one message at a time from an mbox to a separate invocation of hypermail. Running
hypermail once in mbox-at-a-time mode isn't a good test of performance for the
case where messages are archived as they arrive, which is how I run all my
archives. I did this test in October 1999 with a 750-message mbox, with the
message store on a local ext2 file system (Linux). I used GDBM as the dbm
package, which is regrettably all my code supports. With my -g switch enabling
the DBM index, the run took 78 seconds to complete. Without it, 450 seconds,
so the index made the run nearly six times faster. I think that qualifies as
significant. I used the same hypermail binary for both runs, the same file
system, the same mbox, the same clock...
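For readers who want to reproduce this kind of test, a tool along the lines of "hyperfeed" might look roughly like the sketch below. This is not the original tool: the function name `feed` is hypothetical, and the archiver command is deliberately left as a parameter because the exact hypermail command line used in the test is not given here.

```python
import mailbox
import subprocess

def feed(mbox_path, command):
    """Run `command` once per message in the mbox, message on stdin.

    `command` is an argument list for whatever archiver invocation reads
    a single message from standard input (an assumption of this sketch).
    Returns the number of messages fed.
    """
    count = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # A fresh process per message mimics archiving on arrival.
            subprocess.run(command, input=msg.as_bytes(), check=True)
            count += 1
    finally:
        box.close()
    return count
```

Wrapping the call to `feed` with `time.perf_counter()` for two otherwise identical runs, one with the index enabled and one without, would give a comparison like the one above.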