Re: ultimate searchable archives

---------

From: Bill Moseley (moseley@hank.org)
Date: Mon Sep 16 2002 - 23:13:21 CDT


At 01:24 PM 09/06/02 -0700, Bill Paxton wrote:
>Are there some pre-done htdig modifications out there?
>I checked contrib but nothing I could find. Is there
>something better than htdig?

I'm one of the developers of swish-e (http://swish-e.org). I've used it
for indexing hypermail archives; there's a Perl script in the swish-e
distribution that I have used for parsing the metadata from the hypermail
HTML messages.
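Below is a minimal Python sketch of that kind of metadata extraction. It is not the Perl script from the swish-e distribution. It assumes that each hypermail message page records its headers in HTML comments such as <!-- name="..." --> and <!-- subject="..." -->, which is what hypermail 2.x output typically looks like. Check this against your own archive. The function name parse_hypermail_meta is hypothetical.

```python
import html
import re

# Assumption: hypermail message pages carry metadata as HTML comments
# of the form <!-- key="value" --> (e.g. name, email, subject, received).
META_RE = re.compile(r'<!--\s*(\w+)="(.*?)"\s*-->')


def parse_hypermail_meta(page: str) -> dict[str, str]:
    """Return key/value pairs found in hypermail-style metadata comments."""
    meta: dict[str, str] = {}
    for key, value in META_RE.findall(page):
        # Values may contain HTML entities such as &amp; or &quot;.
        # Keep the first occurrence if a key appears more than once.
        meta.setdefault(key, html.unescape(value))
    return meta


if __name__ == "__main__":
    sample = '''<!-- received="Mon Sep 16 2002 - 23:13:21 CDT" -->
<!-- name="Bill Moseley" -->
<!-- email="moseley@hank.org" -->
<!-- subject="Re: ultimate searchable archives" -->
<html><body>...</body></html>'''
    for k, v in parse_hypermail_meta(sample).items():
        print(f"{k}: {v}")
```

The extracted fields (sender, subject, date) could then be handed to the indexer as separately searchable properties, instead of being indexed as part of the body text.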

The downside is that swish doesn't do incremental indexing, so for a very
high-volume list it might be a problem. On the other hand, swish is so
damn fast[1] at indexing that for most applications you don't need
incremental indexing. Unless messages are arriving every second or
so, you can typically find a way to build an index quickly (e.g.
build a master index once a day, index just the day's new messages
every minute or so, and search both indexes at the same time).
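The master-plus-daily scheme can be shown with a toy in-memory inverted index in Python. This is not swish-e's index format or API. The class TinyIndex and the functions build_index and search_all are hypothetical names made up for this sketch. The only point is that the small daily index is cheap to rebuild often, and a query is answered by searching both indexes and merging the hits.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9]+")


class TinyIndex:
    """Toy inverted index mapping word -> set of message ids (hypothetical, not swish-e)."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for word in WORD_RE.findall(text.lower()):
            self.postings[word].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of messages containing every query word (AND search)."""
        words = WORD_RE.findall(query.lower())
        if not words:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(words[0], set()))
        for w in words[1:]:
            result &= self.postings.get(w, set())
        return result


def build_index(messages: dict[str, str]) -> TinyIndex:
    """Build a fresh index from scratch (no incremental updates, as with swish-e)."""
    idx = TinyIndex()
    for doc_id, text in messages.items():
        idx.add(doc_id, text)
    return idx


def search_all(indexes: list[TinyIndex], query: str) -> list[str]:
    """Search several indexes and merge their hits as if they were one index."""
    hits: set[str] = set()
    for idx in indexes:
        hits |= idx.search(query)
    return sorted(hits)


if __name__ == "__main__":
    # Rebuilt once a day from the whole archive.
    master = build_index({"msg0001": "swish-e indexing speed",
                          "msg0002": "htdig config question"})
    # Rebuilt every minute or so from just today's messages.
    daily = build_index({"msg0003": "incremental indexing with swish-e"})
    print(search_all([master, daily], "swish-e indexing"))  # ['msg0001', 'msg0003']
```

When the master index is rebuilt at the end of the day, the daily index starts over empty, so the frequently rebuilt part never grows beyond one day's traffic.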

The swish-e list is a hypermail archive and it's searchable at

   http://swish-e.org/Discussion/search/swish.cgi

You could probably come up with a better-looking interface.

[1] Fast is subjective, of course. On my Athlon I can index 100,000
text files of about 2 KB each in about three minutes. YMMV.
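As a rough check on what that figure means, assume 2 KB per file and three minutes as 180 seconds. Both are approximations, and file-system overhead is ignored:

```latex
\begin{aligned}
\text{files per second} &\approx \frac{100{,}000}{3 \times 60\ \text{s}} \approx 556\ \text{files/s} \\
\text{data rate} &\approx \frac{100{,}000 \times 2\ \text{KB}}{180\ \text{s}}
  = \frac{200{,}000\ \text{KB}}{180\ \text{s}} \approx 1{,}100\ \text{KB/s} \approx 1.1\ \text{MB/s}
\end{aligned}
```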

-- 
Bill Moseley
mailto:moseley@hank.org

---------

This archive was generated by hypermail 2.1.5.