Re: hypetombox

---------

From: Peter C. McCluskey (pcm@rahul.net)
Date: Mon Mar 13 2000 - 16:10:14 CST


 The sample files you sent me had some carriage returns which were confusing
hypetombox. While I didn't get the same symptoms as you did, I'm moderately
confident I've fixed the problems.
 I've also added a command line option to set the To: line to something other
than "bogus".
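
 A minimal sketch of how the new option behaves, mirroring the
getopts('d:m:n:') call and the $opt_n || 'bogus' fallback from the patch
below. The helper name to_address_for and the sample address are made up
for illustration, and @ARGV is set by hand to stand in for a real command
line.

```perl
#!/usr/bin/perl
# Illustration only: to_address_for() and the sample address are hypothetical.
use strict;
use warnings;
use Getopt::Std;

# getopts() stores each option in a package variable named $opt_<letter>.
our ($opt_d, $opt_m, $opt_n);

# Parse a stand-in command line the same way the patched script does and
# return the address that would end up on the To: line.
sub to_address_for {
    local @ARGV = @_;
    # getopts() does not clear options left over from an earlier call.
    ($opt_d, $opt_m, $opt_n) = (undef, undef, undef);
    getopts('d:m:n:');
    return $opt_n || 'bogus';
}

print "To: ", to_address_for('-d', 'archive', '-n', 'list@example.com'), "\n";
print "To: ", to_address_for('-d', 'archive'), "\n";
```

 Without -n the To: line falls back to "bogus", so existing invocations
keep working unchanged.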
 Here is a patch, or you can grab the latest copy from:
http://www.rahul.net/pcm/hypetombox.pl

--- hypetombox.pl 2000/03/06 21:26:06 1.4
+++ hypetombox.pl 2000/03/13 21:41:04
@@ -14,13 +14,13 @@
 #
 # Usage:
 #
-# hypetombox.pl [-d <directory>] [-m <output_filename>]
+# hypetombox.pl [-d <directory>] [-m <output_filename>] [-n <to_address>]
 #
 # $Header: /home/pcm/CVS/hypetombox/hypetombox.pl,v 1.4 2000/03/06 21:26:06 pcm Exp $
 
 require 5.000;
 use Getopt::Std;
-getopts('d:m:');
+getopts('d:m:n:');
 
 # This is a list of the fields in the comment header of each message.
 
@@ -33,6 +33,8 @@
 
 @msgs = sort glob($fpat);
 
+$to_address = $opt_n || 'bogus';
+
 # Open the output file for write.
 
 $mbox_name = $opt_m || 'mbox';
@@ -73,7 +75,7 @@
         # same name.
 
         if($state eq 'HeaderComments'
- && ($line =~ /^<!-- (\w+)="(.+)" -->$/)) {
+ && ($line =~ /^<!-- (\w+)="(.+)" -->(\s*)$/)) {
             $key = $1;
             $value = $2;
             $value =~ s/&amp;/&/g;
@@ -100,7 +102,7 @@
                 print MBOX "From $email $date\n";
                 print MBOX "Date: $sent\n";
                 print MBOX "Message-Id: <$id>\n";
-                print MBOX "To: bogus\n";
+                print MBOX "To: $to_address\n";
                 print MBOX "From: $email ($name)\n";
                 print MBOX "Subject: $subject\n";
                 if ($inreplyto) {
@@ -116,16 +118,16 @@
 
             # This is a body line.
 
-            next if($line =~ /^<br>$/i);
-            next if($line =~ m|^</EM><BR>$|i);
-            $line =~ s/<br>$//; # lose the trailing <br>
-            $line =~ s/<BR>$//; # lose the trailing <br>
-            $line =~ s/<pre>$//; # lose the <pre>formatted tags
-            $line =~ s/<PRE>$//; # lose the <PRE>formatted tags
-            $line =~ s/<\/pre>$//; # lose the </pre>formatted tags
-            $line =~ s/<\/PRE>$//; # lose the </PRE>formatted tags
-            $line =~ s/<P>$//; # lose the paragraph tags
-            $line =~ s/<p>$//; # lose the paragraph tags
+            next if($line =~ /^<br>\s*$/i);
+            next if($line =~ m|^</EM><BR>\s*$|i);
+            $line =~ s/<br>[ \t\r]*$//; # lose the trailing <br>
+            $line =~ s/<BR>[ \t\r]*$//; # lose the trailing <br>
+            $line =~ s/<pre>[ \t\r]*$//; # lose the <pre>formatted tags
+            $line =~ s/<PRE>[ \t\r]*$//; # lose the <PRE>formatted tags
+            $line =~ s/<\/pre>[ \t\r]*$//; # lose the </pre>formatted tags
+            $line =~ s/<\/PRE>[ \t\r]*$//; # lose the </PRE>formatted tags
+            $line =~ s/<P>[ \t\r]*$//; # lose the paragraph tags
+            $line =~ s/<p>[ \t\r]*$//; # lose the paragraph tags
             $line =~ s%<a href=[^>]+>([^<]+)</a>%\1%g; # lose hyperlinks
             $line =~ s%<A HREF=[^>]+>([^<]+)</A>%\1%g; # lose hyperlinks
             $line =~ s/&lt;/</g; # reverse map special characters
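
 To see why the carriage returns mattered, here is a minimal sketch (not
part of the patch) that runs the old and new header-comment patterns from
the hunk above against a line with a DOS-style ending. The sample line and
variable names are made up for illustration. In Perl, $ matches only at
the very end of the string or just before a final newline, so a leftover
"\r" makes the old pattern fail.

```perl
#!/usr/bin/perl
# Illustration only: the sample line and variable names are hypothetical.
use strict;
use warnings;

# A header comment line as it would look in a file with CRLF line endings.
my $line = qq{<!-- received="Thu Oct 20 15:26:27 1995" -->\r\n};

# Pattern before the patch: nothing may sit between "-->" and the newline.
my $old = ($line =~ /^<!-- (\w+)="(.+)" -->$/) ? 1 : 0;

# Pattern after the patch: trailing whitespace, including "\r", is allowed.
my ($key, $value);
my $new = 0;
if ($line =~ /^<!-- (\w+)="(.+)" -->(\s*)$/) {
    ($key, $value) = ($1, $2);
    $new = 1;
}

print "old pattern matched: $old\n";
print "new pattern matched: $new\n";
print "key=$key value=$value\n" if $new;
```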

 jpl@vectorbd.com (Jim Lill) writes:
>On Mon, 6 Mar 2000, Peter C. McCluskey wrote:
>
>>
>> jpl@vectorbd.com (Jim Lill) writes:
>> >
>> >
>> >Hi..
>> >
>> >Is there a Y2K problem with that script or am I doing something wrong? I
>> >get BOGUS DATE stuff. I'm not a Perl guy!
>>
>> It is probably a date formatting problem. It expects to find lines like
>> this in the html files:
>>
>> <!-- received="Thu Oct 20 15:26:27 1995" -->
>>
>> Can you look at the html source and send me a sample of the "received" lines
>> that you see?
>> I will try to improve it sometime this week to look at other fields if it
>> can't parse the "received" line.

-- 
------------------------------------------------------------------------------
Peter McCluskey          | The US Idea Futures Exchange: speculate on
http://www.rahul.net/pcm | political,financial issues at http://www.usifex.com
