From: Peter C. McCluskey (pcm@rahul.net)
Date: Mon Mar 13 2000 - 16:10:14 CST
The sample files you sent me had some carriage returns which were confusing
hypetombox. While I didn't get the same symptoms as you did, I'm moderately
confident I've fixed the problems.
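As a rough illustration of the failure mode (a standalone sketch, not code lifted from hypetombox.pl; the helper names matches_strict and matches_tolerant are made up for the example): in a Perl regex, `$` matches at the end of the string or just before a final newline, so a line ending in CR LF no longer matches a pattern anchored directly after `-->`. Allowing trailing whitespace before the anchor accepts both line endings.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# Pattern anchored directly after "-->", as in the unpatched script.
sub matches_strict {
    my ($line) = @_;
    return $line =~ /^<!-- (\w+)="(.+)" -->$/ ? 1 : 0;
}

# Pattern that tolerates trailing whitespace, including a stray "\r".
sub matches_tolerant {
    my ($line) = @_;
    return $line =~ /^<!-- (\w+)="(.+)" -->\s*$/ ? 1 : 0;
}

my $unix_line = qq{<!-- received="Thu Oct 20 15:26:27 1995" -->\n};
my $dos_line  = qq{<!-- received="Thu Oct 20 15:26:27 1995" -->\r\n};

# Prints "strict=1 tolerant=1" for the LF line
# and "strict=0 tolerant=1" for the CR LF line.
for my $line ($unix_line, $dos_line) {
    printf "strict=%d tolerant=%d\n",
        matches_strict($line), matches_tolerant($line);
}
```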
I've also added a command line option to set the To: line to something other
than "bogus".
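A minimal sketch of how the new option behaves, assuming the same option string as the patch. The wrapper name to_address_for and the sample arguments are placeholders, and the sketch uses the hash form of getopts rather than the $opt_n globals the script itself relies on.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical wrapper for illustration: parse an argument list with the
# patched option string and return the address used on the To: line.
sub to_address_for {
    local @ARGV = @_;
    my %opts;
    getopts('d:m:n:', \%opts) or die "bad options\n";
    return $opts{n} || 'bogus';    # same default as the patched script
}

print to_address_for('-d', 'archive', '-n', 'list@example.com'), "\n";
print to_address_for('-d', 'archive'), "\n";    # falls back to "bogus"
```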
Here is a patch, or you can grab the latest copy from:
http://www.rahul.net/pcm/hypetombox.pl
--- hypetombox.pl 2000/03/06 21:26:06 1.4
+++ hypetombox.pl 2000/03/13 21:41:04
@@ -14,13 +14,13 @@
#
# Usage:
#
-# hypetombox.pl [-d <directory>] [-m <output_filename>]
+# hypetombox.pl [-d <directory>] [-m <output_filename>] [-n <to_address>]
#
# $Header: /home/pcm/CVS/hypetombox/hypetombox.pl,v 1.4 2000/03/06 21:26:06 pcm Exp $
require 5.000;
use Getopt::Std;
-getopts('d:m:');
+getopts('d:m:n:');
# This is a list of the fields in the comment header of each message.
@@ -33,6 +33,8 @@
@msgs = sort glob($fpat);
+$to_address = $opt_n || 'bogus';
+
# Open the output file for write.
$mbox_name = $opt_m || 'mbox';
@@ -73,7 +75,7 @@
# same name.
if($state eq 'HeaderComments'
- && ($line =~ /^<!-- (\w+)="(.+)" -->$/)) {
+ && ($line =~ /^<!-- (\w+)="(.+)" -->(\s*)$/)) {
$key = $1;
$value = $2;
$value =~ s/&amp;/&/g;
@@ -100,7 +102,7 @@
print MBOX "From $email $date\n";
print MBOX "Date: $sent\n";
print MBOX "Message-Id: <$id>\n";
- print MBOX "To: bogus\n";
+ print MBOX "To: $to_address\n";
print MBOX "From: $email ($name)\n";
print MBOX "Subject: $subject\n";
if ($inreplyto) {
@@ -116,16 +118,16 @@
# This is a body line.
- next if($line =~ /^<br>$/i);
- next if($line =~ m|^</EM><BR>$|i);
- $line =~ s/<br>$//; # lose the trailing <br>
- $line =~ s/<BR>$//; # lose the trailing <br>
- $line =~ s/<pre>$//; # lose the <pre>formatted tags
- $line =~ s/<PRE>$//; # lose the <PRE>formatted tags
- $line =~ s/<\/pre>$//; # lose the </pre>formatted tags
- $line =~ s/<\/PRE>$//; # lose the </PRE>formatted tags
- $line =~ s/<P>$//; # lose the paragraph tags
- $line =~ s/<p>$//; # lose the paragraph tags
+ next if($line =~ /^<br>\s*$/i);
+ next if($line =~ m|^</EM><BR>\s*$|i);
+ $line =~ s/<br>\s*$//; # lose the trailing <br>
+ $line =~ s/<BR>\s*$//; # lose the trailing <BR>
+ $line =~ s/<pre>\s*$//; # lose the <pre>formatted tags
+ $line =~ s/<PRE>\s*$//; # lose the <PRE>formatted tags
+ $line =~ s/<\/pre>\s*$//; # lose the </pre>formatted tags
+ $line =~ s/<\/PRE>\s*$//; # lose the </PRE>formatted tags
+ $line =~ s/<P>\s*$//; # lose the paragraph tags
+ $line =~ s/<p>\s*$//; # lose the paragraph tags
$line =~ s%<a href=[^>]+>([^<]+)</a>%\1%g; # lose hyperlinks
$line =~ s%<A HREF=[^>]+>([^<]+)</A>%\1%g; # lose hyperlinks
$line =~ s/&lt;/</g; # reverse map special characters
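
The paired upper-case and lower-case substitutions for body lines in the patch above could, as one possible alternative, be folded into a single case-insensitive pattern. This is only a sketch: strip_trailing_tag is a hypothetical helper, not part of hypetombox.pl, and unlike the sequence of substitutions in the script it removes at most one trailing tag per call. It also consumes the line ending, so the caller would need to add a newline back when printing.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of hypetombox.pl: strip one trailing
# <br>, <p>, <pre>, or </pre> tag (any case) plus any trailing whitespace,
# including a stray carriage return.
sub strip_trailing_tag {
    my ($line) = @_;
    $line =~ s/<(?:br|p|\/?pre)>\s*$//i;
    return $line;
}

print strip_trailing_tag("Hello there<BR>\r\n"), "\n";    # prints "Hello there"
print strip_trailing_tag("No tag here\n");                # line is left unchanged
```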
jpl@vectorbd.com (Jim Lill) writes:
>On Mon, 6 Mar 2000, Peter C. McCluskey wrote:
>
>>
>> jpl@vectorbd.com (Jim Lill) writes:
>> >
>> >
>> >Hi..
>> >
>> >Is there a Y2K problem with that script or am I doing something wrong? I
>> >get BOGUS DATE stuff. I'm not a Perl guy!
>>
>> It is probably a date formatting problem. It expects to find lines like
>> this in the html files:
>>
>> <!-- received="Thu Oct 20 15:26:27 1995" -->
>>
>> Can you look at the html source and send me a sample of the "received" lines
>> that you see?
>> I will try to improve it sometime this week to look at other fields if it
>> can't parse the "received" line.
--
------------------------------------------------------------------------------
Peter McCluskey          | The US Idea Futures Exchange: speculate on
http://www.rahul.net/pcm | political,financial issues at http://www.usifex.com