
Re: [News] Linux Reciprocity is a Major Merit

Verily I say unto thee, that Mark Kent spake thusly:

> Has anyone tried to do anything like this already and perhaps has
> solutions for these issues?

How about running something like this over a leafnode spool?

######
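#!/usr/bin/perl -w
# parse-urls.pl - pull every http(s) URI out of the input
# (e.g. articles in a leafnode spool) and dump each page's
# source via lynx.

use strict;
use URI::Find;

my %seen;   # so each URI is fetched only once
my @uris;   # URIs in the order they were first seen

# URI::Find calls this once per URI found; we just record the
# URI and return the original text so the input is unchanged.
my $finder = URI::Find->new(
  sub {
    my($uri, $orig_uri) = @_;
    push @uris, "$uri"
      if $uri->scheme =~ /^https?$/ && !$seen{$uri}++;
    return $orig_uri;
  });

while (my $text = <>) {
  $finder->find(\$text);
}

# system() returns (exec would replace this process after the
# first URI), and the list form keeps the URI away from the shell.
for my $uri (@uris) {
  system('lynx', '-source', $uri) == 0
    or warn "lynx failed for $uri: $?\n";
}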
######

 - http://search.cpan.org/dist/URI-Find/

I'll play around with this, and see about adding URI verification.
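For the URI verification step, a minimal sketch might do a cheap syntax check and then a HEAD request. This assumes HTTP::Tiny (a core module since Perl 5.14) rather than lynx; `verify_uri` is a hypothetical helper name, and "verified" here just means "answers with a 2xx status".

```perl
use strict;
use warnings;
use HTTP::Tiny;   # core module since Perl 5.14

# Hypothetical helper: returns 1 if $uri looks like an http(s) URI
# and answers a HEAD request with a 2xx status, else 0.
sub verify_uri {
    my ($uri) = @_;

    # Cheap syntax check first, so obviously bogus strings never
    # reach the network.
    return 0 unless $uri =~ m{^https?://[^/\s?#]+}i;

    # HTTP::Tiny follows redirects for HEAD by default; 'success'
    # is true only for a final 2xx response.
    my $res = HTTP::Tiny->new(timeout => 10)->head($uri);
    return $res->{success} ? 1 : 0;
}
```

Some servers reject HEAD outright, so a failed check might warrant a retry with GET before the URI is discarded; https URIs also need IO::Socket::SSL installed, otherwise HTTP::Tiny reports a 599 error.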

Also, IMHO, the final output should look something like this:

Article Name: <html title>
Archive Date: <date fetched>
Article URI : <orig_uri>
Article Body: <output from parse-urls.pl>
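A rough sketch of a formatter for that record, assuming the page source has already been fetched (e.g. with `lynx -source`). `format_record` is a hypothetical name, and the regex title grab is a shortcut; a real HTML parser would cope better with odd markup.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: builds the archive record from a fetched page.
# $html is the page source, $fetched the fetch time in epoch seconds.
sub format_record {
    my ($uri, $html, $fetched) = @_;

    # Naive title extraction; collapse whitespace and trim the ends.
    my ($title) = $html =~ m{<title[^>]*>(.*?)</title>}is;
    $title //= '(untitled)';
    $title =~ s/\s+/ /g;
    $title =~ s/^ | $//g;

    my $date = strftime('%Y-%m-%d %H:%M:%S UTC', gmtime $fetched);

    return join '',
        "Article Name: $title\n",
        "Archive Date: $date\n",
        "Article URI : $uri\n",
        "Article Body: $html\n";
}
```

Calling it as `format_record($uri, $html, time)` stamps the record with the current time.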

Getting the *real* posting date for an upstream article is a more
difficult proposition, since that info is not always available.

Also, for a proper citation, the upstream article *author* should be
included, where possible.
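Where the upstream page does publish its author and date, they are often in `<meta>` tags. The sketch below is best-effort only: it assumes the common `name="author"` convention and the OpenGraph `property="article:published_time"` (falling back to `name="date"`), none of which any given site is guaranteed to use. `meta_content` and `article_metadata` are hypothetical helpers, and unquoted attribute values are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the content="" of the first <meta> tag
# whose name= or property= matches $key (case-insensitively), or undef.
sub meta_content {
    my ($html, $key) = @_;
    while ($html =~ m{<meta\b([^>]*)>}gi) {
        my $attrs = $1;
        my %attr;
        while ($attrs =~ m{(\w[\w:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')}g) {
            $attr{lc $1} = defined $2 ? $2 : $3;
        }
        my $name = $attr{name} // $attr{property} // '';
        return $attr{content}
            if lc($name) eq lc($key) && defined $attr{content};
    }
    # Explicit undef (not a bare return) so callers in list
    # context, like the hash below, still get exactly one value.
    return undef;
}

# Hypothetical helper: author and posting date, undef when absent.
sub article_metadata {
    my ($html) = @_;
    return {
        author => meta_content($html, 'author'),
        date   => meta_content($html, 'article:published_time')
               // meta_content($html, 'date'),
    };
}
```

An undef field can then be printed as "unknown" in the record rather than guessed.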

-- 
K.
http://slated.org - Slated, Rated & Blogged

.----
| "Future archaeologists will be able to identify a 'Vista Upgrade
| Layer' when they go through our landfill sites" - Sian Berry, the
| Green Party.
`----

Fedora Core release 5 (Bordeaux) on sky, running kernel 2.6.19-1.2288.fc5
 16:32:02 up 25 days,  3:57,  3 users,  load average: 0.50, 0.73, 0.74
