
Re: [News] Linux Reciprocity is a Major Merit

[H]omer <spam@xxxxxxx> espoused:
> Verily I say unto thee, that Mark Kent spake thusly:
> 
>> Has anyone tried to do anything like this already and perhaps has
>> solutions for these issues?
> 
> How about running this on a leafnode spool?
> 
> ######
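> #!/usr/bin/perl -w
> # parse-urls.pl: fetch the source of every URL found in the input
>
> use strict;
> use URI::Find;
>
> my @uris;
> my $finder = URI::Find->new(
>   sub {
>     my($uri, $orig_uri) = @_;
>     push @uris, "$uri";   # remember each URI found on this line
>     return $orig_uri;     # leave the text itself unchanged
>   });
>
> while (my $text = <>) {
>   @uris = ();
>   $finder->find(\$text);
>   for my $uri (@uris) {
>     # system() runs lynx and returns; exec() would replace this
>     # script after the first URL. The list form skips the shell.
>     system("lynx", "-source", $uri) == 0
>       or warn "lynx failed for $uri\n";
>   }
> }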
> ######
> 
>  - http://search.cpan.org/dist/URI-Find/
> 
> I'll play around with this, and see about adding URI verification.
> 
> Also IMHO the final output should be something like:
> 
> Article Name: <html title>
> Archive Date: <date fetched>
> Article URI : <orig_uri>
> Article Body: <output from parse-urls.pl>
> 
> Getting the *real* posting date for an upstream article is a more
> difficult proposition, since that info is not always available.
> 
> Also, for a proper citation, the upstream article *author* should be
> included, where possible.
> 
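A rough Perl sketch of the record layout Homer proposes, with an author line added where the page declares one. The sub names (html_title, html_author, format_record) and the extra "Author" line are illustrative only, not from either post. It assumes the page HTML has already been fetched, and it uses simple regexes, so unusual markup (for example a meta tag with content= before name=) will be missed. It makes no attempt at the real posting date.

```perl
#!/usr/bin/perl
# make-record.pl: hypothetical sketch of the proposed archive record.
# Assumes $html is already fetched; regex-based, so only handles
# plain <title> and <meta name="author" content="..."> markup.
use strict;
use warnings;
use POSIX qw(strftime);

# Text of the first <title> element, whitespace collapsed, or "(untitled)".
sub html_title {
    my ($html) = @_;
    return "(untitled)" unless $html =~ m{<title[^>]*>(.*?)</title>}is;
    my $title = $1;
    $title =~ s/\s+/ /g;     # collapse newlines and runs of spaces
    $title =~ s/^ | $//g;    # trim one leading/trailing space
    return $title;
}

# Content of <meta name="author" content="...">, or undef if absent.
sub html_author {
    my ($html) = @_;
    return $html =~ m{<meta\s+name=["']author["']\s+content=["']([^"']*)["']}i
        ? $1 : undef;
}

# Build the plain-text record. $fetched is an epoch time (default: now),
# printed in UTC as the archive date.
sub format_record {
    my ($orig_uri, $html, $body, $fetched) = @_;
    $fetched = time unless defined $fetched;
    my $record = "Article Name: " . html_title($html) . "\n"
               . "Archive Date: "
               . strftime("%Y-%m-%d %H:%M:%S UTC", gmtime $fetched) . "\n"
               . "Article URI : $orig_uri\n";
    my $author = html_author($html);
    $record .= "Author      : $author\n" if defined $author;
    $record .= "Article Body: $body\n";
    return $record;
}
```

Called as format_record($uri, $html, $body), the record is stamped with the current UTC time; passing an epoch time as a fourth argument records a different fetch date instead.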

Homer - it's a great start, but part of my concern was handling the
web-page end, where many articles are split across several pages.
Still, let me know how you get on.
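On the multi-page problem, one possible approach, sketched in Perl, is to keep following each page's "next" link until there isn't one. This assumes the site marks that link with rel="next", which is only a guess; plenty of sites use a plain "Page 2" link and would need a site-specific pattern. The sub names and the $fetch code-ref interface are hypothetical.

```perl
#!/usr/bin/perl
# follow-pages.pl: hypothetical sketch for gathering a multi-page article.
# Assumes each page links to the next with rel="next" on an <a> or <link>.
use strict;
use warnings;

# Return the href of the first <a>/<link> tag with rel="next", or undef.
sub next_page_url {
    my ($html) = @_;
    while ($html =~ m{<(?:a|link)\b([^>]*)>}gi) {
        my $attrs = $1;
        next unless $attrs =~ m{\brel=["']?next\b}i;
        return $1 if $attrs =~ m{\bhref=["']([^"']+)["']}i;
    }
    return;
}

# Fetch $start, then keep following "next" links until there are none.
# $fetch is a code ref (url -> html, or undef on failure), so any fetcher
# can be plugged in; %seen and $max_pages guard against link loops.
sub fetch_all_pages {
    my ($start, $fetch, $max_pages) = @_;
    $max_pages ||= 20;
    my (@pages, %seen);
    my $url = $start;
    while (defined $url && !$seen{$url}++ && @pages < $max_pages) {
        my $html = $fetch->($url);
        last unless defined $html;
        push @pages, $html;
        $url = next_page_url($html);
    }
    return @pages;
}
```

For real use, $fetch could wrap HTTP::Tiny->new->get($url), returning $res->{content} when $res->{success} is true. Relative hrefs would also need resolving against the current page first, for example with URI->new_abs($href, $url), which this sketch skips.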

-- 
| Mark Kent   --   mark at ellandroad dot demon dot co dot uk          |
| Cola faq:  http://www.faqs.org/faqs/linux/advocacy/faq-and-primer/   |
| Cola trolls:  http://colatrolls.blogspot.com/                        |
