
Re: Strip all but URL's from Web Pages [was: ping John, OT]

Roy Schestowitz wrote:
> __/ [Borek] on Tuesday 20 December 2005 08:06 \__
>
>> On Tue, 20 Dec 2005 02:09:32 +0100, Roy Schestowitz
>> <newsgroups@xxxxxxxxxxxxxxx> wrote:
>>
>>> I'd be /very/ interested in an answer/solution to that too, John. I
>>> need to generate files that contain newline-separated URLs rather
>>> than copy and paste from Web pages. The closest I could ever get to
>>> minimal manual labour was:
>>>
>>> less search.html  | grep http://
>>
>> In Google's case it will not help: the whole answer is one line.
>
> What complicates matters are syntaxes like:
>
> <A href="foo.bar"></A>
> <a title="linky thing" href="foo.bar"></A>
> <a  HREF='./foo.bar'></a>
>
> To make something that covers all cases, you couldn't just lazily
> scan all text and spew out whatever is contained between '<a href="'
> and the closing '"'.

How about "lynx -dump $url" and using the list of references it prints
at the bottom?
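
If lynx is not to hand, a rough alternative is to let an HTML parser deal
with the case and quoting variations Roy lists above. A minimal Python
sketch, assuming only the standard library; LinkCollector and
extract_links are names made up for this illustration, not part of any
existing tool:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, whatever its case or quoting."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # hypothetical: URL the page was fetched from
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names already lowercased,
        # so <A HREF=...> and <a href=...> look the same here.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against base_url when one is given.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return the link targets found in html, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Printing "\n".join(extract_links(page_text)) would give the
newline-separated list Roy asked for. Because the whole page is parsed
rather than matched line by line, it should not matter that Google's
answer is all on one line.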

-- 
Mike 


