Re: Strip all but URL's from Web Pages [was: ping John, OT]

Subject: Re: Strip all but URL's from Web Pages [was: ping John, OT]
From: "Mike Redrobe" <mike@xxxxxxxxxxx>
Date: Tue, 20 Dec 2005 19:50:58 GMT
Newsgroups: alt.internet.search-engines
References: <op.s11wsdj326l578@borek> <do7lkd$8tl$1@godfrey.mcc.ac.uk> <op.s12otfgr26l578@borek> <do94ms$20de$1@godfrey.mcc.ac.uk>
Xref: news.mcc.ac.uk alt.internet.search-engines:73116

Roy Schestowitz wrote:
> __/ [Borek] on Tuesday 20 December 2005 08:06 \__
>
>> On Tue, 20 Dec 2005 02:09:32 +0100, Roy Schestowitz
>> <newsgroups@xxxxxxxxxxxxxxx> wrote:
>>
>>> I'd be /very/ interested in an answer/solution to that too, John. I
>>> need to generate files that contain newline-separated URLs rather than
>>> copy and paste from Web pages. The closest I could ever get to
>>> minimal manual labour was:
>>>
>>> less search.html  | grep http://
>>
>> With Google it will not help: the whole results page is one line.
>
> What complicates matters are syntaxes like:
>
> <A href="foo.bar"></A>
> <a title="linky thing" href="foo.bar"></A>
> <a  HREF='./foo.bar'></a>
>
> To cover all cases, you can't just lazily scan the text and print
> whatever sits between '<a href="' and the next '"'.

How about "lynx -dump $url" and using the list of references at the bottom?
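Where lynx is not available, a real HTML parser can build the same kind of list. Below is a minimal sketch that assumes Python 3 with only the standard-library `html.parser` and `urllib.parse` modules. The names `LinkCollector` and `extract_links` are hypothetical, and so is the base URL `http://example.com/dir/page.html`.

The parser lowercases tag and attribute names and strips the quotes. As a result, the three variants quoted above all come out the same, even when the whole page sits on one line. `urljoin` then turns relative links into absolute ones, roughly as lynx does in its reference list.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper, not from the thread)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and removes the quotes,
        # so <A href="...">, <a title=... href="..."> and <a HREF='...'> all match.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links (e.g. ./foo.bar) against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return every <a href> in the page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # The three variants from the thread, crammed onto one line like Google's output.
    sample = ('<A href="foo.bar"></A><a title="linky thing" href="foo.bar"></A>'
              "<a  HREF='./foo.bar'></a>")
    # Placeholder base URL; use the real page address in practice.
    print("\n".join(extract_links(sample, "http://example.com/dir/page.html")))
```

To turn a saved page into a newline-separated file, something like `print("\n".join(extract_links(open("search.html").read(), base_url)))` should work. Here `base_url` stands for the page's real address, so that relative links resolve correctly.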

-- 
Mike
