
Re: Strip all but URL's from Web Pages [was: ping John, OT]

__/ [Borek] on Tuesday 20 December 2005 08:12 \__

> On Tue, 20 Dec 2005 05:09:51 +0100, Roy Schestowitz
> <newsgroups@xxxxxxxxxxxxxxx> wrote:
> 
>>> You know, it's finals week at my university, but give me a week or so
>>> and I can write one for you.  You're talking about wgetting a page like
>>> http://www.google.com/search?hl=en&lr=&q=i+love+seo right?
>>
>> I can't speak for Borek, but I would love to have something generic.
>> Unwanted links can be stripped off the list manually, or even filtered
>> out based on some criterion, e.g.
>>
>> fgrep 'google.' list_of_newline-separated_links.txt >google_links.txt
> 
> That will be ideal.
> 
> The truth is, in such cases I usually write some C/C++ code
> to handle them. As Linux/bash are not my native environment
> (I grew up in DOS), I often learn later that what I did in C can be
> easily done using some fancy combination of awk/grep/sort and so on.
> 
> So C is ready as a last resort, but Perl would be geekier ;)
> 
> Best,
> Borek

I am not entirely sure what you are trying to achieve. However, if Google
SERPs are all you handle, why not parse feeds, or even use them directly?
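
If, on the other hand, it is the generic "strip all but URLs" tool you
both described, Perl will oblige in a dozen lines. A minimal sketch, not
a polished tool: the script name is made up, and it assumes the stock
libwww-perl modules (LWP::Simple and HTML::LinkExtor) are installed:

#!/usr/bin/perl
# striplinks.pl (hypothetical name) -- print every <a href> in a
# page, one per line
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::LinkExtor;

my $url  = shift or die "usage: $0 <url>\n";
my $html = get($url) or die "could not fetch $url\n";

my @links;
my $parser = HTML::LinkExtor->new(
    sub {
        my ($tag, %attr) = @_;
        # keep only <a href="...">; LinkExtor also reports
        # <img>, <form> and friends
        push @links, $attr{href} if $tag eq 'a' and $attr{href};
    },
    $url,    # base URL, so every extracted link comes out absolute
);
$parser->parse($html);
$parser->eof;

print "$_\n" for @links;

Pipe its output through your fgrep filter, or through sort -u, and you
have the newline-separated list you described.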

I assume you are trying to automate some type of analysis:

http://www.benhammersley.com/projects/google_to_rss.html

There are equivalents for Yahoo, MSN and many, many others.
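
Once the results arrive as a feed, extracting the links is even simpler.
Another minimal sketch, this time with XML::RSS; the feed URL is just a
placeholder for whatever such a gateway hands you:

#!/usr/bin/perl
# feedlinks.pl (hypothetical name) -- print the <link> of every
# item in an RSS feed
use strict;
use warnings;
use LWP::Simple qw(get);
use XML::RSS;

my $feed = shift or die "usage: $0 <feed-url>\n";
my $xml  = get($feed) or die "could not fetch $feed\n";

my $rss = XML::RSS->new;
$rss->parse($xml);

# each item is a hashref with title/link/description keys
print "$_->{link}\n" for @{ $rss->{items} };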

Hope it's helpful,

Roy

-- 
Roy S. Schestowitz      | "Stand for nothing and you will fall for anything"
http://Schestowitz.com  |    SuSE Linux     |     PGP-Key: 0x74572E8E
  2:40am  up 10 days  9:51,  6 users,  load average: 0.00, 0.03, 0.04
      http://iuron.com - next generation of search paradigms
