
Re: saving search results with bash script, curl, wget...

__/ [ Nospam ] on Friday 17 March 2006 17:23 \__

> 
> "Roy Schestowitz" <newsgroups@xxxxxxxxxxxxxxx> wrote in message
> news:dvem7a$gja$1@xxxxxxxxxxxxxxxxxxxx
>> __/ [ Nospam ] on Friday 17 March 2006 13:29 \__
>>
>> > "Roy Schestowitz" <newsgroups@xxxxxxxxxxxxxxx> wrote in message
>> > news:dve54r$c40$1@xxxxxxxxxxxxxxxxxxxx
>> >> __/ [ canadafred ] on Friday 17 March 2006 02:31 \__
>> >>
>> >> >> I am wondering how it is possible to fetch the results of a
>> >> >> search-query URL with wget, curl, or a bash script (or any other
>> >> >> command-line tool) and write the results (including those from
>> >> >> the next page) into a txt file, with each result URL on its own
>> >> >> line?
>> >> >> i.e. for a search query, something like www.example.com=query?...
>> >> >> each result can be placed on a new line in a text file.
>> >> >
>> >> > cross posting removed
>> >>
>> >> Thanks, Fred.
>> >>
>> >> At risk of answering an Internet troll:
>> >>
>> >> Use wget, e.g.:
>> >>
>> >> ,----[ Commands ]
>> >> | cd ~; wget -O index.html "http://www.google.co.uk/search?ie=utf-8&oe=utf-8&q=nospam"
>> >> `----
>> >>
>> >> Then, you could use Perl (script from Brian Wakem):
>> >>
>> >> ,----[ get_urls.sh ]
>> >> | cat ~/index.html | perl -ne '@url = m!(http://[^>"]+)!g;
>> >> |   print "$_\n" foreach @url' > ~/googleurls
>> >> `----
>> >
>> > Can you elaborate on the above script? I tried to run it with bash,
>> > but it says "cat: command not found". I am on Windows, using bash
>> > from Cygwin.
>>
>>
>> Cat (concatenate) is a very fundamental and simple command. Cygwin is not
>> complete, so I advise you to install and run Linux as a virtual machine
>> under Windows (VMware is now free) or set up a dual boot.
>>
>> I am sorry if this answer is unhelpful, but if you don't have cat(1),
>> then I doubt you will have Perl, among other valuable utilities.
>>
>> Hope it helps,
>>
>> Roy
> Finally got the program to work; however, it only deals with the first
> page of results and doesn't follow on to the next page.

Try *this* (num=100 asks for up to 100 results per page):

http://www.google.com/ie?q=nospam&num=100

You could also write a script that requests all the result pages
iteratively, one after another; a rough sketch follows.
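
Here is a minimal, untested sketch, assuming Google's start= parameter
still offsets results in steps of 10 (the query string and page count
are placeholders, adjust to taste):

,----[ get_pages.sh ]
| #!/bin/bash
| # Fetch several result pages in turn and concatenate them,
| # so the Perl one-liner above can extract one URL per line.
| QUERY="nospam"
| PAGES=5
| for ((i = 0; i < PAGES; i++)); do
|     # start= offsets the results by multiples of 10
|     wget -q -O - "http://www.google.co.uk/search?q=${QUERY}&start=$((i * 10))"
| done > ~/index.html
`----

The combined ~/index.html can then be fed through the get_urls.sh
one-liner above to yield one result URL per line.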

Hope it helps,

Roy

-- 
Roy S. Schestowitz
http://Schestowitz.com  |    SuSE Linux     ¦     PGP-Key: 0x74572E8E
  5:45pm  up 9 days 10:22,  7 users,  load average: 0.44, 0.47, 0.52
      http://iuron.com - help build a non-profit search engine
