
Re: Spidering Sites (was: Konqueror off-line)

__/ [ wbarwell ] on Saturday 11 March 2006 10:21 \__

> Roy Schestowitz wrote:
> 
>> __/ [ wbarwell ] on Saturday 11 March 2006 09:04 \__
>> 
>>>  news@xxxxxxxxxxxxxx wrote:
>>> 
>>>> Hi,
>>>> 
>>>> It seems to me that Konqueror can't save the fetched page.
>>>> I need to fetch & save several pages, to be read off-line later,
>>>> and to extract URLs which I will fetch next time I go on-line.
>>>> 
>>>> Is it possible that Konqueror can't do this?!
>>>> 
>>>   
>>> It should do it, but I have found that Konqueror
>>> as a browser is slow and a bit buggy. For reading
>>> and saving stuff Opera works very well, and it's a
>>> lot better than Konqueror for stuff like printing too.
>>> I'd say google up the Opera website and download Opera
>>> and use that.  I use KDE and the file manager/Konqueror
>>> works quite well with Opera as a system.
>>> 
>>> Get Opera, you won't be sorry.
>> 
>>  
>> Using Konqueror in isolation is hard. Many zealous sites will not even let
>> it in. For that reason, I keep Firefox and Opera installed. I do not
>> believe that Konqueror has the functionality you described, but you can
>> browse the Konqueror cache, which probably resides in
>> ~/.kde/share/apps/konqueror/ although I can't seem to find it.
>> 
>> Best wishes,
>> 
>> Roy
>> 
>   
> If there is something I want to read offline, I just save the pages I want
> with Opera.
> 
> I keep meaning to explore spiders some day but never seem to get around
> to it.
> 
> Your Konqueror cache will be in the .kde cache-your-system's-name folder
 
@wbarwell,

Don't be intimidated by spidering tools. They are /very/ easy to use.

If you want to download example.org, create a directory 'example':

$ mkdir example
$ cd example

Now simply

$ wget -r example.org

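If the aim is off-line reading, it may also be worth asking wget to rewrite
the links so that the saved pages point at one another on disk, and to pull
in images and stylesheets. A minimal sketch, using the same example.org
placeholder as above:

$ wget -r -k -p example.org

Here -k (--convert-links) rewrites links for local viewing and -p
(--page-requisites) grabs whatever each page needs to render.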
Since you probably don't want to download the entire site, have a look at the
available options:

$ man wget

So, for example, consider:

$ wget -r -l2 -t1 -N -np -e robots=off \
       http://username:password@xxxxxxxxxxxxxxx:80

The above limits the /depth/ of recursion to two levels (-l2), retries each
file only once (-t1), fetches only files that are newer than local copies
(-N), never ascends to the parent directory (-np), tells wget to ignore
robots.txt exclusion rules (-e robots=off), and authenticates with the site,
in case that is necessary.
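
Since the original question also mentioned extracting URLs to fetch during
the next on-line session, one rough way is to grep them out of the saved
pages and feed the list back to wget later. A sketch, assuming the pages sit
under the 'example' directory created earlier and that absolute http URLs
are all that's wanted:

$ grep -rhoE 'https?://[^"<> ]+' example/ | sort -u > urls.txt
$ wget -N -i urls.txt

The first command collects every unique absolute URL into urls.txt; the
second asks wget to fetch that list, skipping files that haven't changed
(-N).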

Best wishes,

Roy

-- 
Roy S. Schestowitz      |    #00ff00 Day - Basket Case
http://Schestowitz.com  |    SuSE Linux     |     PGP-Key: 0x74572E8E
 10:20am  up 3 days  2:57,  7 users,  load average: 0.73, 0.59, 0.54
      http://iuron.com - Open Source knowledge engine project

