
Re: Time for deep crawl results to show up

On Tue, 22 Feb 2005 11:51:57 +0000, Roy Schestowitz
<newsgroups@schestowitz.com> wrote:

>>>What I had in mind when I wrote this were pages that are 3-5 levels deep.
>>>
>>>Roy
>> 
>> The usual way to get round this is to have a series of site maps on
>> the second level.
>
>Interesting... why not just have a single gigantic site map? Cascading being
>necessarily better?
>
>Roy

You don't need any of this as long as your site is correctly linked
together. Basically, if you can start from your home page and find
your way to every page of your site through text or image links (no
JavaScript, applets or frames**), the bots will index your entire site.
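To put that another way: a crawl is essentially a breadth-first walk of your link graph starting from whatever URL the bot lands on, and any page not reachable by following plain links is invisible to it. A minimal sketch of the idea (the site structure here is made up for illustration):

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
# In a real crawler these links would come from fetching and parsing HTML.
site_links = {
    "/": ["/authors.asp", "/about.asp"],
    "/authors.asp": ["/francis-bacon/", "/jane-austen/"],
    "/francis-bacon/": ["/francis-bacon/page-1.asp"],
    "/jane-austen/": [],
    "/about.asp": [],
    "/francis-bacon/page-1.asp": [],
    "/orphan.asp": [],  # nothing links here, so a bot never finds it
}

def reachable_pages(start, links):
    """Breadth-first walk from `start`, following text/image links only."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

indexed = reachable_pages("/", site_links)
orphans = set(site_links) - indexed  # pages a bot can never reach
```

Anything that ends up in `orphans` will never be indexed, no matter how good the content is.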

Exceptions to this are very, very large sites, where external links
are important to bring enough bots to your site for it to be indexed
completely on a regular basis. Put another way, if you have a
50,000-page site with only one link to the home page from an external
source, that link is the only route the bots have into your site. In
this scenario it would be highly unlikely for Google etc. to index the
entire site in a reasonable period of time.

The same site with 1,000 links to various pages gives the bots a
thousand ways to find your site.

This is why large sites tend not to have every page indexed: there
isn't enough 'time' for the bots to find every page before the next
update. So new pages are added and old pages are removed on a dynamic
basis.

I rarely use site maps per se. For example, the tens of thousands of
pages of my literature sites do not have site maps; the closest are
these pages: http://www.classic-literature.co.uk/authors.asp (lists
all main author pages) and
http://www.classic-literature.co.uk/classic-literature.asp (lists the
author pages and all of the index pages for each book). You'll note
the latter of the two is 100KB in size; if it gets any bigger, part of
the content won't be indexed by Google.
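If a single site map page would blow past that kind of size limit, one option is to split the link list across several smaller pages, which is the cascading approach mentioned earlier in the thread. A rough sketch of the splitting logic (the 100KB budget, the per-link markup overhead and the URL list are all assumptions for illustration):

```python
def split_sitemap(urls, budget_bytes=100_000):
    """Split a list of URLs into site map pages, each under budget_bytes.

    Each URL is assumed to cost its own length plus ~50 bytes of
    surrounding HTML (the <a href="...">...</a> and list markup).
    """
    pages, current, size = [], [], 0
    for url in urls:
        cost = len(url) + 50
        if current and size + cost > budget_bytes:
            pages.append(current)
            current, size = [], 0
        current.append(url)
        size += cost
    if current:
        pages.append(current)
    return pages

# e.g. 5,000 links, each padded to 70 characters (~120 bytes with markup)
urls = [f"/book/page-{i}.asp".ljust(70) for i in range(5000)]
pages = split_sitemap(urls)
```

With those assumed numbers the 5,000 links land on seven site map pages instead of one oversized page; each second-level page stays comfortably within the budget.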

This site has some very deep pages, like
http://www.classic-literature.co.uk/british-authors/16th-century/francis-bacon/the-advancement-of-learning/ebook-page-114.asp
which is 6 or 7 levels deep in the main link structure. This page has
just two direct links to it (a PR1 and a PR2):

http://www.classic-literature.co.uk/british-authors/16th-century/francis-bacon/the-advancement-of-learning/ebook-page-113.asp
http://www.classic-literature.co.uk/british-authors/16th-century/francis-bacon/the-advancement-of-learning/pages-1.html

Yet it and all the other pages of that book are fully indexed and
findable with specific searches in Google like:

quickeneth both these doctrines 
For the liturgy or service, it consisteth 
holy meditation, Christian resolution
honour, first of the Divine Majesty

I wouldn't expect much from a page this deep (no hard SERPs) with so
few links, though.
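For what it's worth, "levels deep" in the URL can be read straight off the path, though URL depth and click depth need not match: the page above sits five path segments below the domain, while the click path through the site runs 6 or 7 levels. A quick sketch of counting path depth:

```python
from urllib.parse import urlparse

def url_depth(url):
    """Count path segments below the domain; the home page is depth 0."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    return len(segments)

deep_page = ("http://www.classic-literature.co.uk/british-authors/"
             "16th-century/francis-bacon/the-advancement-of-learning/"
             "ebook-page-114.asp")
```

Here `url_depth(deep_page)` gives 5, whereas the home page gives 0.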

** Frames are only a problem if the site isn't linked correctly; the
main frame page needs a noframes area with links to the content pages,
or an alternative navigation route to the content (like a site map :-)).
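For example, a crawlable frameset page might look something like this (the file names are made up for illustration):

```html
<frameset cols="200,*">
  <frame src="nav.html">
  <frame src="content.html">
  <noframes>
    <!-- Bots and frameless browsers see this: plain text links -->
    <a href="authors.asp">Authors</a>
    <a href="classic-literature.asp">Site map</a>
  </noframes>
</frameset>
```

Without the noframes block, a bot that doesn't render frames sees an empty page and the crawl stops dead at the frameset.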

David
