
Re: What does site: report and what it really is? (was Re: Part 2 - Wondering why your site is not indexed in Google?)

__/ [ Big Bill ] on Saturday 17 June 2006 22:50 \__

> On 17 Jun 2006 19:35:34 GMT, John Bokma <john@xxxxxxxxxxxxxxx> wrote:
> 
>>Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote:
>>
>>> __/ [ John Bokma ] on Saturday 17 June 2006 18:16 \__
>>
>>> Oops. This should read "indicated that my site had over 100,000
>>> pages". Missing 0 left place for misinterpretation.
>>
>>I had no idea where the 0 was missing,
>>
>>site:schestowitz.com          1 - 10 of about 709
>>
>>but:
>>
>>7,300 from www.schestowitz.com
>>
>>You might want to fix that. Question is: how many pages does it really
>>have?


This used to be uniform, i.e. with or without the "www" umbilical cord I
would get the same number. Moreover, until 2-3 days ago, "site:" was showing
about 6,700 pages. Yesterday it sank to 700 for the first time. Whether that
means something or not, it's very unpredictable and difficult to analyse (no
good tools). All I know is that many pages are not in the index and referral
volume (for text, not images) is down significantly as a result. The pace of
crawling is as good as ever, but unlike Brian Waken's testimony, there is no
improvement, i.e. nothing is being added.


>>>> Question is: are there 4.something billion pages, or are there just a
>>>> few million.
>>> 
>>> 
>>> I tend to (or want to *wink*) believe that the space has been wasted
>>> on actually storing and indexing junk content.
>>
>>I tend to think that the site: operator needs too many resources at this
>>moment to operate correctly and hence it gives a wrong number.
>>
>>The question is: is it a factor, and does the factor grow?
>>
>>     67 from castleamber.com
>>Actual number: 80 (html, excluded CGI, some might be orphan).
>>
>>factor: 0.84


I see that in smaller sites of mine as well, but when a CMS is used, the
number goes beyond what I would predict. Think, for example, of Gallery:
for each photo, there are pages at various zoom scales.
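
As a back-of-the-envelope sketch (my own illustration, not Gallery's actual
URL scheme), here is why zoom variants inflate the count: each photo yields
one page plus one page per scale, so the indexable URL count is a multiple
of the real content count.

```python
# Sketch: a CMS that serves each photo at several zoom scales multiplies
# the number of distinct indexable URLs per item of real content.
# The path layout and "scale" parameter below are hypothetical.

def indexable_urls(photos, zoom_scales):
    """One page per photo plus one page per zoom scale."""
    urls = []
    for photo in photos:
        urls.append(f"/gallery/{photo}")  # the photo's own page
        for scale in zoom_scales:
            urls.append(f"/gallery/{photo}?scale={scale}")  # zoom variants
    return urls

urls = indexable_urls(["p001", "p002"], [400, 800, 1600])
print(len(urls))  # 2 photos * (1 + 3 scales) = 8 URLs
```

So an album of 1,000 photos with three zoom scales already presents 4,000
crawlable URLs, which would go some way toward a reported count well beyond
the "real" number of pages.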


>>  9,640 from johnbokma.com (has some wrong URLs)
>>Actual number: 1117 (html, will add some more soon).
>>
>>
>>factor: 8.63
>>
>>
>>Question is: does this factor grow, and how?


It seems to go upwards; it only ever increased before the Big Daddy
awkwardness. This climb suggests that old cache entries (or broken URLs)
might leave a trail...? One assumption I had is that a CMS was accepting
parameters (and making them concrete by including them in links). I never
found an answer and didn't mind too much about mending it. "If it ain't
broken, why fix it" was my -- shall we call it -- mantra/motto.
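
A minimal sketch (my own, not any particular CMS's behaviour) of how
accepted-and-echoed query parameters can hand a crawler many URLs for one
underlying page, inflating the reported/actual factor; the `sid` and `ref`
parameters and URLs are hypothetical:

```python
# Collapse parameter variants of a URL to see how many "real" pages
# a set of crawled URLs actually represents.
from urllib.parse import urlsplit

def canonical(url):
    """Strip the query string so parameter variants collapse together."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"

crawled = [
    "http://example.com/page.php",
    "http://example.com/page.php?sid=abc123",
    "http://example.com/page.php?sid=def456&ref=home",
    "http://example.com/other.php",
]

distinct_content = {canonical(u) for u in crawled}
print(len(crawled), len(distinct_content))  # 4 crawled URLs, 2 real pages
```

By that reckoning the inflation factor here is 4 / 2 = 2.0 -- the same kind
of reported-versus-actual ratio as in the figures quoted above.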


>>>> Good, no more MS bashing then?
>>> 
>>> No, I promise. I know it annoys you.
>>
>>It does because it's often based on lack of knowledge IMO. I did it ages
>>ago, until I discovered that a lot of the fans of the OS / computer I
>>was using were just lying and very biased. Things like: "our" OS can't
>>get a virus, because it's in ROM. The funny thing was, you could
>>overrule modules in ROM and extend them. And hence a virus could just do
>>the same. Anyway, when I had experience with several operating systems I
>>learned that each sucks, and that each OS has its own issues. Also it's
>>either a company, or a bunch of geek egos that make things harder than
>>they should be (or a combination).


I accept that. I'll leave advocacy to other, more relevant groups (400+
messages/week) and will try to abstain fully while I'm here.


>>>> same problem: x people work there, and they are all busy.
>>> 
>>> Maybe they should employ us to increase the value of /x/. We can
>>> develop sites for them to crawl and serve to people. And we can even
>>> work _for them_, sometimes. *smile*
>>
>>site: is not core business. So if we are going to get jobs at Google we
>>are probably going to work on GPay, or even GEvil.


*giggle*

GEvil (pronounced jivvel?) could become a tool where you enter a person's
name into a textarea, then wait for Google to scan the Internet for patterns
and determine if the person is evil. Given the hype over Trends, I can see
people using it. Maybe they could have a 1-to-10 scale for levels of evil.
This might work rather nicely, assuming that names are unique. It's a bit
like Copyscape with something extra on top, I suppose.


> I suspect I know what Roy wants to work on at Google... :-))


They have some nice massage tables. *smile*

Best wishes,

Roy

-- 
Roy S. Schestowitz
http://Schestowitz.com  |     GNU/Linux     ¦     PGP-Key: 0x74572E8E
  4:25am  up 51 days  9:39,  12 users,  load average: 1.45, 1.07, 1.00
      http://iuron.com - next generation of search paradigms
