Re: Google phrase corrections

  • Subject: Re: Google phrase corrections
  • From: Chris Hope <blackhole@electrictoolbox.com>
  • Date: Wed, 20 Jul 2005 18:07:09 +1200
  • Newsgroups: alt.internet.search-engines
  • Organization: Ihug Ltd
  • References: <dbhlub$2o7s$1@godfrey.mcc.ac.uk> <dbhmgb$7cc$1@lust.ihug.co.nz> <1121795148.757973.192710@z14g2000cwz.googlegroups.com> <dbjlei$foa$1@lust.ihug.co.nz> <dbki40$1m0f$1@godfrey.mcc.ac.uk>
  • User-agent: KNode/0.9.1
  • Xref: news.mcc.ac.uk alt.internet.search-engines:63735
Roy Schestowitz wrote:

> Chris Hope wrote:
> 
>> Fritz M wrote:
>> 
>>> Chris Hope wrote:
>>> 
>>>> In a lot of cases Google picks up typos when the search is done and
>>>> you get the page asking you if you meant to spell it the correct
>>>> way.
>>> 
>>> In Roy's "Oogle earth" example, though, Google isn't offering
>>> corrections. Interesting.
>> 
>> I was surprised it didn't pick that one up myself. One of the
>> misspellings I made once was "Freebsie" when the brand name is
>> actually "Freesbie". Google still hasn't learnt it's an incorrect
>> spelling yet, and the two sites I have it misspelled on come up
>> within the first few results. On the original site I left the
>> misspelling in (hey, why not!), and on the second one I added it
>> deliberately, with a note saying the name is often misspelled. Gets
>> me a bit of traffic each day :)
>>  
>> The funny thing is, when you spell it correctly Google asks if you
>> meant to search on "frisbie"
>> 
>>> I intentionally misspell words on some of my web pages or include
>>> common variations of a word, usually for proper nouns like city
>>> names, restaurants, people and so forth.
> 
> I hadn't realised what Fritz pointed out until he did. However, Google
> corrections are often as naive as one would expect them to be.
> 
> A Google index of valid tokens considers 'Oogle' (whatever it may be)
> to be a valid word. It also considers 'Earth' to be a valid word.
> Finding the correlation between words and proposing corrections based
> on strings of words is a computationally hard task. Google Suggest is
> capable of pairing (or tupling) words based on the number of results,
> but if you introduce this extra dimension of misspellings and consider
> all possible things that can go wrong with spelling, you ask for too
> much. You would not get search results quickly enough OR, if doing it
> off-line, you could have Google spend a lot of computer power
> optimising searches in this way.

Compared with the crunching Google does to return search results, I
don't think the spelling check would necessarily be all that difficult.
However, I suspect that it works by using some sort of simple
dictionary lookup.
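
To make that guess concrete, here is a minimal sketch of a per-word
dictionary lookup in Python. It is only an illustration of the idea, not
how Google actually does it: the VOCABULARY set, the suggest() function
and the 0.8 similarity cutoff are all made up for the example.

```python
import difflib

# Hypothetical vocabulary: words the index already treats as valid tokens.
# "oogle" is included on purpose, to mirror the "Oogle earth" example.
VOCABULARY = {"google", "earth", "oogle", "search", "engine"}


def suggest(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a corrected query, or None if no word needs correcting.

    Each word is checked on its own; no attempt is made to look at
    pairs of words or at how many results a phrase returns.
    """
    candidates = sorted(vocabulary)
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            # Known token: a simple lookup has no reason to question it.
            corrected.append(word)
            continue
        # Unknown token: pick the most similar known word, if any is close.
        matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


if __name__ == "__main__":
    print(suggest("gogle earth"))
    print(suggest("oogle earth"))
```

With this toy vocabulary, suggest("gogle earth") proposes "google earth",
while suggest("oogle earth") returns None because both words are already
known tokens, which would be consistent with no correction being offered
in Roy's example.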

This is a funny Google search that's doing the rounds at the moment
which illustrates the spelling suggestion (and of course the real reason
it makes the suggestion is that there's no apostrophe in "isnt"):
  http://www.google.com/search?q=paris+hilton+isnt+a+whore

-- 
Chris Hope | www.electrictoolbox.com | www.linuxcdmall.com
