Re: Fuzzy Matches for Content Mirrors

Subject: Re: Fuzzy Matches for Content Mirrors
From: Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx>
Date: Fri, 26 Aug 2005 14:37:55 +0100
Newsgroups: alt.internet.search-engines
Organization: schestowitz.com / Manchester University
References: <dekrj8$47m$1@nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com> <deksi9$2c6l$1@godfrey.mcc.ac.uk> <dekura$crp$1@nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com> <430e1f91$0$18636$14726298@news.sunsite.dk> <dem2vi$2ppa$3@godfrey.mcc.ac.uk> <430ec59c$0$18639$14726298@news.sunsite.dk> <demhct$1438$1@godfrey.mcc.ac.uk> <430eca73$0$18648$14726298@news.sunsite.dk> <demibq$148p$2@godfrey.mcc.ac.uk> <Xns96BE22CE9993Ecastleamber@130.133.1.4> <demk58$14sp$1@godfrey.mcc.ac.uk> <Xns96BE37BFB402Fcastleamber@130.133.1.4>
Reply-to: newsgroups@xxxxxxxxxxxxxxx
User-agent: KNode/0.7.2

__/ On Friday 26 August 2005 11:29, [John Bokma] wrote : \__

>> Sorry, but I must disagree. Let us say that T is the original page and
>> F (false) is the copy.
>> 
>> If F = T + A where A is some extra content, then you have problems
> 
> Not really, you can define similarities based on sentences, words, etc.
> You don't have to look for exact matches. Similar is close enough.


...and very computationally expensive. Search engines already have a hard
time indexing billions of pages and picking out keywords. Now you ask them
to calculate similarities in a graph with billions of nodes?!
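
To make both sides of this concrete, here is a minimal sketch of a word-level
fuzzy match, assuming 3-word shingles and Jaccard similarity (neither is named
in this thread, and nothing here is known to be what any search engine runs).
The texts T, A and F and all function names are made up for illustration. It
also counts the page pairs that a naive all-against-all comparison would visit.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def containment(a, b):
    """Fraction of the shingles in a that also appear in b."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def pair_count(n):
    """Unordered page pairs visited by a naive all-against-all comparison."""
    return n * (n - 1) // 2


# Hypothetical texts: T is the original page, F = T + A is the padded copy.
T = "the quick brown fox jumps over the lazy dog near the river bank"
A = "buy cheap widgets today from our trusted partner sites"
F = T + " " + A

sim = jaccard(shingles(T), shingles(F))          # lowered by the extra content A
inside = containment(shingles(T), shingles(F))   # all of T still appears in F
pairs = pair_count(10**9)                        # naive cost for a billion pages
```

In this toy case the extra content A lowers the Jaccard score, while the
containment score still shows that all of T sits inside F. The pair count is
the expensive part: it grows quadratically with the number of pages unless
candidate pairs are narrowed down first.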


> I am sure there has already been a lot of research done. For example,
> students copy papers written by others.


Yes, I know, but people have mocked plagiarism detection for being
unreliable. Besides, you can easily run filters that shuffle the text and
replace words with proper equivalents. Brute force would do the job.
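
As a rough illustration of the word-replacement point, the sketch below
applies a tiny, hypothetical synonym table to a sentence and measures how much
3-word shingle overlap survives. The table, the sentence and the function
names are assumptions made up for this example; a real filter and a real
detector would both be far more elaborate.

```python
# Hypothetical synonym table; a real filter would draw on a full thesaurus.
SYNONYMS = {"quick": "fast", "jumps": "leaps", "lazy": "idle", "river": "stream"}


def substitute(text, table):
    """Replace each word that has an entry in table with its equivalent."""
    return " ".join(table.get(word, word) for word in text.lower().split())


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "the quick brown fox jumps over the lazy dog near the river bank"
rewritten = substitute(original, SYNONYMS)

# Similarity of the original to its synonym-substituted copy.
score = jaccard(shingles(original), shingles(rewritten))
```

Swapping four of the thirteen words leaves only one shared shingle here, so a
match on 3-word shingles would rate the copy as almost unrelated. Comparing
single words instead would be more tolerant, at the price of more false
matches.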


>> To a black hat SEO it would be no problem to automate this and deceive
>> the search engines. It is much easier to carry out a robbery than it
>> is for the police to spot the crook in a town of millions.
> 
> You don't do exact matches in cases like this, just fuzzy matches.


Using that analogy again, that's like doing a house-to-house search and
questioning all the residents.

Roy

-- 
Roy S. Schestowitz        Useless fact: Every polar bear is left-handed
http://Schestowitz.com
