
Next: Calculating the Entropy

Errors in Entropy Estimation

Roy Schestowitz

Abstract:

We consider the problem of estimating the overlap between two data clouds in a high-dimensional space. To do so, we measure the distance (shuffle or Euclidean) between each possible pairing of points, where each point corresponds to one data instance encoded as a vector. A large collection of such points forms a cloud; in some cases the points are derived from a generative point distribution model. For each point in a given cloud A, we compute its distance to each example in another cloud B, and vice versa. These distances can be arranged in a matrix, which is then analysed to estimate the level of overlap between the two clouds. The principles of entropic graphs are used to infer complexity from the relationships embedded in the matrix. Our attempts to compute entropy involve a level of uncertainty. We are most concerned with the error in the calculation itself, but we also consider the error that arises as a side-effect of repeating the experiments with multiple distinct instantiations. Each instantiation, although derived from the same stochastic process, tends to yield slightly different values. This document explains the derivation of these errors, as well as the calculation of entropy itself.
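As a rough illustration of the setup described in the abstract, the following Python sketch builds the matrix of Euclidean distances between two clouds and shows how a summary of that matrix varies across repeated instantiations. All names here (`cross_distance_matrix`, `sample_cloud`, `mean_nearest_distance`) are hypothetical, the Gaussian clouds are an assumed stand-in for a generative model, and the mean nearest-neighbour distance is only a placeholder statistic, not the entropy estimate derived in this document.

```python
import math
import random
import statistics


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cross_distance_matrix(cloud_a, cloud_b, dist=euclidean):
    """Entry [i][j] is the distance from point i of A to point j of B."""
    return [[dist(p, q) for q in cloud_b] for p in cloud_a]


def sample_cloud(n, dim, centre, rng):
    """Assumed stand-in generator: n Gaussian points around `centre`."""
    return [[rng.gauss(centre, 1.0) for _ in range(dim)] for _ in range(n)]


def mean_nearest_distance(matrix):
    """Placeholder overlap summary (NOT the entropy estimate).

    Row minima give nearest distances from A to B; column minima give
    nearest distances from B to A ("and vice versa").
    """
    row_min = [min(row) for row in matrix]
    col_min = [min(col) for col in zip(*matrix)]
    return statistics.mean(row_min + col_min)


# Repeat the experiment with several distinct instantiations drawn
# from the same stochastic process and inspect the spread.
rng = random.Random(0)
scores = []
for _ in range(5):
    cloud_a = sample_cloud(20, 10, 0.0, rng)
    cloud_b = sample_cloud(20, 10, 0.5, rng)
    scores.append(mean_nearest_distance(cross_distance_matrix(cloud_a, cloud_b)))

print("mean over instantiations:", statistics.mean(scores))
print("spread (sample std dev):", statistics.stdev(scores))
```

The spread printed at the end corresponds to the second kind of error mentioned above, the one caused by different instantiations; the error in the entropy calculation itself is a separate quantity and is not modelled in this sketch.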




Roy Schestowitz 2006-04-22