
# Errors in Entropy Estimation

### Abstract:

We consider the problem of estimating the overlap between two data
clouds in a high-dimensional hyperspace. In order to do so, we measure
the distance (shuffle or Euclidean) between each possible pairing
of points, where each point corresponds to one data instance encoded
as a vector. A large collection of points forms a cloud; in some cases
the points are themselves sampled from a generative point distribution
model. For every point in a given cloud *A*, we compute its distance
to each point in another cloud *B*, and vice versa.
All these distances can be arranged in the form of a matrix, which
can then be analysed to estimate the level of overlap between these
two clouds of data. The principles of entropic graphs are used to
infer complexity from the relationships embedded in the matrix. In
our computation of entropy there is a degree of *uncertainty*.
We are mainly concerned with the error in the calculation itself,
but we also consider the error that arises when the experiment is
repeated with multiple, distinct instantiations. Each instantiation,
although derived from the same stochastic process, tends to yield
slightly different values. This document explains the derivation of
these errors, as well as the calculation of the entropy itself.
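
As a rough illustration of the setup described above, the following Python sketch builds the matrix of Euclidean distances between two clouds and measures how a summary value varies across repeated instantiations. It is a minimal sketch under stated assumptions, not the method used in this document: the function names are hypothetical, the Gaussian sampler merely stands in for a generative model, the shuffle distance is not implemented, and the mean cross-cloud distance is only a placeholder for the entropy estimate.

```python
import math
import random
import statistics


def distance_matrix(cloud_a, cloud_b):
    """Return D, where D[i][j] is the Euclidean distance from cloud_a[i] to cloud_b[j]."""
    return [[math.dist(p, q) for q in cloud_b] for p in cloud_a]


def sample_cloud(n_points, dim, centre, rng):
    """Hypothetical stand-in for a generative model: unit-variance Gaussian around `centre`."""
    return [[rng.gauss(centre, 1.0) for _ in range(dim)] for _ in range(n_points)]


def mean_cross_distance(cloud_a, cloud_b):
    """Placeholder summary of the matrix (NOT an entropy estimate): mean of all entries."""
    d = distance_matrix(cloud_a, cloud_b)
    return statistics.mean(x for row in d for x in row)


def spread_over_instantiations(n_runs, n_points, dim, seed=0):
    """Repeat the experiment with fresh instantiations; return (mean, sample std dev)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_runs):
        a = sample_cloud(n_points, dim, 0.0, rng)  # cloud A
        b = sample_cloud(n_points, dim, 0.5, rng)  # cloud B, assumed slightly shifted
        values.append(mean_cross_distance(a, b))
    return statistics.mean(values), statistics.stdev(values)
```

Because the Euclidean distance is symmetric, the matrix of distances from *B* to *A* is simply the transpose of the one computed here. The standard deviation returned by `spread_over_instantiations` corresponds to the second kind of error mentioned above, the variation between instantiations drawn from the same stochastic process.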

Roy Schestowitz
2006-04-22