
Calculating the Entropy

Entropy is a measure of uncertainty, which can often be used to assess the complexity of data patterns. In the context of our experiments, entropy is used to estimate the complexity of clouds of data by treating and interpreting them as connected graphs.

Distances between points in the data clouds form a matrix. That matrix can in turn be evaluated for its complexity using an extended notion related to Shannon's entropy. In the context of data clouds, we seek to identify the level of point dispersion, as well as the correlation of that dispersion when two Gaussian distributions are involved.
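As a minimal sketch of this first step (assuming Python with NumPy and SciPy, which the original text does not specify, and hypothetical clouds $A$ and $B$), the distance matrix between two clouds can be formed as follows:

\begin{verbatim}
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical data clouds, A and B, as rows of points in d dimensions.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))          # 100 points in 5-D
B = rng.normal(size=(120, 5)) + 0.5    # a second cloud, slightly offset

# Pairwise Euclidean distances between every point of A and every point of B.
D = cdist(A, B)                        # matrix of shape (100, 120)
\end{verbatim}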

As a simple example, we consider a spherical normal distribution in hyperspace. We consider another such distribution and let it gradually drift away from the first. We can then estimate the entropy of the two distributions and of their union, and derive a measure of similarity. As expected, we observe a well-behaved decrease in that measure as the clouds gradually drift apart.

The formulation we use to calculate entropy involves the notion of a graph $G$ and a symbol for entropy, $H$. It also involves the two data clouds, which for the sake of simplicity we shall refer to as $A$ and $B$. For the two clouds, we may assume for the sake of the argument that we have obtained the distances between all the points they comprise.

Our estimation of the overall entropy is as follows:


\begin{displaymath}
H_{total}=H(G[A\rightarrow B])-H(G[A\rightarrow A_{sample}])
\end{displaymath} (1)

More recently, we replaced this older and confusing notation. In practice, we ought to replace $A$ with $S_{0}$, which stands for ``synthetic''. In our experiments, we tend to deal with synthetic images that are generated from a model of appearance (combining shape and intensity). $A_{sample}$ is likewise renamed $S_{i}$; its size is arbitrary and can be extended at will. $S_{i}$ used to be merely a subset of the full set $S_{0}$, but it must not be contained in $S_{0}$: it is only derived from the same model as $S_{0}$, so it is not the case that $S_{i}\subseteq S_{0}$. More strictly still, no instance in $S_{i}$ should appear in $S_{0}$.
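In the newer notation, and under the assumption that $B$ still denotes the second data cloud as before, equation (1) would read:

\begin{displaymath}
H_{total}=H(G[S_{0}\rightarrow B])-H(G[S_{0}\rightarrow S_{i}])
\end{displaymath}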

Time permitting, we can increase the number of $S_{i}$'s considered in order to improve our estimates. Ultimately, we are left with a graph that shows the formulation to be rather helpful. The calculation of entropy itself is as follows:


\begin{displaymath}
H(Z_{n})=\frac{1}{1-\alpha}\left[\log\frac{L_{\gamma}(Z_{n})}{n^{\alpha}}-const\right]
\end{displaymath} (2)

where $Z_{n}$ is the set of $n$ points drawn from the distribution (of varying density), $L_{\gamma}$ is the length of the graph spanning those points, $\alpha$ is a value that lies between 0 and 1, and $const$ is a constant which is unimportant, at least at this stage.
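To make the calculation concrete, the sketch below estimates $H(Z_{n})$ from equation (2) and combines two such estimates as in equation (1). It is a minimal illustration under stated assumptions rather than the implementation used in our experiments: a minimal spanning tree is assumed as the graph $G$, the joined point set stands in for the notation $G[A\rightarrow B]$, the relation $\gamma=d(1-\alpha)$ is assumed, and the names (\verb|entropy_estimate|, \verb|S0|, \verb|Si|, \verb|B|) are hypothetical.

\begin{verbatim}
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def graph_length(Z, gamma=1.0):
    """L_gamma: sum of minimal-spanning-tree edge lengths raised to gamma."""
    D = squareform(pdist(Z))                # full pairwise distance matrix
    mst = minimum_spanning_tree(D)          # sparse matrix of MST edge lengths
    return (mst.data ** gamma).sum()

def entropy_estimate(Z, alpha, const=0.0):
    """Equation (2): H(Z_n) = 1/(1-alpha) [log(L_gamma(Z_n)/n^alpha) - const]."""
    n, d = Z.shape
    gamma = d * (1.0 - alpha)               # assumed link between gamma and alpha
    L = graph_length(Z, gamma)
    return (np.log(L / n ** alpha) - const) / (1.0 - alpha)

# Hypothetical clouds: S0 and Si from the same model, B a drifted target cloud.
rng = np.random.default_rng(1)
S0 = rng.normal(size=(200, 5))
Si = rng.normal(size=(200, 5))
B  = rng.normal(size=(200, 5)) + 1.0

alpha = 0.5
H_cross = entropy_estimate(np.vstack([S0, B]),  alpha)   # H(G[S0 -> B])
H_ref   = entropy_estimate(np.vstack([S0, Si]), alpha)   # H(G[S0 -> Si])
H_total = H_cross - H_ref                                 # equation (1)
print(H_total)
\end{verbatim}

The minimal spanning tree is only one possible choice of graph for measuring $L_{\gamma}$; any spanning graph whose total length grows predictably with the dispersion of the points would serve the same illustrative purpose, so the sketch should not be read as fixing that choice.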


Roy Schestowitz 2006-04-22