Scale-free comparison of distributions

Measures the true diversity contribution of all or part of a distribution

Currently no method available to do this!!

Scale-free comparison

How do we compare a left-skewed, a bimodal and a right-skewed distribution?

Idea: Compute the number of equiprobable types needed for the same amount of information

Why

Case-based entropy C_c provides a common normalization: it computes the percentage diversity contribution up to a cumulative probability c.

**Case based entropy**

**Why?**

The Math

Examples

**Empirical Examples**

Household Income

The mathematics

For each part of the distribution, up to cumulative probability c, compute the number of equiprobable types required to maintain the same amount of Shannon information.
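One way to write this down, as a sketch. The symbols p_i, H, D, c_k, H_k and D_k are notation assumed here, not taken from the slides. The truncated distribution is assumed to be renormalized by c_k, so that a single type counts as exactly one equiprobable type.

```latex
\begin{align}
H   &= -\sum_{i=1}^{N} p_i \ln p_i, & D &= e^{H}, \\
c_k &= \sum_{i=1}^{k} p_i, & H_k &= -\sum_{i=1}^{k} \frac{p_i}{c_k} \ln \frac{p_i}{c_k}, \\
D_k &= e^{H_k}, & C_{c_k} &= 100 \times \frac{D_k}{D}.
\end{align}
```

Under these assumptions, D is the number of equiprobable types for the whole distribution, D_k is the number for the part up to cumulative probability c_k, and C_c is their ratio expressed as a percentage.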

(from galaxies to gases)

Here, we wanted to know what the diversity of household income was for the United States in 2012.

Here, we have 41 different diversity types, constituting 41 different economic probability states for household income in the United States.

If one explored this distribution using regular statistical terminology, N=41 would be used to compute the mean, median, mode, skew, etc. in relation to the total sample size.

THAT IS NOT WHAT WE ARE DOING!

INSTEAD, we are measuring the diversity of information in the system, based on a case-based notion of equi-probable types.

1. As such, the N = 41 types is not what we used to compute perfect diversity.

2. Instead, based on the formulas shown in our SIMPLE EXAMPLE, the number of equi-probable diversity types necessary to maintain the same value of Shannon entropy is N equi-types = 31.21.

3. With our N equi-types computed, we treat this number as our denominator, which 'converts diversity' to percentages. We do this to make our measure scale-free, which allows us to:

(a) compare distributions to one another; and also

(b) compare any part of a distribution to the rest of it, so as to know its percentage contribution to the total diversity.

4. With our denominator determined, we can then compute the diversity contribution for any of our original N = 41 types relative to the N equi-types = 31.21. For example:

(a) the diversity contribution from Type 1 (Under $5,000) -- which comprises 3.32% of all cases in the sample -- is 3.20%

(b) In turn, the combined diversity contribution from Type 1 to Type 13 ($60,000 to $64,999) -- which comprises 59.32% of all cases in the sample -- is 41.03%.
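The four steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes the part of the distribution up to each type is renormalized before its entropy is taken. That assumption fits example (a): a single type then counts as one equiprobable type, and 100 × 1/31.21 ≈ 3.20%. The function names and the five income shares below are hypothetical, not the 2012 household income data.

```python
import math


def equiprobable_types(probs):
    """Number of equiprobable types with the same Shannon entropy (e**H)."""
    return math.exp(-sum(p * math.log(p) for p in probs if p > 0))


def case_based_entropy(frequencies):
    """Return (cumulative % of cases, % diversity contribution) per type.

    Assumes positive frequencies listed in the intended order of types.
    """
    total = sum(frequencies)
    probs = [f / total for f in frequencies]
    d_total = equiprobable_types(probs)  # the denominator (N equi-types)
    curve = []
    cumulative = 0.0
    for k in range(1, len(probs) + 1):
        cumulative += probs[k - 1]
        # Assumption: renormalize the first k types so they sum to 1.
        conditional = [p / cumulative for p in probs[:k]]
        d_k = equiprobable_types(conditional)
        curve.append((100 * cumulative, 100 * d_k / d_total))
    return curve


# Hypothetical shares of cases for five ordered types (illustration only).
shares = [30, 25, 20, 15, 10]
for c, contribution in case_based_entropy(shares):
    print(f"{c:6.2f}% of cases -> {contribution:6.2f}% of diversity")
```

Because every value is divided by the same denominator, the resulting curve always ends at 100%, which is what makes curves from different distributions comparable.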

Figure 1 Part A

Figure 1 Part B

As the comparison of these two pictures shows, the graph for case-based entropy is very different from the skewed-right probability distribution with which we started.

The x-axis represents the percentage of diversity explained by any combination of the N = 41 types relative to the N = 31 equi-types.

The y-axis is the cumulative frequency of cases in the system of study.

In this case, we find that roughly 60% of all cases -- which also happen to be the smallest incomes -- account for only 40% of the total diversity of household income.

Interestingly enough, we found this rule to be true for the skewed-right probability distributions for an exceptionally wide variety of complex systems.

Here, we have eight different examples of complex systems. On the left are the skewed-right probability distributions. On the right are the case-based entropy graphs for the same eight systems. It is noteworthy that all eight systems pass through the 60-40 region of the case-based entropy graphs, suggesting that the 60-40 rule plays a role in the diversity of complexity in many systems.

Inspired by our initial findings for the above eight complex systems, we decided to further explore three of the most classic energy distributions in physics to see if case-based entropy could:

(1) effectively map the distribution of diversity of probability states in these systems -- which constitute different forms of energy types, in both discrete and continuous form.

(2) compare the diversity of complexity in these systems to see if the 60/40 rule held sway.

We examined:

(a) The Maxwell-Boltzmann Distribution in both one dimension and three dimensions (i.e., MB 1D & MB 3D).

(b) The Bose-Einstein Distribution, for both helium and photons (i.e., BE Helium & BE Photon)

(c) The Fermi-Dirac Distribution, for sodium at four different temperatures (i.e., Na at 6000K, 300K, 1.2K and 15000000K).

NOTE: We postulate that, because fermions obey the Pauli Exclusion Principle, Fermi gas does not sufficiently clump toward the lower bound.

https://www.dropbox.com/s/xv8qbmln9ijhl3z/Flower%20Example.xlsx?dl=0

Why equiprobable

Ideal diversity means each probability state or each type occurs with equal probability

If the original distribution is not equiprobable, then we ask how many states or types are required if we have an "equivalent" equiprobable distribution.

The "equivalence" is established by demanding that we have the same amount of Shannon entropy as the original distribution
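A short worked equation shows why this equivalence yields a count of types. The symbols p_i, H and D are notation assumed here. For N equiprobable types the Shannon entropy is ln N, so exponentiating the entropy recovers N:

```latex
p_i = \frac{1}{N}
\;\Rightarrow\;
H = -\sum_{i=1}^{N} \frac{1}{N} \ln \frac{1}{N} = \ln N
\;\Rightarrow\;
D = e^{H} = N .
```

For a distribution that is not equiprobable, the same D = e^H is smaller than the actual number of types. It is the number of equiprobable types that would carry the same Shannon entropy.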

Hypothetical example

Imagine a garden that has different types of flowers.

Each flower type will, in general, have a different frequency of occurrence.

In an ideal world, perfect diversity would mean that each type has the SAME frequency of occurrence - or each type is equi-probable.

Diversity in the mathematical sense refers not to the number of types of flowers, but to the equiprobable occurrence of all types.

If we don't have perfect diversity, then we compute the number of flower types required for an "equivalent" equiprobable distribution that has the same value of Shannon information.
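The garden example can be tried out with a few lines of Python. This is only a sketch: the flower names and counts are hypothetical, and the helper `equiprobable_types` is a name introduced here for illustration.

```python
import math
from collections import Counter


def equiprobable_types(counts):
    """Number of equiprobable types with the same Shannon entropy as counts."""
    total = sum(counts.values())
    entropy = -sum(
        (n / total) * math.log(n / total) for n in counts.values() if n > 0
    )
    return math.exp(entropy)


# Hypothetical gardens: same four flower types, different frequencies.
uneven_garden = Counter({"rose": 50, "tulip": 25, "daisy": 15, "lily": 10})
even_garden = Counter({"rose": 25, "tulip": 25, "daisy": 25, "lily": 25})

for name, garden in [("uneven", uneven_garden), ("even", even_garden)]:
    print(name, len(garden), "types ->",
          round(equiprobable_types(garden), 2), "equiprobable types")
```

With these made-up counts, the even garden needs all four types, while the uneven garden comes out at roughly 3.3 equiprobable types, even though both contain four kinds of flower.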

Caveat

We are assuming some order of importance of the flowers (or order of preference). Changing the order will change the diversity contribution if the frequencies are not the same.
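The caveat can be checked numerically. The sketch below assumes the part of the distribution up to each type is renormalized before its entropy is taken. The counts are hypothetical and `contribution_curve` is a name introduced here. It computes the cumulative diversity contribution for the same garden in two different orders.

```python
import math


def contribution_curve(frequencies):
    """(cumulative % of cases, % diversity contribution), in the order given."""
    total = sum(frequencies)
    probs = [f / total for f in frequencies]
    d_total = math.exp(-sum(p * math.log(p) for p in probs))
    curve = []
    cumulative = 0.0
    for k, p in enumerate(probs, start=1):
        cumulative += p
        # Entropy of the first k types, renormalized to sum to 1 (assumption).
        h_k = -sum(
            (q / cumulative) * math.log(q / cumulative) for q in probs[:k]
        )
        curve.append(
            (round(100 * cumulative, 2), round(100 * math.exp(h_k) / d_total, 2))
        )
    return curve


garden = [50, 25, 15, 10]  # hypothetical flower counts
rarest_first = contribution_curve(sorted(garden))
commonest_first = contribution_curve(sorted(garden, reverse=True))
equal_counts = contribution_curve([25, 25, 25, 25])

print(rarest_first)
print(commonest_first)
print(equal_counts)
```

The two orderings end at the same point but follow different paths, whereas equal frequencies give the same curve in any order.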

**Why**

For us, case-based entropy is a measure of the diversity within a complex system (be it physical, biological, psychological, social, ecological, etc.) that incorporates both species richness (the number of cases in a system) and the evenness of species' abundances (the diversity of types, be they discrete or continuous).

SYNONYMS FOR DIVERSITY

Inequality

cultural diversity in society

species diversity in an ecological community

diversity of information in cybernetic systems

diversity of major and minor trends in longitudinal data

diversity of health and wellbeing

trade diversity of a country

network diversity

diversity of complexity in systems