jstats
Class Probability

java.lang.Object
  extended by jstats.Probability

public abstract class Probability
extends java.lang.Object

The Probability class provides static methods for probability calculations. A valid probability must always be a number >= 0.0 and <= 1.0. Most distributions place requirements on their arguments. A StatisticException will be thrown when a distribution is given arguments that do not meet that distribution's requirements.

Version:
0.2.3
Author:
Justin Scheiber, David Edelstein

Constructor Summary
Probability()
           
 
Method Summary
static double binomialDistribution(double probability, long trials, long successes)
          Calculates a binomial distribution probability.
static double geometricDistribution(double probability, long trial)
          Computes a geometric probability distribution.
static double hyperGeometricDistribution(long population_a, long population_b, long samples, long expected)
          Calculates a hypergeometric distribution.
static boolean isValidProbability(double probability)
          Validates that a double is a valid probability (>= 0.0 and <= 1.0).
static boolean isValidProbabilityDistribution(double[] distribution)
          Identical to the same method with a delta argument, but uses a default delta of 0.0 (i.e., allows no error tolerance).
static boolean isValidProbabilityDistribution(double[] distribution, double delta)
          Validates that an array of doubles meets the criteria for a valid probability distribution (each value is a valid probability, and the distribution adds up to 1.00).
static double multinomialDistribution(long trials, double[] probabilities, long[] outcomes)
          Calculates a multinomial probability distribution.
static double multinomialDistribution(long trials, double[] probabilities, long[] outcomes, double delta)
          This is the same function as the one without the delta argument, but allows specification of margin of error for the sum of all probabilities.
static double multivariateHypergeometricDistribution(long[] populations, long[] expected)
          Calculates a multivariate hypergeometric distribution.
static double negativeHypergeometricDistribution(long population_a, long population_b, long selections, long expected)
          Calculates a negative hypergeometric distribution.
static double poissonApproximation(double probability, long trials, long successes)
          Approximates a binomial probability using a special form of the Poisson Distribution.
static double poissonDistribution(double mean, long successes)
          Computes the probability of a given number of successes given an average number of successes per unit.
protected static void validateProbability(double probability)
          Confirms that the probability passed into the public functions is proper.
protected static void validateProbability(double[] probability)
          Validates an array of probabilities.
protected static void validateProbabilityDistribution(double[] distribution)
          Identical to the same method with a delta, but uses a default delta of 0 (i.e., has no error tolerance).
protected static void validateProbabilityDistribution(double[] distribution, double delta)
          Validates a probability distribution (an array of probabilities, each of which must be a valid probability, and the sum of which must add up to 1.00).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Probability

public Probability()
Method Detail

binomialDistribution

public static double binomialDistribution(double probability,
                                          long trials,
                                          long successes)
                                   throws StatisticException

Calculates a binomial distribution probability. Trials must be independent and have a fixed probability for each one.

P(x) = n!/((n-x)! x!) * p^x * (1-p)^(n-x)

(Note that for n = 0, x = 0, the probability will always be 1.00.)

Requirements:

probability is a valid probability; successes <= trials

Parameters:
probability - the probability of any one trial succeeding (p)
trials - the number of trials to perform (n)
successes - the number of successes expected as an outcome (x)
Returns:
the probability of the given number of successes out of the given number of trials (P(x))
Throws:
StatisticException - if above requirements are not met
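
The formula above can be sketched in a few lines of standalone Java. This is an illustration only and not the library's implementation: the class name BinomialSketch and its helper comb are hypothetical, and IllegalArgumentException stands in for StatisticException, which is not defined here.

```java
// Hypothetical standalone sketch; not part of jstats.
final class BinomialSketch {

    // Binomial coefficient C(n, k), built up multiplicatively in floating point.
    static double comb(long n, long k) {
        if (k < 0 || k > n) {
            return 0.0;
        }
        k = Math.min(k, n - k);
        double result = 1.0;
        for (long i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // P(x) = C(n, x) * p^x * (1-p)^(n-x)
    static double binomial(double p, long n, long x) {
        // IllegalArgumentException is a stand-in for StatisticException (assumption).
        if (p < 0.0 || p > 1.0 || n < 0 || x < 0 || x > n) {
            throw new IllegalArgumentException("invalid binomial arguments");
        }
        return comb(n, x) * Math.pow(p, x) * Math.pow(1.0 - p, n - x);
    }
}
```

For example, BinomialSketch.binomial(0.5, 4, 2) evaluates 6 * 0.25 * 0.25, and the n = 0, x = 0 case reduces to 1.0 as noted above.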

geometricDistribution

public static double geometricDistribution(double probability,
                                           long trial)
                                    throws StatisticException

Computes a geometric probability distribution. The probability of getting the first success on the xth trial is:

P(x) = p(1-p)^(x-1)

Requirements:

probability is valid, trial > 0

Parameters:
probability - the probability of success on any one trial (p)
trial - the trial where the first success is expected (x)
Returns:
the probability of the first success occurring on the given trial (P(x))
Throws:
StatisticException - if above requirements are not met
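
A minimal standalone sketch of the geometric formula follows. The class name GeometricSketch is hypothetical, and IllegalArgumentException stands in for StatisticException; the library's own implementation may differ.

```java
// Hypothetical standalone sketch; not part of jstats.
final class GeometricSketch {

    // P(x) = p * (1-p)^(x-1): x-1 failures followed by the first success.
    static double geometric(double p, long x) {
        // IllegalArgumentException is a stand-in for StatisticException (assumption).
        if (p < 0.0 || p > 1.0 || x < 1) {
            throw new IllegalArgumentException("requires a valid probability and trial > 0");
        }
        return p * Math.pow(1.0 - p, x - 1);
    }
}
```

With p = 0.5, the first success landing on trial 3 means two failures and then a success, so the sketch evaluates 0.5 * 0.5 * 0.5.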

hyperGeometricDistribution

public static double hyperGeometricDistribution(long population_a,
                                                long population_b,
                                                long samples,
                                                long expected)
                                         throws StatisticException

Calculates a hypergeometric distribution. When sampling n objects without replacement from a population with A objects of one type and B objects of the other type, the probability of getting x objects of type A and n-x objects of type B is:

[A!/((A-x)! x!)] * [B!/((B-n+x)! (n-x)!)] / [(A+B)!/((A+B-n)! n!)]

or

Comb(A, x) * Comb(B, n-x) / Comb(A+B, n)

Another way to express this is with a population of N, consisting of r elements of one type and N-r elements of the other type. In this case the formula is:

Comb(r, x) * Comb(N-r, n-x)/Comb(N, n)

Use r and N-r, respectively, for the arguments population_a and population_b.

Requirements:

All arguments are whole numbers, samples <= population_a + population_b, expected <= samples

Parameters:
population_a - number of objects of type A
population_b - number of objects of type B
samples - number of samples taken
expected - desired number of type A objects
Returns:
the probability of getting x objects of type A
Throws:
StatisticException - if above requirements are not met
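
The combination form of the formula can be sketched as below. This is an illustration under stated assumptions, not the library's code: HypergeometricSketch and comb are hypothetical names, and IllegalArgumentException stands in for StatisticException.

```java
// Hypothetical standalone sketch; not part of jstats.
final class HypergeometricSketch {

    // Binomial coefficient C(n, k); returns 0 when k is out of range.
    static double comb(long n, long k) {
        if (k < 0 || k > n) {
            return 0.0;
        }
        k = Math.min(k, n - k);
        double result = 1.0;
        for (long i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Comb(A, x) * Comb(B, n-x) / Comb(A+B, n)
    static double hypergeometric(long a, long b, long n, long x) {
        // IllegalArgumentException is a stand-in for StatisticException (assumption).
        if (a < 0 || b < 0 || n < 0 || x < 0 || n > a + b || x > n) {
            throw new IllegalArgumentException("invalid hypergeometric arguments");
        }
        return comb(a, x) * comb(b, n - x) / comb(a + b, n);
    }
}
```

With A = 2, B = 2 and n = 2, the three possible outcomes x = 0, 1, 2 evaluate to 1/6, 4/6 and 1/6, which sum to 1 as a distribution should.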

negativeHypergeometricDistribution

public static double negativeHypergeometricDistribution(long population_a,
                                                        long population_b,
                                                        long selections,
                                                        long expected)
                                                 throws StatisticException
Calculates a negative hypergeometric distribution. Given a population consisting of A elements of one kind and B elements of another, what is the probability of selecting x samples before selecting n elements of type A? (I.e., when trying to get n elements of type A, what is the probability that the nth element of type A will be selected on the xth sample?)

Comb(n+x-1, n-1) * Comb(N-n-x, a-n) / Comb(N, a)

where a is the number of elements of type A and N = A + B is the total population.

Requirements:

All arguments are whole numbers, selections > 0, selections >= expected

Note that if selections > population_a or expected > population_b, the probability will always be 0.0.

Parameters:
population_a - number of elements of type A
population_b - number of elements of type B
selections - number of elements of type A to be selected (n)
expected - number of samples before getting a number of A's equal to selections (x)
Returns:
the probability that the nth A will occur on the xth selection
Throws:
StatisticException - if above requirements are not met
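
The sketch below evaluates the negative hypergeometric formula with Comb(N, a) as the denominator. It assumes that x counts the type B elements drawn before the nth type A element, which is the reading under which the formula sums to 1; whether the library interprets its expected argument the same way is not confirmed here. NegativeHypergeometricSketch and comb are hypothetical names, and IllegalArgumentException stands in for StatisticException.

```java
// Hypothetical standalone sketch; not part of jstats.
final class NegativeHypergeometricSketch {

    // Binomial coefficient C(n, k); returns 0 when k is out of range.
    static double comb(long n, long k) {
        if (k < 0 || k > n) {
            return 0.0;
        }
        k = Math.min(k, n - k);
        double result = 1.0;
        for (long i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // a = elements of type A, b = elements of type B,
    // n = type A elements wanted, x = type B elements drawn before the nth A (assumption).
    static double negativeHypergeometric(long a, long b, long n, long x) {
        // IllegalArgumentException is a stand-in for StatisticException (assumption).
        if (a < 0 || b < 0 || n < 1 || x < 0) {
            throw new IllegalArgumentException("invalid negative hypergeometric arguments");
        }
        long total = a + b; // N
        return comb(n + x - 1, n - 1) * comb(total - n - x, a - n) / comb(total, a);
    }
}
```

With A = 2, B = 1 and n = 2, the sketch gives 1/3 for x = 0 (both A's drawn first) and 2/3 for x = 1. It returns 0.0 when n exceeds A or x exceeds B, matching the note above.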

multinomialDistribution

public static double multinomialDistribution(long trials,
                                             double[] probabilities,
                                             long[] outcomes)
                                      throws StatisticException

Calculates a multinomial probability distribution, also known as a multivariate binomial distribution. Given mutually exclusive outcomes O1...Oz with probabilities P(O1)...P(Oz), in n independent trials, the probability of X1 occurrences of O1 through Xz occurrences of Oz is:

P(x) = n!/((X1!)(X2!)...(Xz!)) * P(O1)^X1 * ... * P(Oz)^Xz

By default, the sum of all probabilities must be exactly 1.00. You may use this method with a delta argument to allow for a degree of difference from 1.00 to account for rounding errors.

Parameters:
trials - number of independent trials (n)
probabilities - array of probabilities (P(O1) to P(Oz))
outcomes - array of expected outcomes (X1....Xz)
Returns:
probability of the expected outcomes (P(x))
Throws:
StatisticException - for invalid probabilities, if the sum of all probabilities is not 1.00, if #probabilities != #outcomes
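
A standalone sketch of the multinomial formula is shown below. It builds the coefficient n!/(X1!...Xz!) as a running product of binomial coefficients to avoid large factorials. MultinomialSketch is a hypothetical name, IllegalArgumentException stands in for StatisticException, and the check that the probabilities sum to 1.00 is deliberately left out of this sketch.

```java
// Hypothetical standalone sketch; not part of jstats.
final class MultinomialSketch {

    // P(x) = n!/(X1! ... Xz!) * P(O1)^X1 * ... * P(Oz)^Xz
    static double multinomial(long n, double[] p, long[] x) {
        // IllegalArgumentException is a stand-in for StatisticException (assumption).
        if (p.length != x.length) {
            throw new IllegalArgumentException("#probabilities != #outcomes");
        }
        long remaining = n;
        double coefficient = 1.0;
        double product = 1.0;
        for (int i = 0; i < x.length; i++) {
            if (p[i] < 0.0 || p[i] > 1.0 || x[i] < 0 || x[i] > remaining) {
                throw new IllegalArgumentException("invalid probability or outcome at index " + i);
            }
            // Multiply by C(remaining, x[i]); the product over all i is n!/(X1! ... Xz!).
            for (long j = 1; j <= x[i]; j++) {
                coefficient = coefficient * (remaining - x[i] + j) / j;
            }
            product *= Math.pow(p[i], x[i]);
            remaining -= x[i];
        }
        if (remaining != 0) {
            throw new IllegalArgumentException("outcomes must sum to trials");
        }
        return coefficient * product;
    }
}
```

For n = 3 with probabilities {0.5, 0.3, 0.2} and outcomes {1, 1, 1}, the coefficient is 3! = 6 and the product of probabilities is 0.03.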

multinomialDistribution

public static double multinomialDistribution(long trials,
                                             double[] probabilities,
                                             long[] outcomes,
                                             double delta)
                                      throws StatisticException
This is the same function as the one without the delta argument, but it allows a margin of error to be specified for the sum of all probabilities. The delta is the degree to which the sum of all probabilities may differ from 1.00. This permits probability values with many decimal places, where the sum, due to floating-point rounding in Java, might not be exactly 1.00. Note that the higher the value of delta, the greater the likelihood of an inaccurate calculation. It is recommended that delta not be greater than 0.01.

The delta tolerance is not guaranteed to be accurate beyond 6 decimal places (1E-6).

Parameters:
trials - number of independent trials (n)
probabilities - array of probabilities (P(O1) to P(Oz))
outcomes - array of expected outcomes (X1....Xz)
delta - the permissible delta between the sum of probabilities and 1.00
Returns:
probability of the expected outcomes (P(x))
Throws:
StatisticException - for invalid probabilities, if the sum of all probabilities differs from 1.00 by more than delta, if trials < 1 or is not equal to the sum of outcomes, if #probabilities != #outcomes

multivariateHypergeometricDistribution

public static double multivariateHypergeometricDistribution(long[] populations,
                                                            long[] expected)
                                                     throws StatisticException

Calculates a multivariate hypergeometric distribution. Given a set of N objects of I different types, with Ri elements for each i (i = 1...I), sampling without replacement n times, the probability of finding Xi objects for each i (i = 1...I) is:

Comb(R1, X1) * ... * Comb(RI, XI) / Comb(N, n)

Parameters:
populations - an array with length = number of different items, each value containing the number of elements of that type (Ri)
expected - an array of expected sample outcomes, each value containing the number of expected items of the corresponding population array element type (Xi) ("n" equals the sum of all outcomes)
Returns:
the probability that n samples from populations will result in a set corresponding to expected
Throws:
StatisticException - if the lengths of the populations and expected arrays are not the same, if any population value is 0, if the number of samples (the sum of expected) is <= 0, or if any expected value is greater than the population of the corresponding item
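
The sketch below evaluates the standard multivariate hypergeometric mass function: the product of Comb(Ri, Xi) over all types, divided once by Comb(N, n). It is an illustration only. MultivariateHypergeometricSketch and comb are hypothetical names, IllegalArgumentException stands in for StatisticException, and the sketch does not reproduce every requirement listed above (for example, it accepts a population value of 0).

```java
// Hypothetical standalone sketch; not part of jstats.
final class MultivariateHypergeometricSketch {

    // Binomial coefficient C(n, k); returns 0 when k is out of range.
    static double comb(long n, long k) {
        if (k < 0 || k > n) {
            return 0.0;
        }
        k = Math.min(k, n - k);
        double result = 1.0;
        for (long i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    static double multivariateHypergeometric(long[] populations, long[] expected) {
        // IllegalArgumentException is a stand-in for StatisticException (assumption).
        if (populations.length != expected.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        long total = 0;    // N: sum of all populations
        long samples = 0;  // n: sum of all expected outcomes
        double numerator = 1.0;
        for (int i = 0; i < populations.length; i++) {
            if (expected[i] < 0 || expected[i] > populations[i]) {
                throw new IllegalArgumentException("expected value out of range at index " + i);
            }
            total += populations[i];
            samples += expected[i];
            numerator *= comb(populations[i], expected[i]);
        }
        return numerator / comb(total, samples);
    }
}
```

For populations {2, 2, 2} and expected {1, 1, 0}, the numerator is 2 * 2 * 1 = 4 and the denominator is Comb(6, 2) = 15.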

poissonApproximation

public static double poissonApproximation(double probability,
                                          long trials,
                                          long successes)
                                   throws StatisticException

Approximates a binomial probability using a special form of the Poisson Distribution. Often used when the number of trials is large and the probability is small. The formula is:

P(x) = (np)^x * e^(-np) / x!

Parameters:
probability - the probability of any one trial succeeding (p)
trials - the number of trials to perform (n)
successes - the number of successes expected as an outcome (x)
Returns:
the Poisson Approximation of the probability of the given number of successes out of the given number of trials (P(x))
Throws:
StatisticException - if probability is invalid, or successes > trials, or any parameter is negative
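
To illustrate how close the approximation is when trials is large and probability is small, the sketch below places it next to the exact binomial value. PoissonApproximationSketch and both method names are hypothetical, and argument validation is omitted for brevity.

```java
// Hypothetical standalone sketch; not part of jstats. No argument validation.
final class PoissonApproximationSketch {

    // (np)^x * e^(-np) / x!, built up term by term to avoid computing x! directly.
    static double poissonApproximation(double p, long n, long x) {
        double lambda = n * p;
        double term = Math.exp(-lambda);
        for (long i = 1; i <= x; i++) {
            term *= lambda / i;
        }
        return term;
    }

    // Exact binomial probability C(n, x) * p^x * (1-p)^(n-x), for comparison.
    static double exactBinomial(double p, long n, long x) {
        double coefficient = 1.0;
        for (long i = 1; i <= x; i++) {
            coefficient = coefficient * (n - x + i) / i;
        }
        return coefficient * Math.pow(p, x) * Math.pow(1.0 - p, n - x);
    }
}
```

With p = 0.01, n = 100 and x = 1, the two values differ by well under 0.01; the gap widens as p grows or n shrinks.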

poissonDistribution

public static double poissonDistribution(double mean,
                                         long successes)
                                  throws StatisticException
Computes the probability of a given number of successes given an average number of successes per unit. The formula for the Poisson Distribution is:

P(x) = u^x * e^(-u) / x!

Where u is the mean number of successes in a given unit.

Parameters:
mean - the average number of successes (u)
successes - the expected number of successes (x)
Returns:
the probability of a given unit resulting in the expected number of successes (P(x))
Throws:
StatisticException - if mean <= 0 or successes < 0
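
A minimal sketch of the standard Poisson mass function u^x * e^(-u) / x! follows. PoissonSketch is a hypothetical name and IllegalArgumentException stands in for StatisticException; the library's implementation may compute the value differently.

```java
// Hypothetical standalone sketch; not part of jstats.
final class PoissonSketch {

    // P(x) = u^x * e^(-u) / x!, accumulated as e^(-u) * (u/1) * (u/2) * ... * (u/x).
    static double poisson(double mean, long x) {
        // IllegalArgumentException is a stand-in for StatisticException (assumption).
        if (mean <= 0.0 || x < 0) {
            throw new IllegalArgumentException("requires mean > 0 and successes >= 0");
        }
        double term = Math.exp(-mean);
        for (long i = 1; i <= x; i++) {
            term *= mean / i;
        }
        return term;
    }
}
```

For a mean of 2.0, the probability of 0 successes is e^(-2) and the probability of 2 successes is 2 * e^(-2).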

isValidProbability

public static boolean isValidProbability(double probability)
Validates that a double is a valid probability (>= 0.0 and <= 1.0).

Parameters:
probability - a double to test
Returns:
true if probability is a valid probability

isValidProbabilityDistribution

public static boolean isValidProbabilityDistribution(double[] distribution,
                                                     double delta)
Validates that an array of doubles meets the criteria for a valid probability distribution (each value is a valid probability, and the distribution adds up to 1.00). A delta may be specified to account for rounding errors in Java when working with doubles; this is the degree to which the sum of the probabilities may vary from 1.0. The greater the delta, the greater the likelihood of an inaccurate calculation.

Parameters:
distribution - an array of doubles to test
delta - the allowable error tolerance
Returns:
true if distribution is a valid probability distribution
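
The need for a delta is easy to see with ten values of 0.1: added one at a time as doubles, they do not sum to exactly 1.0. The sketch below shows one way such a check could be written. DistributionCheckSketch and its method names are hypothetical and are not the library's code.

```java
// Hypothetical standalone sketch; not part of jstats.
final class DistributionCheckSketch {

    // NaN fails both comparisons and is therefore treated as invalid.
    static boolean isValidProbability(double p) {
        return p >= 0.0 && p <= 1.0;
    }

    // Every value must be a valid probability and the sum must be within delta of 1.0.
    static boolean isValidDistribution(double[] distribution, double delta) {
        double sum = 0.0;
        for (double p : distribution) {
            if (!isValidProbability(p)) {
                return false;
            }
            sum += p;
        }
        return Math.abs(sum - 1.0) <= delta;
    }
}
```

An array of ten 0.1 values fails this check with a delta of 0.0 but passes with a small delta such as 1e-9, while {0.5, 0.5} passes with a delta of 0.0 because its sum is exact.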

isValidProbabilityDistribution

public static boolean isValidProbabilityDistribution(double[] distribution)
Identical to the same method with a delta argument, but uses a default delta of 0.0 (i.e., allows no error tolerance).

Parameters:
distribution - an array of doubles to test
Returns:
true if distribution is a valid probability distribution

validateProbability

protected static void validateProbability(double probability)
                                   throws StatisticException
Confirms that the probability passed into the public functions is proper.

Parameters:
probability - a probability to validate
Throws:
StatisticException - if probability < 0.0 or > 1.0

validateProbability

protected static void validateProbability(double[] probability)
                                   throws StatisticException
Validates an array of probabilities.

Parameters:
probability - an array of doubles
Throws:
StatisticException - if any one probability fails to validate

validateProbabilityDistribution

protected static void validateProbabilityDistribution(double[] distribution,
                                                      double delta)
                                               throws StatisticException
Validates a probability distribution (an array of probabilities, each of which must be a valid probability, and the sum of which must add up to 1.00). A delta may be specified allowing a degree of variance from 1.00, since floating-point arithmetic in Java may introduce small rounding errors in doubles. It is recommended that the delta not be greater than 0.01. Note that the delta tolerance is not guaranteed to be accurate beyond 1E-6.

Parameters:
distribution - an array of doubles
delta - the degree to which the sum of the distribution can vary from 1.00.
Throws:
StatisticException - if the distribution does not meet the requirements of a probability distribution

validateProbabilityDistribution

protected static void validateProbabilityDistribution(double[] distribution)
                                               throws StatisticException
Identical to the same method with a delta, but uses a default delta of 0 (i.e., has no error tolerance).

Parameters:
distribution - an array of doubles
Throws:
StatisticException - if the distribution does not meet the requirements of a probability distribution