How does VariateTools Work?
VariateTools uses a theory of representation of continuous functions that was originally developed by the Canadian nuclear industry to simulate radionuclide flow from a spent fuel disposal vault [Andres, T.H. 1999. SYVAC3 Manual. AECL Report 10982, Atomic Energy of Canada Ltd., Chalk River, Ontario]. In VariateTools, this theory has been implemented in new Java code to represent cumulative distribution functions.
The cumulative distribution function (cdf) of any probability distribution can be approximated by a piecewise linear function, otherwise known as a linear B-spline. Figure 1 shows a highly blown-up part of a cdf where the piecewise nature of the curve is clearly visible. This figure shows only 1/2000'th of the height of the cdf. The entire curve has 161 points in it, each pair joined by a straight line. An adaptive algorithm is used to choose the right locations for these points so that the approximation is very good with a reasonable number of points.

FIGURE 1: Blown-up part of a cdf showing the piecewise linear form
In VariateTools, it is actually the inverse cdf that is represented by a piecewise linear function, so that the domain of the function is always the same: the interval [0, 1]. That is, if the cdf of a variate X is represented by FX(x), the distribution is represented in VariateTools by the function x=FX–1(u), where u is a variable defined on the interval [0, 1]. The function is represented by a sequence of N+1 points {(uj,xj)}j=0,N. In this sequence, u0=0, uN=1, and both the u's and the x's are monotonically increasing.
If the variate U is uniformly distributed between 0 and 1, the transformation x=FX–1(U) gives values belonging to the distribution of X. It is easy to generate randomly sampled values for X this way by using a pseudo-random generator to give values for U, and then mapping these values through the piecewise linear function being used to approximate the inverse cdf. This fact is only one of several interesting properties of this representation.
Finding the mean value of X is simple: integrate x=FX–1(u) for u varying from 0 to 1. This evaluation can be done analytically if the exact form for the inverse cdf is known, or it can be done approximately using the piecewise linear function and trapezoidal integration between each pair of consecutive points. That is,
∫01FX–1(u) du = ∫ x fX(x) dx = E(X)
The mean value of any function of X can be found in a similar way. For instance, to find the mean value of X2, perform the following operation, either exactly or approximately:
∫01(FX–1(u))2 du = ∫ x2 fX(x) dx = E{X2}
The variance of X can be evaluated from these two equations:
Var{X} = E{X2} – (E{X})2
Operations between random variables are generally done with convolution integrals. For example, if X and Y are two variates, and Z = X + Y, then the cdf FZ(z) can be evaluated from this integral:
FZ(z) = ∫01(FY(z–FX–1(u)) du
Points of the form (z, FZ(z)) generated this way can be rearranged to give (u, FZ–1(u)), and a set of points of this form can be combined to generate a representation for the distribution of Z.
Other operations between two variates may be somewhat more involved than this formula, but the principle is the same. In particular, multiplication and division usually require separate consideration of positive and negative ranges. It is simplest to perform division by defining a reciprocal function for the divisor, and then using multiplication. Similarly, powers of variates can be evaluated by defining logarithm and antilogarithm transformations, and then using products of logarithms to evaluate a power. Any such calculations can be made as accurate as you want, simply by increasing the number of points in each inverse cdf representation.