vec_math
Class Bootstrap

java.lang.Object
  extended by vec_math.Bootstrap

public class Bootstrap
extends Object

A class to work together with GeneralLinearRegression. If you have a data set and do not know, or do not care about, the underlying model, this class helps you to establish confidence limits on your solution. It supports constructors similar to those of the GeneralLinearRegression class, with the difference that the measurements are set after construction with #setMeasurements.

The bootstrapping works in the following way: from the original data set, the basic solution is derived using GeneralLinearRegression. Then, depending on the simulation count, a number of simulated data sets are produced by replacing a certain fraction of the original data with duplicates from the original data set. Each simulated data set is then solved for its linear model parameters. The standard deviation of the linear parameters over all simulated sets serves as the confidence estimate of the original solution.

Literature: Numerical Recipes in C, p. 691ff.
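The procedure above can be sketched for a straight-line model y = a + b*x. This is a minimal, self-contained illustration under stated assumptions, not the vec_math implementation: it uses plain double arrays instead of NVector and a closed-form weighted least-squares line fit instead of GeneralLinearRegression, and the names BootstrapSketch, fitLine, confidence and shuffledIndices are hypothetical. Each simulated set keeps a random (1 - duplicate) fraction of the points and refills the remaining slots with duplicates of the kept points.

```java
import java.util.Random;

/** Hypothetical sketch of bootstrapping a straight-line fit y = a + b*x (not the vec_math code). */
final class BootstrapSketch {

    /** Weighted least-squares line fit; returns {a, b}. A null sigma means unit weights. */
    static double[] fitLine(double[] x, double[] y, double[] sigma) {
        double s = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < x.length; i++) {
            double w = (sigma == null) ? 1.0 : 1.0 / (sigma[i] * sigma[i]);
            s += w;
            sx += w * x[i];
            sy += w * y[i];
            sxx += w * x[i] * x[i];
            sxy += w * x[i] * y[i];
        }
        double delta = s * sxx - sx * sx; // zero if all x values coincide
        return new double[] {(sxx * sy - sx * sxy) / delta, (s * sxy - sx * sy) / delta};
    }

    /**
     * Fits 'simulations' resampled data sets and returns the sample standard deviation
     * of {a, b} over them. Assumes at least two distinct x values survive in every set.
     */
    static double[] confidence(double[] x, double[] y, double[] sigma,
                               int simulations, double duplicate, Random rnd) {
        int n = x.length;
        int keep = Math.min(n, Math.max(2, (int) Math.round(n * (1.0 - duplicate))));
        double[][] solutions = new double[simulations][];
        for (int s = 0; s < simulations; s++) {
            int[] idx = shuffledIndices(n, rnd);
            double[] sx = new double[n], sy = new double[n];
            double[] ss = (sigma == null) ? null : new double[n];
            for (int i = 0; i < n; i++) {
                // Slots below 'keep' copy a retained point; the rest duplicate a random retained point.
                int src = (i < keep) ? idx[i] : idx[rnd.nextInt(keep)];
                sx[i] = x[src];
                sy[i] = y[src];
                if (ss != null) ss[i] = sigma[src];
            }
            solutions[s] = fitLine(sx, sy, ss);
        }
        // Two-pass sample standard deviation of each parameter over all simulated fits.
        double[] sd = new double[2];
        for (int k = 0; k < 2; k++) {
            double mean = 0;
            for (double[] p : solutions) mean += p[k];
            mean /= simulations;
            double sum = 0;
            for (double[] p : solutions) sum += (p[k] - mean) * (p[k] - mean);
            sd[k] = Math.sqrt(sum / (simulations - 1));
        }
        return sd;
    }

    /** Fisher-Yates shuffle of the indices 0..n-1. */
    static int[] shuffledIndices(int n, Random rnd) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = idx[i];
            idx[i] = idx[j];
            idx[j] = t;
        }
        return idx;
    }
}
```

Usage: fitLine(x, y, sigma) gives the original solution and confidence(x, y, sigma, 100, 0.3, new Random(42)) its bootstrap spread; the count, fraction and seed here are arbitrary.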


Nested Class Summary
private static class Bootstrap.ExchangedData
          Helper class containing an exchanged data set.
static class Bootstrap.StraightLine
          We fit data to a straight line.
 
Field Summary
private  NVector average
          The parameter averages on the solution to the simulated data.
private  NVector confidence
          The confidence estimations on the solution to the original data.
private  NVector data
          The original data.
private  NVector[] depend
          Where/how we measured the data.
private  double duplicate
          The fraction of data duplication, defaults to DUPLICATE.
static double DUPLICATE
          The amount of duplicated data in the simulated sets.
private  NVector error
          The errors of the original data.
private  Object[] fit
          The fitting functions: either an array of Function or of Multidimensional objects.
private  int freedom
          The number of simulated data sets per degree of freedom.
static int FREEDOM
          The default number of simulated data sets per degree of freedom.
private  NVector maxima
          The parameter maxima on the solution to the simulated data.
private  NVector minima
          The parameter minima on the solution to the simulated data.
private static Random random
          We use a random number generator, constructed at class load.
private  double rmsav
          The average RMS of the measurements dropped during bootstrapping.
private  double rmsmax
          The maximum RMS of the measurements dropped during bootstrapping.
private  double rmsmin
          The minimum RMS of the measurements dropped during bootstrapping.
private  boolean usecovariance
          If true, we weight the simulated parameters with their covariance sigma.
static boolean USECOVARIANCE
          The default usage of the covariance parameter estimates.
 
Constructor Summary
  Bootstrap(Function[] fitting, NVector x, NVector y, NVector sigma)
          Constructs a new bootstrap object from the supplied basis functions and the measurements with their errors.
  Bootstrap(Multidimensional[] fitting, NVector[] x, NVector y, NVector sigma)
          Constructs a new bootstrap object from the supplied basis functions and the measurements with their errors.
private Bootstrap(NVector y, NVector sigma)
          Common constructor.
 
Method Summary
private  void doBootstrap()
          The time-consuming part of the bootstrap class.
private static double fitDropped(Object[] fitting, NVector solution, NVector[] x, NVector y)
          Fits the data set in y, measured at the dependent variables x, to the linear model parameters in solution.
 NVector getConfidenceEstimates()
          Returns a confidence estimate on the linear model parameters by bootstrapping the data.
private static Bootstrap.ExchangedData getExchangedData(NVector[] depend, NVector data, NVector err, double duplicate)
          From our measurements and their errors we derive a simulated data set.
 double getOriginalChiSquare()
          Returns the chi-square value of the original data set.
 QuadMatrix getOriginalCovariance()
          Returns the covariance matrix of the original fit.
 double getOriginalQuality()
          Returns the quality estimate of the original data set.
 NVector getOriginalSigma()
          Returns the standard deviation of the original solution.
 NVector getOriginalSolution()
          Returns the original solution to the linear model.
 double getResidualOfDroppedData()
          Returns the average RMS of the data dropped from the bootstrapped data sets.
 double getResidualOfDroppedMaxima()
          Returns the maximum RMS of the data dropped from the bootstrapped data sets.
 double getResidualOfDroppedMinima()
          Returns the minimum RMS of the data dropped from the bootstrapped data sets.
 NVector getSimulatedMaxima()
          Returns the maxima of the linear model parameters from bootstrapping the data.
 NVector getSimulatedMinima()
          Returns the minima of the linear model parameters from bootstrapping the data.
 NVector getSimulatedSolution()
          Returns the averages of the linear model parameters from bootstrapping the data.
private static GeneralLinearRegression prepareRegression(Object[] fitting, NVector[] x, NVector y, NVector err)
          Prepares a GeneralLinearRegression from the functions, dependent variables and measurements passed in.
 void setDuplication(double frac)
          Changes the fraction of duplication to the stated value.
 void setSimulationCount(int simcount)
          Changes the number of simulated data sets per degree of freedom.
 void setUseCovariance(boolean use)
          Sets whether the covariance sigma estimates of the parameters fitted to the simulated data sets are used as weights in the statistical calculation of the confidence estimates.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DUPLICATE

public static final double DUPLICATE
The amount of duplicated data in the simulated sets.

See Also:
Constant Field Values

FREEDOM

public static final int FREEDOM
The default number of simulated data sets per degree of freedom.

See Also:
Constant Field Values

USECOVARIANCE

public static final boolean USECOVARIANCE
The default usage of the covariance parameter estimates.

See Also:
Constant Field Values

random

private static final Random random
We use a random number generator, constructed at class load.


depend

private NVector[] depend
Where/how we measured the data. For time dependence, the length is 1; for dependence on direction (Omega), the length is 2.


fit

private Object[] fit
The fitting functions: either an array of Function or of Multidimensional objects.


data

private NVector data
The original data.


error

private NVector error
The errors of the original data. May be null.


usecovariance

private boolean usecovariance
If true, we weight the simulated parameters with their covariance sigma.


average

private NVector average
The parameter averages on the solution to the simulated data.


confidence

private NVector confidence
The confidence estimations on the solution to the original data.


minima

private NVector minima
The parameter minima on the solution to the simulated data.


maxima

private NVector maxima
The parameter maxima on the solution to the simulated data.


rmsav

private double rmsav
The average RMS of the measurements dropped during bootstrapping.


rmsmin

private double rmsmin
The minimum RMS of the measurements dropped during bootstrapping.


rmsmax

private double rmsmax
The maximum RMS of the measurements dropped during bootstrapping.


duplicate

private double duplicate
The fraction of data duplication, defaults to DUPLICATE.


freedom

private int freedom
The number of simulated data sets per degree of freedom.

Constructor Detail

Bootstrap

private Bootstrap(NVector y,
                  NVector sigma)
Common constructor. Stores the measurements and their errors. The dependent variables depend are set later. The dependencies specify where each measurement lies in your parameter space: if your measurements depend on time only, your depend array will have a single entry, holding the vector of measurement times. If your data are linked to pointings on a sphere, depend will have two entries, the azimuthal and the polar angle of each measurement, and so on.
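To make the layout of depend concrete, here is a hypothetical illustration that uses double[][] in place of NVector[]: each row is one dependency, and column i holds the coordinates of measurement i. The names DependLayout, timeOnly, sphere and coordinatesOf are made up for this sketch.

```java
/** Hypothetical illustration of the depend layout, using double[][] in place of NVector[]. */
final class DependLayout {

    /** Time-only measurements: a single dependency row holding the measurement times. */
    static double[][] timeOnly(double[] times) {
        return new double[][] {times.clone()};
    }

    /** Pointings on a sphere: row 0 holds the azimuthal angles, row 1 the polar angles. */
    static double[][] sphere(double[] azimuth, double[] polar) {
        if (azimuth.length != polar.length) {
            throw new IllegalArgumentException("azimuth and polar must have equal length");
        }
        return new double[][] {azimuth.clone(), polar.clone()};
    }

    /** Collects the coordinates of measurement i across all dependency rows. */
    static double[] coordinatesOf(double[][] depend, int i) {
        double[] c = new double[depend.length];
        for (int d = 0; d < depend.length; d++) c[d] = depend[d][i];
        return c;
    }
}
```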


Bootstrap

public Bootstrap(Function[] fitting,
                 NVector x,
                 NVector y,
                 NVector sigma)
Constructs a new bootstrap object from the supplied basis functions and the measurements with their errors.
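The Function[] constructor fits the measurements to a linear combination of basis functions. As a hedged illustration of that model, here is a sketch of weighted general linear least squares via the normal equations. It assumes DoubleUnaryOperator as a stand-in for the library's Function type and plain arrays for NVector; BasisFit and its methods are hypothetical and say nothing about how GeneralLinearRegression is implemented. The covariance matrix is the inverse of the normal matrix, and the parameter standard deviations are the square roots of its diagonal.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: linear least squares on basis functions via the normal equations. */
final class BasisFit {
    final double[] solution;     // fitted coefficients a_k
    final double[][] covariance; // inverse of the normal matrix

    private BasisFit(double[] solution, double[][] covariance) {
        this.solution = solution;
        this.covariance = covariance;
    }

    /** Fits y ~ sum_k a_k * f_k(x); a null sigma means unit weights. */
    static BasisFit fit(DoubleUnaryOperator[] f, double[] x, double[] y, double[] sigma) {
        int m = f.length;
        double[][] alpha = new double[m][m];
        double[] beta = new double[m];
        double[] fx = new double[m];
        for (int i = 0; i < x.length; i++) {
            double w = (sigma == null) ? 1.0 : 1.0 / (sigma[i] * sigma[i]);
            for (int k = 0; k < m; k++) fx[k] = f[k].applyAsDouble(x[i]);
            for (int k = 0; k < m; k++) {
                beta[k] += w * y[i] * fx[k];
                for (int j = 0; j < m; j++) alpha[k][j] += w * fx[k] * fx[j];
            }
        }
        double[][] cov = invert(alpha);
        double[] a = new double[m];
        for (int k = 0; k < m; k++) {
            for (int j = 0; j < m; j++) a[k] += cov[k][j] * beta[j];
        }
        return new BasisFit(a, cov);
    }

    /** Standard deviations of the coefficients: square roots of the covariance diagonal. */
    double[] sigma() {
        double[] s = new double[solution.length];
        for (int k = 0; k < s.length; k++) s[k] = Math.sqrt(covariance[k][k]);
        return s;
    }

    /** Gauss-Jordan inversion with partial pivoting. */
    static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n + i] = 1.0;
        }
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            }
            if (m[piv][col] == 0.0) throw new ArithmeticException("singular normal matrix");
            double[] t = m[col];
            m[col] = m[piv];
            m[piv] = t;
            double p = m[col][col];
            for (int j = 0; j < 2 * n; j++) m[col][j] /= p;
            for (int r = 0; r < n; r++) {
                double factor = m[r][col];
                if (r == col || factor == 0.0) continue;
                for (int j = 0; j < 2 * n; j++) m[r][j] -= factor * m[col][j];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(m[i], n, inv[i], 0, n);
        return inv;
    }
}
```

The normal equations are simple but can be ill-conditioned for nearly degenerate basis functions; a singular value decomposition is the more robust alternative.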


Bootstrap

public Bootstrap(Multidimensional[] fitting,
                 NVector[] x,
                 NVector y,
                 NVector sigma)
Constructs a new bootstrap object from the supplied basis functions and the measurements with their errors.

Method Detail

setUseCovariance

public void setUseCovariance(boolean use)
Sets whether the covariance sigma estimates of the parameters fitted to the simulated data sets are used as weights in the statistical calculation of the confidence estimates.
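The text does not spell out how the covariance sigmas enter the statistics. One common choice, assumed here purely for illustration, is inverse-variance weighting: each simulated value p_s of a parameter gets the weight 1/sigma_s^2. The hypothetical helper WeightedStats.meanAndSpread then computes the weighted mean and spread of one parameter; without sigmas it reduces to unweighted statistics.

```java
/** Hypothetical sketch of (optionally) weighted statistics over one parameter's simulated values. */
final class WeightedStats {

    /**
     * Returns {mean, spread} of the simulated values p. If sigma is null, all values
     * count equally; otherwise each value is weighted by 1/sigma^2 (assumed scheme).
     */
    static double[] meanAndSpread(double[] p, double[] sigma) {
        double sw = 0, swp = 0;
        for (int s = 0; s < p.length; s++) {
            double w = (sigma == null) ? 1.0 : 1.0 / (sigma[s] * sigma[s]);
            sw += w;
            swp += w * p[s];
        }
        double mean = swp / sw;
        double swd = 0;
        for (int s = 0; s < p.length; s++) {
            double w = (sigma == null) ? 1.0 : 1.0 / (sigma[s] * sigma[s]);
            double d = p[s] - mean;
            swd += w * d * d;
        }
        return new double[] {mean, Math.sqrt(swd / sw)};
    }
}
```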


setDuplication

public void setDuplication(double frac)
Changes the fraction of duplication to the stated value. Rejects values outside ]0,1[.


setSimulationCount

public void setSimulationCount(int simcount)
Changes the number of simulated data sets per degree of freedom.


getOriginalSolution

public NVector getOriginalSolution()
Returns the original solution to the linear model. Pipes to the underlying general linear regression.


getOriginalSigma

public NVector getOriginalSigma()
Returns the standard deviation of the original solution. These are the square roots of the diagonal elements of the covariance matrix. Pipes to the underlying general linear regression.


getOriginalCovariance

public QuadMatrix getOriginalCovariance()
Returns the covariance matrix of the original fit.


getOriginalChiSquare

public double getOriginalChiSquare()
Returns the chi-square value of the original data set. Pipes to the underlying general linear regression.
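For reference, the chi-square of a linear model with coefficients a_k on basis functions f_k is chi^2 = sum_i ((y_i - sum_k a_k f_k(x_i)) / sigma_i)^2. A minimal sketch of that sum, using DoubleUnaryOperator as a hypothetical stand-in for the library's Function type and treating a missing sigma as unit errors (an assumption); ChiSquare.of is a made-up name.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of the chi-square of a linear model on basis functions. */
final class ChiSquare {

    static double of(DoubleUnaryOperator[] f, double[] a, double[] x, double[] y, double[] sigma) {
        double chi2 = 0;
        for (int i = 0; i < x.length; i++) {
            // Model value: linear combination of the basis functions at x[i].
            double model = 0;
            for (int k = 0; k < f.length; k++) model += a[k] * f[k].applyAsDouble(x[i]);
            // Residual in units of the measurement error (unit error if sigma is null).
            double r = (y[i] - model) / (sigma == null ? 1.0 : sigma[i]);
            chi2 += r * r;
        }
        return chi2;
    }
}
```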


getOriginalQuality

public double getOriginalQuality()
Returns the quality estimate of the original data set. Pipes to the underlying general linear regression.


getConfidenceEstimates

public NVector getConfidenceEstimates()
Returns a confidence estimate on the linear model parameters by bootstrapping the data.

See Also:
doBootstrap()

getSimulatedSolution

public NVector getSimulatedSolution()
Returns the averages of the linear model parameters from bootstrapping the data. Use them for comparison with the original solution.

See Also:
doBootstrap()

getSimulatedMinima

public NVector getSimulatedMinima()
Returns the minima of the linear model parameters from bootstrapping the data. Use them for comparison with the original solution.

See Also:
doBootstrap()

getSimulatedMaxima

public NVector getSimulatedMaxima()
Returns the maxima of the linear model parameters from bootstrapping the data. Use them for comparison with the original solution.

See Also:
doBootstrap()

getResidualOfDroppedData

public double getResidualOfDroppedData()
Returns the average RMS of the data dropped from the bootstrapped data sets. For each bootstrapped data set, the data points that were dropped are compared with the model fitted to that set. This yields one RMS per bootstrapped model; their average is returned here.


getResidualOfDroppedMinima

public double getResidualOfDroppedMinima()
Returns the minimum RMS of the data dropped from the bootstrapped data sets. For each bootstrapped data set, the data points that were dropped are compared with the model fitted to that set. This yields one RMS per bootstrapped model; their minimum is returned here.


getResidualOfDroppedMaxima

public double getResidualOfDroppedMaxima()
Returns the maximum RMS of the data dropped from the bootstrapped data sets. For each bootstrapped data set, the data points that were dropped are compared with the model fitted to that set. This yields one RMS per bootstrapped model; their maximum is returned here.
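The dropped-data residuals can be sketched as follows, assuming the RMS is the plain root mean square of the model's deviations from the dropped measurements (fitDropped takes no errors, so no weighting is applied here). The fitted model is passed as a DoubleUnaryOperator, and DroppedResidual, rms and summarize are hypothetical names.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of the residual statistics on dropped data. */
final class DroppedResidual {

    /** Root mean square of (y - model(x)) over the dropped indices of one simulated set. */
    static double rms(DoubleUnaryOperator model, double[] x, double[] y, int[] dropped) {
        double sum = 0;
        for (int i : dropped) {
            double r = y[i] - model.applyAsDouble(x[i]);
            sum += r * r;
        }
        return Math.sqrt(sum / dropped.length);
    }

    /** Returns {average, minimum, maximum} of the per-set RMS values. */
    static double[] summarize(double[] rmsPerSet) {
        double sum = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double r : rmsPerSet) {
            sum += r;
            min = Math.min(min, r);
            max = Math.max(max, r);
        }
        return new double[] {sum / rmsPerSet.length, min, max};
    }
}
```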


doBootstrap

private void doBootstrap()
The time-consuming part of the bootstrap class. We calculate confidence estimates on the linear model by duplicating measurements of the original data set. The simulation count defines the number of simulated data sets generated. From the solutions of all simulated data sets, we derive confidence estimates by calculating their standard deviation. Additionally, we record the minima, maxima and averages of the simulated parameter estimates. Implementation note: this is a time- and memory-intensive operation.
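Because this step is time and memory intensive, one hypothetical way to do the bookkeeping is to accumulate the statistics in a single pass rather than storing every simulated solution. The sketch below uses Welford's online algorithm for the mean and standard deviation and tracks minima and maxima alongside; the class name ParameterStats and the choice of the sample (n - 1) standard deviation are assumptions.

```java
import java.util.Arrays;

/** Hypothetical single-pass accumulator for the per-parameter bootstrap statistics. */
final class ParameterStats {
    private final double[] mean, m2, min, max;
    private long count;

    ParameterStats(int parameters) {
        mean = new double[parameters];
        m2 = new double[parameters];
        min = new double[parameters];
        max = new double[parameters];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
    }

    /** Adds one simulated solution (Welford's online update). */
    void add(double[] p) {
        count++;
        for (int k = 0; k < p.length; k++) {
            double d = p[k] - mean[k];
            mean[k] += d / count;
            m2[k] += d * (p[k] - mean[k]);
            min[k] = Math.min(min[k], p[k]);
            max[k] = Math.max(max[k], p[k]);
        }
    }

    double[] average() { return mean.clone(); }
    double[] minima() { return min.clone(); }
    double[] maxima() { return max.clone(); }

    /** Sample standard deviation (n - 1 in the denominator); needs at least two solutions. */
    double[] standardDeviation() {
        double[] sd = new double[mean.length];
        for (int k = 0; k < sd.length; k++) sd[k] = Math.sqrt(m2[k] / (count - 1));
        return sd;
    }
}
```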


prepareRegression

private static GeneralLinearRegression prepareRegression(Object[] fitting,
                                                         NVector[] x,
                                                         NVector y,
                                                         NVector err)
Prepares a GeneralLinearRegression from the functions, dependent variables and measurements passed in.


fitDropped

private static double fitDropped(Object[] fitting,
                                 NVector solution,
                                 NVector[] x,
                                 NVector y)
Fits the data set in y, measured at the dependent variables x, to the linear model parameters in solution.

Parameters:
fitting - The regression model as an array of Function or Multidimensional objects.
solution - Vector of linear regression coefficients.
x - The array of dependent variables.
y - The measurements.
Returns:
The RMS deviation of the measurements from the model.

getExchangedData

private static Bootstrap.ExchangedData getExchangedData(NVector[] depend,
                                                        NVector data,
                                                        NVector err,
                                                        double duplicate)
From our measurements and their errors we derive a simulated data set. This is done as follows: from the original data set, we retain a fraction of 1 - duplicate of the data points. The missing data points are filled in with data from the reduced remaining set, allowing (arbitrary) multiple duplication of any data point.
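The exchange step can be sketched as follows, under the assumptions that the retained points are chosen uniformly at random and that round(n * (1 - duplicate)) points are retained. It works on indices so the same selection can be applied to the measurements, their errors and every dependency row, and it also reports which points were dropped. Exchange, draw and gather are hypothetical names, not the private ExchangedData class.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of the exchange step (not the library's ExchangedData). */
final class Exchange {
    final int[] picked;  // original index for each simulated point
    final int[] dropped; // original indices absent from the simulated set

    private Exchange(int[] picked, int[] dropped) {
        this.picked = picked;
        this.dropped = dropped;
    }

    static Exchange draw(int n, double duplicate, Random rnd) {
        if (!(duplicate > 0.0 && duplicate < 1.0)) {
            throw new IllegalArgumentException("duplicate must lie in ]0,1[");
        }
        // Random permutation of 0..n-1 (Fisher-Yates); the first 'keep' entries are retained.
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = perm[i];
            perm[i] = perm[j];
            perm[j] = t;
        }
        int keep = Math.max(1, (int) Math.round(n * (1.0 - duplicate)));
        int[] picked = new int[n];
        for (int i = 0; i < n; i++) {
            // Retained points first, then duplicates drawn with replacement from the retained subset.
            picked[i] = (i < keep) ? perm[i] : perm[rnd.nextInt(keep)];
        }
        return new Exchange(picked, Arrays.copyOfRange(perm, keep, n));
    }

    /** Applies the picked indices to one data column; a null column (no errors) stays null. */
    static double[] gather(double[] column, int[] picked) {
        if (column == null) return null;
        double[] out = new double[picked.length];
        for (int i = 0; i < picked.length; i++) out[i] = column[picked[i]];
        return out;
    }
}
```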

Returns:
The simulated data at index 0, and the errors (or null) at index 1.