
Mallows's Cp


In statistics, Mallows's Cp,[1][2] named for Colin Lingwood Mallows, is used to assess the fit of a regression model that has been estimated using ordinary least squares. It is applied in the context of model selection, where a number of predictor variables are available for predicting some outcome, and the goal is to find the best model involving a subset of these predictors. A small value of Cp means that the model is relatively precise.

Mallows's Cp has been shown to be equivalent to the Akaike information criterion in the special case of Gaussian linear regression.[3]

Definition and properties

Mallows's Cp addresses the issue of overfitting, in which model selection statistics such as the residual sum of squares always get smaller as more variables are added to a model. Thus, if we aim to select the model giving the smallest residual sum of squares, the model including all variables would always be selected. Instead, the Cp statistic calculated on a sample of data estimates the sum squared prediction error (SSPE) as its population target

    E \sum_i \frac{\left( \hat{Y}_i - E(Y_i \mid X_i) \right)^2}{\sigma^2},

where \hat{Y}_i is the fitted value from the regression model for the ith case, E(Y_i | X_i) is the expected value for the ith case, and \sigma^2 is the error variance (assumed constant across the cases). The mean squared prediction error (MSPE) will not automatically get smaller as more variables are added. The optimum model under this criterion is a compromise influenced by the sample size, the effect sizes of the different predictors, and the degree of collinearity between them.
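
To make the overfitting point concrete, here is a minimal Python simulation (synthetic data; NumPy assumed; all variable names are illustrative) showing that the in-sample residual sum of squares never increases as regressors are added, even when the extra regressors are pure noise:

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 100, 6
    X = rng.normal(size=(n, k))            # only the first two columns matter
    y = 1.5 * X[:, 0] - X[:, 1] + rng.normal(size=n)

    for p in range(1, k + 1):              # nested models with 1..k regressors
        A = np.column_stack([np.ones(n), X[:, :p]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ beta) ** 2))
        print(p, round(rss, 2))            # RSS is non-increasing in p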

If P regressors are selected from a set of K > P, the Cp statistic for that particular set of regressors is defined as

    C_p = \frac{SSE_p}{S^2} - N + 2(P + 1),

where

  • SSE_p = \sum_{i=1}^{N} (Y_i - Y_{pi})^2 is the error sum of squares for the model with P regressors,
  • Y_{pi} is the predicted value of the ith observation of Y from the P regressors,
  • S^2 is the estimate of the residual variance after regression on the complete set of K regressors, which can be estimated by SSE_K / (N - K - 1),[4]
  • and N is the sample size.
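
As a worked illustration of this formula, the following is a minimal Python sketch (the function name mallows_cp, the least-squares helper, and the convention that design matrices exclude the intercept column are our assumptions, not part of the original statistic):

    import numpy as np

    def mallows_cp(y, X_subset, X_full):
        """Cp = SSE_P / S^2 - N + 2(P + 1), with S^2 estimated from the
        full model. Design matrices exclude the intercept; one is added."""
        n = len(y)

        def sse(X):
            A = np.column_stack([np.ones(n), X])   # add intercept column
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            r = y - A @ beta
            return float(r @ r)                    # error sum of squares

        p = X_subset.shape[1]                      # P regressors in the subset
        k = X_full.shape[1]                        # K regressors in total
        s2 = sse(X_full) / (n - k - 1)             # S^2 = SSE_K / (N - K - 1)
        return sse(X_subset) / s2 - n + 2 * (p + 1)

By construction, candidate subsets that fit well without superfluous regressors yield small values of this statistic.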

Alternative definition

Given a linear model such as

    Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon,

where:

  • \beta_0, \ldots, \beta_p are coefficients for the predictor variables X_1, \ldots, X_p,
  • \varepsilon represents error.

An alternate version of Cp can also be defined as[5]

    C_p = \frac{1}{n} \left( RSS + 2 p \hat{\sigma}^2 \right),

where

  • RSS is the residual sum of squares on a training set of data,
  • p is the number of predictors,
  • and \hat{\sigma}^2 refers to an estimate of the variance associated with each response in the linear model (estimated on a model containing all predictors).

Note that this version of Cp does not give equivalent values to the earlier version, but the model with the smallest Cp from this definition will also be the model with the smallest Cp from the earlier definition.
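
In code, the alternative form is a one-liner; a minimal sketch (names illustrative) follows. If \hat{\sigma}^2 is taken equal to the S^2 of the earlier definition, the two versions differ only by an increasing affine transformation for a fixed data set, which is why they are minimized by the same subset.

    def cp_alternative(rss, n, p, sigma2_hat):
        """Alternative Cp from James et al.: (RSS + 2 * p * sigma2_hat) / n."""
        return (rss + 2 * p * sigma2_hat) / n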

Limitations

The Cp criterion suffers from two main limitations:[6]

  1. the Cp approximation is only valid for large sample sizes;
  2. the Cp cannot handle complex collections of models as in the variable selection (or feature selection) problem.[6]

Practical use

The Cp statistic is often used as a stopping rule for various forms of stepwise regression. Mallows proposed the statistic as a criterion for selecting among many alternative subset regressions. Under a model not suffering from appreciable lack of fit (bias), Cp has expectation nearly equal to P; otherwise the expectation is roughly P plus a positive bias term. Nevertheless, even though it has expectation greater than or equal to P, there is nothing to prevent Cp < P or even Cp < 0 in extreme cases. It is suggested that one should choose a subset for which Cp approaches P from above,[7] working through a list of subsets ordered by increasing P. In practice, the positive bias can be adjusted for by selecting a model from the ordered list of subsets such that Cp < 2P.
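
As an illustrative sketch of that rule (reusing the hypothetical mallows_cp helper from the definition section), one might enumerate subsets in order of increasing size and stop at the first one satisfying Cp < 2P:

    from itertools import combinations

    def select_by_cp(y, X):
        """Return the first column subset of X, ordered by size,
        whose Mallows's Cp satisfies Cp < 2P."""
        k = X.shape[1]
        for p in range(1, k + 1):                  # increasing subset size P
            for cols in combinations(range(k), p):
                cp = mallows_cp(y, X[:, list(cols)], X)
                if cp < 2 * p:                     # the Cp < 2P stopping rule
                    return cols, cp
        return tuple(range(k)), mallows_cp(y, X, X)  # fall back to full model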

Since the sample-based Cp statistic is an estimate of the MSPE, using Cp for model selection does not completely guard against overfitting. For instance, it is possible that the selected model will be one in which the sample Cp was a particularly severe underestimate of the MSPE.

Model selection statistics such as Cp are generally not used blindly; rather, information about the field of application, the intended use of the model, and any known biases in the data are taken into account in the process of model selection.


References

  1. ^ Mallows, C. L. (1973). "Some Comments on CP". Technometrics. 15 (4): 661–675. doi:10.2307/1267380. JSTOR 1267380.
  2. ^ Gilmour, Steven G. (1996). "The interpretation of Mallows's Cp-statistic". Journal of the Royal Statistical Society, Series D. 45 (1): 49–56. JSTOR 2348411.
  3. ^ Boisbunon, Aurélie; Canu, Stephane; Fourdrinier, Dominique; Strawderman, William; Wells, Martin T. (2013). "AIC, Cp and estimators of loss for elliptically symmetric distributions". arXiv:1308.2766 [math.ST].
  4. ^ Mallows, C. L. (1973). "Some Comments on CP". Technometrics. 15 (4): 661–675. doi:10.2307/1267380. JSTOR 1267380.
  5. ^ James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2013-06-24). An Introduction to Statistical Learning. Springer. ISBN 978-1-4614-7138-7.
  6. ^ a b Giraud, C. (2015). Introduction to High-Dimensional Statistics. Chapman & Hall/CRC. ISBN 9781482237948.
  7. ^ Daniel, C.; Wood, F. (1980). Fitting Equations to Data (Rev. ed.). New York: Wiley & Sons, Inc.
