Boosting and ℓ1-Penalty Methods for High-dimensional Data with Some Applications in Genomics

  • Peter Bühlmann
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

We consider Boosting and ℓ1-penalty (regularization) methods for prediction and model selection (feature selection) and discuss some relations among the approaches. While Boosting was originally proposed in the machine learning community (Freund and Schapire (1996)), ℓ1-penalization was developed in numerical analysis and statistics (Tibshirani (1996)). Both methods are attractive for very high-dimensional data: they are computationally feasible and statistically consistent (e.g. Bayes risk consistent) even when the number of covariates (predictor variables) p is much larger than the sample size n, provided that the true underlying function (mechanism) is sparse. For example, we allow for arbitrary polynomial growth p = p_n = O(n^γ) for any γ > 0. We demonstrate high-dimensional classification, regression and graphical modeling and outline examples from genomic applications.
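To make the two families concrete, here is a minimal sketch in Python (standard library only) of componentwise L2Boosting, in the spirit of Bühlmann and Yu (2003) and Bühlmann (2004): at each step a simple least-squares fit on the single best predictor is added to the current fit, shrunk by a step size ν. Alongside it is the Lasso (Tibshirani (1996)), fitted on the same simulated sparse data with p > n. The Lasso is solved here by cyclic coordinate descent, which is our substitution and not the LARS algorithm of Efron et al. (2004). The function names, ν = 0.1, the stopping iteration mstop = 250, the penalty λ = 1.0 and the simulation settings are illustrative assumptions, not values taken from the paper.

```python
import random


def simulate(n=80, p=150, seed=1):
    """Hypothetical sparse linear model (assumed for illustration): only predictors 0, 1, 2 are active."""
    rng = random.Random(seed)
    beta = [5.0, -4.0, 3.0] + [0.0] * (p - 3)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, 0.5) for row in X]
    return X, y


def standardize(X):
    """Return columns centered and scaled so that (1/n) * sum_i x_ij^2 = 1."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        col = [v - mean for v in col]
        scale = (sum(v * v for v in col) / n) ** 0.5
        cols.append([v / scale for v in col])
    return cols


def l2_boost(cols, y, mstop=250, nu=0.1):
    """Componentwise L2Boosting with a simple least-squares base procedure."""
    n, p = len(y), len(cols)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # start from the constant (mean) fit
    coef = [0.0] * p
    path, rss = [], [sum(r * r for r in resid)]
    for _ in range(mstop):
        # For a standardized column the LS slope on the residuals is (1/n) x_j^T r,
        # and that fit reduces the RSS by n * slope^2, so take the largest |slope|.
        slopes = [sum(a * b for a, b in zip(c, resid)) / n for c in cols]
        j = max(range(p), key=lambda k: abs(slopes[k]))
        step = nu * slopes[j]  # shrink the base-procedure fit by nu
        coef[j] += step
        resid = [r - step * x for r, x in zip(resid, cols[j])]
        path.append(j)
        rss.append(sum(r * r for r in resid))
    return coef, path, rss


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(cols, y, lam, max_sweeps=500, tol=1e-8):
    """Lasso, min (1/(2n))||y - ybar - X b||^2 + lam * ||b||_1, by cyclic coordinate descent."""
    n, p = len(y), len(cols)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]
    coef = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, c in enumerate(cols):
            # (1/n) x_j^T (r + x_j b_j) reduces to this because (1/n) x_j^T x_j = 1.
            z = sum(a * b for a, b in zip(c, resid)) / n + coef[j]
            new = soft_threshold(z, lam)
            delta = new - coef[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, c)]
                coef[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return coef


X, y = simulate()  # n = 80 observations, p = 150 predictors
cols = standardize(X)
b_boost, path, rss = l2_boost(cols, y)
b_lasso = lasso_cd(cols, y, lam=1.0)
print("first five boosting selections:", path[:5])
print("Lasso support:", [j for j, b in enumerate(b_lasso) if b != 0.0])
```

With a small ν, the boosting path enters predictors one at a time and moves their coefficients in small increments, which is one reason early-stopped L2Boosting and the Lasso tend to produce similar sparse fits; Efron et al. (2004) analyze this connection for the closely related forward stagewise algorithm.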

References

  1. BREIMAN, L. (1998): Arcing classifiers. Ann. Statist., 26, 801–849 (with discussion).
  2. BÜHLMANN, P. (2004): Boosting for high-dimensional linear models. To appear in the Ann. Statist.
  3. BÜHLMANN, P. and YU, B. (2003): Boosting with the L2 loss: regression and classification. J. Amer. Statist. Assoc., 98, 324–339.
  4. EFRON, B., HASTIE, T., JOHNSTONE, I. and TIBSHIRANI, R. (2004): Least angle regression. Ann. Statist., 32, 407–499 (with discussion).
  5. FREUND, Y. and SCHAPIRE, R.E. (1996): Experiments with a new boosting algorithm. In: Machine Learning: Proc. Thirteenth International Conference. Morgan Kaufmann, San Francisco, 148–156.
  6. FRIEDMAN, J.H. (2001): Greedy function approximation: a gradient boosting machine. Ann. Statist., 29, 1189–1232.
  7. FRIEDMAN, J.H., HASTIE, T. and TIBSHIRANI, R. (2000): Additive logistic regression: a statistical view of boosting. Ann. Statist., 28, 337–407 (with discussion).
  8. GREENSHTEIN, E. and RITOV, Y. (2004): Persistency in high dimensional linear predictor-selection and the virtue of over-parametrization. Bernoulli, 10, 971–988.
  9. JIANG, W. (2004): Process consistency for AdaBoost. Ann. Statist., 32, 13–29 (with discussion, pp. 85–134).
  10. MALLAT, S. and ZHANG, Z. (1993): Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Proc., 41, 3397–3415.
  11. MEINSHAUSEN, N. and BÜHLMANN, P. (2004): High-dimensional graphs and variable selection with the Lasso. To appear in the Ann. Statist.
  12. TIBSHIRANI, R. (1996): Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc., Ser. B, 58, 267–288.
  13. TUKEY, J.W. (1977): Exploratory data analysis. Addison-Wesley, Reading, MA.
  14. WILLE, A., ZIMMERMANN, P., VRANOVÁ, E., FÜRHOLZ, A., LAULE, O., BLEULER, S., HENNIG, L., PRELIĆ, A., VON ROHR, P., THIELE, L., ZITZLER, E., GRUISSEM, W. and BÜHLMANN, P. (2004): Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology, 5(11), R92, 1–13.
  15. ZHANG, T. and YU, B. (2005): Boosting with early stopping: convergence and consistency. Ann. Statist., 33, 1538–1579.

Copyright information

© Springer Berlin · Heidelberg 2006

Authors and Affiliations

  • Peter Bühlmann, Seminar für Statistik, ETH Zürich, Zürich, Switzerland