Depth Importance in Precision Medicine (DIPM): A Tree and Forest Based Method
We propose a novel implementation of a depth variable importance score in a classification tree method designed for the precision medicine setting. The goal is to identify clinically meaningful subgroups to better inform personalized treatment decisions. In the proposed Depth Importance in Precision Medicine (DIPM) method, a random forest of trees is first constructed at each node. A depth variable importance score is then used to select the best split variable at that node. This score exploits the observation that more important variables tend to be selected closer to the root nodes of trees. In particular, we aim to outperform an existing method designed for the analysis of high-dimensional data with continuous outcome variables, which uses an importance score based on weighted misclassification of out-of-bag samples upon permutation. Overall, our method compares favorably because of its comparable and sometimes superior performance, simpler importance score, and broader pool of candidate splits. We use simulations to demonstrate the accuracy of our method and apply it to a clinical dataset.
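To make the idea of a depth-based importance score concrete, the following is a minimal sketch and not the DIPM implementation. It assumes each tree is a nested dictionary with a hypothetical `var` key naming the split variable and optional `left` and `right` children, that the root sits at depth 0, and that a split at depth d contributes a weight of 2^(-d). The abstract does not specify the weighting, so that choice is purely illustrative; the actual score may also incorporate a split statistic.

```python
from collections import defaultdict


def depth_importance(forest):
    """Return a depth-weighted importance score per split variable.

    Assumption (not from the source): a split at depth d adds 2**(-d)
    to the score of its variable, so splits near the root count more.
    """
    scores = defaultdict(float)
    for tree in forest:
        stack = [(tree, 0)]  # root is at depth 0 by convention here
        while stack:
            node, depth = stack.pop()
            if node is None:  # missing child, i.e. a terminal node
                continue
            scores[node["var"]] += 2.0 ** (-depth)
            stack.append((node.get("left"), depth + 1))
            stack.append((node.get("right"), depth + 1))
    return dict(scores)


def best_split_variable(forest):
    """Pick the variable with the highest depth-weighted score."""
    scores = depth_importance(forest)
    return max(scores, key=scores.get)


# Hypothetical toy forest: x1 is chosen at the root of two of three trees.
toy_forest = [
    {"var": "x1", "left": {"var": "x2"}, "right": {"var": "x3"}},
    {"var": "x1", "left": {"var": "x3"}},
    {"var": "x2", "left": {"var": "x1"}},
]

print(depth_importance(toy_forest))
print(best_split_variable(toy_forest))
```

In this toy forest, `x1` receives the largest score because it appears at the root of two trees, which mirrors the observation that important variables tend to be selected near the root.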
This work was supported in part by NIH grants T32MH14235 and R01 MH116527 and by NSF grant DMS1722544. We thank an anonymous referee for their invaluable comments. The Cancer Cell Line Encyclopedia (CCLE) data used in this article were obtained from the CCLE of the Broad Institute. Their database is publicly available online, and they did not participate in the analysis of the data or the writing of this report.