Abstract
This paper presents new feature selection algorithms for aggregate data analysis. Data aggregation is commonly used when it is not appropriate to model the relationship between a response and explanatory variables at the individual level. We investigate the substantial challenges that arise in analyzing aggregate data. We then propose a groupwise feature selection method that addresses (i) changes in the dataset that depend on which predictor variables are selected, (ii) the presence of potentially missing responses, and (iii) the suitability of model selection criteria when comparing models fitted to different datasets. In an application to real auto insurance data, we find a set of important predictors for classifying policyholders into homogeneous risk groups. Our results demonstrate the potential of the proposed feature selection method for aggregate data analysis in terms of flexibility and computational complexity. Because the proposed algorithms are applicable to any type of data, we expect that they can be applied to a wide range of decision-making tasks that use aggregate data.
Original language | English |
---|---|
Pages (from-to) | 104-117 |
Number of pages | 14 |
Journal | Expert Systems with Applications |
Volume | 93 |
State | Published - 1 Mar 2018 |
Bibliographical note
Funding Information: The authors are very grateful to the editor and anonymous reviewers for their valuable comments, which significantly improved this article. This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2017R1D1A1B03036078).
Publisher Copyright:
© 2017 Elsevier Ltd
Keywords
- Aggregate data
- Auto insurance
- Feature selection
- Risk assessment
- Tariff classification