Feature selection for continuous aggregate response and its application to auto insurance data

Suyeon Kang, Jongwoo Song

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

This paper presents new feature selection algorithms for aggregate data analysis. Data aggregation is commonly used when it is not appropriate to model the relationship between a response and explanatory variables at the individual level. We investigate substantial challenges in the analysis of aggregate data. We then propose a groupwise feature selection method that addresses (i) the change in the dataset depending on the selection of predictor variables, (ii) the presence of potentially missing responses, and (iii) the suitability of model selection criteria when comparing models using different datasets. In an application to real auto insurance data, we find a set of important predictors for classifying policyholders into homogeneous risk groups. Our results demonstrate the potential of the proposed feature selection method for aggregate data analysis in terms of flexibility and computational complexity. We expect that the proposed algorithms can be applied to a wide range of decision-making tasks using aggregate data, as they are applicable to any type of data.
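Challenges (i) and (ii) in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. The records, the field names (`age`, `region`, `exposure`, `loss`), and the choice of loss per unit exposure as the continuous aggregate response are all assumptions made for illustration. The sketch only shows that regrouping the same individual-level records by a different set of predictors yields a different aggregated dataset, and that some cells may have no usable response.

```python
from collections import defaultdict

# Hypothetical individual-level policy records (invented for illustration).
policies = [
    {"age": "young", "region": "urban", "exposure": 1.0, "loss": 1200.0},
    {"age": "young", "region": "rural", "exposure": 0.5, "loss": 0.0},
    {"age": "old", "region": "urban", "exposure": 2.0, "loss": 400.0},
    {"age": "old", "region": "urban", "exposure": 1.0, "loss": 200.0},
    {"age": "old", "region": "rural", "exposure": 0.0, "loss": 0.0},
]


def aggregate(records, predictors):
    """Group records by the chosen predictors.

    Returns a dict mapping each observed cell to its loss per unit
    exposure (an assumed response), or None when the cell has zero
    exposure and the response is therefore undefined.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # cell -> [total loss, total exposure]
    for record in records:
        cell = tuple(record[p] for p in predictors)
        totals[cell][0] += record["loss"]
        totals[cell][1] += record["exposure"]
    return {
        cell: (loss / exposure if exposure > 0 else None)
        for cell, (loss, exposure) in totals.items()
    }


# The same records give datasets of different size for different predictor sets.
by_age = aggregate(policies, ["age"])
by_age_region = aggregate(policies, ["age", "region"])
```

Here `by_age` has two rows and `by_age_region` has four, one of which has a missing response. Models fitted to the two aggregations are therefore fitted to different datasets, which is why challenge (iii) in the abstract asks whether the usual model selection criteria remain suitable for comparing them.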

Original language: English
Pages (from-to): 104-117
Number of pages: 14
Journal: Expert Systems with Applications
Volume: 93
State: Published - 1 Mar 2018

Bibliographical note

Funding Information:
The authors are very grateful to the editor and anonymous reviewers for their valuable comments, which significantly improved this article. This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2017R1D1A1B03036078).

Publisher Copyright:
© 2017 Elsevier Ltd

Keywords

  • Aggregate data
  • Auto insurance
  • Feature selection
  • Risk assessment
  • Tariff classification
