Abstract
Dataset sizes continue to increase, and data are being collected from numerous applications. Because labeling data is expensive and time-consuming, a growing share of these data remains unlabeled. Semi-supervised learning (SSL) has been proposed to improve on conventional supervised learning by training on both labeled and unlabeled data. Compared with classification problems, estimating labels for unlabeled data introduces additional uncertainty in regression problems. In this paper, we propose a semi-supervised support vector regression (SS-SVR) method based on self-training that explicitly addresses the uncertainty of the estimated labels for unlabeled data. To measure labeling uncertainty, the label distribution of the unlabeled data is estimated with two probabilistic local reconstruction (PLR) models. Training data are then generated by oversampling from the unlabeled data and their estimated label distributions, with a sampling rate that varies according to the labeling uncertainty. Finally, expected margin-based pattern selection (EMPS) is employed to reduce training complexity. We verify the proposed method on 30 regression datasets and on a real-world problem: virtual metrology (VM) in semiconductor manufacturing. The experimental results show that the proposed method improves accuracy by 8% compared with conventional supervised SVR, and that its training time is 20% shorter than that of the benchmark methods.
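The uncertainty-dependent oversampling step summarized above can be illustrated with a minimal Python sketch. It assumes each unlabeled input already has an estimated Gaussian label distribution (a mean and a standard deviation); how the two PLR models produce those estimates is not reproduced here. The names `oversample_unlabeled` and `samples_for`, and the rule that draws more copies for more uncertain labels, are hypothetical choices for illustration, not the paper's sampling-rate formula.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A feature vector for one unlabeled input (illustrative type alias).
Vector = Tuple[float, ...]


def samples_for(std: float, scale: float = 1.0, max_k: int = 10) -> int:
    """Hypothetical sampling-rate rule: more copies for a more uncertain label.

    The direction and the constants are assumptions; the abstract only says
    the rate differs according to uncertainty.
    """
    return max(1, min(max_k, 1 + round(std / scale)))


def oversample_unlabeled(
    xs: Sequence[Vector],
    means: Sequence[float],
    stds: Sequence[float],
    rate: Callable[[float], int] = samples_for,
    seed: int = 0,
) -> List[Tuple[Vector, float]]:
    """Generate (x, y) pairs by drawing labels from N(mean, std^2) for each unlabeled x."""
    if not (len(xs) == len(means) == len(stds)):
        raise ValueError("xs, means and stds must have the same length")
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    generated: List[Tuple[Vector, float]] = []
    for x, mu, sigma in zip(xs, means, stds):
        # The number of pseudo-labeled copies depends on the label uncertainty.
        for _ in range(rate(sigma)):
            generated.append((tuple(x), rng.gauss(mu, sigma)))
    return generated


if __name__ == "__main__":
    labeled = [((0.0,), 1.0), ((1.0,), 2.0)]
    pseudo = oversample_unlabeled([(0.5,), (2.0,)], means=[1.5, 3.0], stds=[0.0, 3.0])
    train = labeled + pseudo  # 2 labeled + 1 + 4 pseudo-labeled pairs
    print(len(train))
```

In a self-training loop, the generated pairs would be appended to the labeled set before fitting an SVR; the EMPS step that prunes the enlarged training set is not sketched here. Passing a different `rate` function changes how strongly uncertainty affects the sampling rate.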
Original language | English |
---|---|
Pages (from-to) | 85-106 |
Number of pages | 22 |
Journal | Expert Systems with Applications |
Volume | 51 |
DOIs | |
State | Published - 1 Jun 2016 |
Bibliographical note
Funding Information: This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), South Korea, funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A1A1004648).
Publisher Copyright: © 2015 Elsevier Ltd. All rights reserved.
Keywords
- Data generation
- Probabilistic local reconstruction
- Semi-supervised learning
- Semiconductor manufacturing
- Support vector regression
- Virtual metrology