Abstract
Personal medical data typically contain sensitive information about individuals, so storing or sharing such data is inherently risky. For example, a short DNA sequence can identify not only an individual but also his or her relatives. Nonetheless, most countries and researchers agree on the necessity of collecting personal medical data, because medical data, including genomic data, are an indispensable resource for research and development in disease prevention and treatment. To prevent personal medical data from being misused, techniques that reliably preserve sensitive information must be developed for real-world applications. In this paper, we propose a framework called anonymized generative adversarial networks (AnomiGAN) to preserve the privacy of personal medical data while maintaining high prediction performance. We compared our method to state-of-the-art techniques and observed that it preserves the same level of privacy as differential privacy (DP) while providing better prediction results. We also observed a trade-off between privacy and prediction performance that depends on the degree to which the original data are preserved. We provide a mathematical overview of the proposed model and validate it on UCI Machine Learning Repository datasets to highlight its utility in practice. The code is available at https://github.com/hobae/AnomiGAN/.
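The abstract compares AnomiGAN against differential privacy (DP) and notes a privacy–utility trade-off. As background only (this is not the paper's method, and all names below are illustrative), that trade-off can be sketched with the standard Laplace mechanism from DP, where a smaller privacy budget ε means stronger privacy but a noisier, less useful released value:

```python
import math
import random

def laplace_sample(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution via inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_release(value, sensitivity, epsilon, rng):
    """Release `value` under the Laplace mechanism: noise scale = sensitivity / epsilon."""
    return value + laplace_sample(sensitivity / epsilon, rng)

rng = random.Random(0)
true_value = 120.0  # e.g. a blood-pressure reading (purely illustrative)

# Smaller epsilon = stronger privacy guarantee but larger expected error,
# i.e. lower downstream prediction utility.
for eps in (0.1, 1.0, 10.0):
    err = sum(abs(dp_release(true_value, 1.0, eps, rng) - true_value)
              for _ in range(10_000)) / 10_000
    print(f"epsilon={eps:>4}: mean |error| ~ {err:.3f}")
```

The mean absolute error tracks the noise scale (sensitivity/ε), which is the baseline behavior AnomiGAN is evaluated against in the paper.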
Original language | English |
---|---|
Pages (from-to) | 563-574 |
Number of pages | 12 |
Journal | Pacific Symposium on Biocomputing |
Volume | 25 |
Issue number | 2020 |
State | Published - 2020 |
Event | 25th Pacific Symposium on Biocomputing, PSB 2020, Big Island, United States (3 Jan 2020 – 7 Jan 2020) |
Bibliographical note
Publisher Copyright: © 2019 The Authors.
Keywords
- Anonymization
- Deep neural networks
- Differential privacy
- Generative adversarial networks