Abstract
Medical datasets are often large and contain sensitive information, presenting significant challenges for data sharing and storage. To address these issues, this paper introduces a novel method called Fused Matching Distillation (FMD), which combines multiple dataset distillation techniques to achieve both data compression and enhanced privacy. FMD synthesizes representative subsets that capture the essential information from the original dataset while anonymizing sensitive details during the distillation process. The proposed approach integrates two complementary methods: parameter matching, a technique that aligns the training trajectories of a teacher network trained on real data with those of a student network trained on synthetic data, and feature distribution matching, which ensures that the synthetic dataset closely approximates the feature distribution of the original data. By fusing these techniques, FMD maximizes the information density within each pixel of the distilled dataset, achieving a balance between compression and performance. Experimental evaluations on medical datasets, including COVID chest X-ray and Pancreas cancer CT, demonstrate that FMD achieves superior accuracy and privacy compared to existing methods. Furthermore, the proposed method is evaluated using metrics for both model performance and anonymity, showing that FMD not only maintains high diagnostic accuracy but also effectively anonymizes the data. This makes FMD a promising tool for secure and efficient medical data sharing. The code is available at https://github.com/juheonewha/FMD.git.
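As an illustration of the two matching objectives named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a normalized trajectory distance for parameter matching and a difference of mean feature embeddings for distribution matching, both common formulations in the dataset distillation literature. The abstract does not state how the two terms are fused, so the weighted sum and every function and parameter name here (`parameter_matching_loss`, `distribution_matching_loss`, `fused_loss`, `weight`) are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def parameter_matching_loss(
    student_end: Vector, teacher_start: Vector, teacher_end: Vector
) -> float:
    """Distance between the student's parameters (after training on synthetic
    data) and a later teacher checkpoint, normalized by how far the teacher
    itself moved between its two checkpoints. Assumed formulation."""
    teacher_shift = squared_distance(teacher_start, teacher_end)
    if teacher_shift == 0:
        raise ValueError("teacher checkpoints must differ")
    return squared_distance(student_end, teacher_end) / teacher_shift


def mean_vector(rows: Sequence[Vector]) -> list:
    """Column-wise mean of a batch of feature vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def distribution_matching_loss(
    real_features: Sequence[Vector], synthetic_features: Sequence[Vector]
) -> float:
    """Squared distance between the mean feature embedding of the real batch
    and that of the synthetic batch. Assumed formulation."""
    return squared_distance(
        mean_vector(real_features), mean_vector(synthetic_features)
    )


def fused_loss(pm_loss: float, dm_loss: float, weight: float = 1.0) -> float:
    """Hypothetical fusion: a weighted sum of the two matching terms."""
    return pm_loss + weight * dm_loss


# Toy values, for illustration only.
pm = parameter_matching_loss([1.0, 1.0], [0.0, 0.0], [2.0, 2.0])
dm = distribution_matching_loss([[0.0, 0.0], [2.0, 2.0]], [[2.0, 1.0]])
total = fused_loss(pm, dm, weight=0.5)
```

In a real distillation loop, the synthetic images would be updated by gradient descent on such a combined objective, with features taken from a network rather than supplied as plain lists.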
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 3406-3415 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798331510831 |
| DOIs | |
| State | Published - 2025 |
| Event | 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025 - Tucson, United States; Duration: 28 Feb 2025 → 4 Mar 2025 |
Publication series
| Name | Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025 |
|---|---|
Conference
| Conference | 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025 |
|---|---|
| Country/Territory | United States |
| City | Tucson |
| Period | 28/02/25 → 4/03/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- dataset distillation
- distribution matching
- medical dataset
- parameter matching
- privacy preservation