Computer Science, 2022, Vol. 49, Issue (4): 362-368. doi: 10.11896/jsjkx.210300032
王美珊, 姚兰, 高福祥, 徐军灿
WANG Mei-shan, YAO Lan, GAO Fu-xiang, XU Jun-can
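Abstract: The continued development of information technology and healthcare informatization has produced medical data on a large scale, creating the conditions for deeper applications such as data analysis, data mining, and intelligent diagnosis. Medical datasets are large and contain a great deal of private patient information, so using the data while protecting patient privacy is highly challenging. Privacy protection in the medical field currently relies mainly on anonymization, but when an attacker has strong background knowledge, such methods cannot preserve both the privacy and the utility of a dataset. This paper therefore proposes an optimized classification tree algorithm and improves the Diffpart partitioning algorithm. Taking the correlation between data items as its premise, the method selects appropriate data from a medical set-valued dataset and perturbs it with noise using differential privacy, so that the released data satisfies differential privacy and supports statistical queries. The method is tested on a real medical dataset of more than 240,000 records. The experimental results show that the proposed algorithm satisfies differential privacy and achieves higher privacy and utility than the Diffpart algorithm.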
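The abstract states that noise is added under differential privacy so that statistical queries remain possible, but it does not give the mechanism in detail. As a minimal sketch of the general idea only, and not the authors' optimized classification tree or their modified Diffpart procedure, the following Python example answers a count query over set-valued records with the standard Laplace mechanism. The record contents, the diagnosis codes, and the names `RECORDS`, `laplace_noise`, `true_count`, and `noisy_count` are all hypothetical and chosen for illustration.

```python
import random

# Hypothetical set-valued records: each entry is one patient's set of diagnosis codes.
RECORDS = [
    frozenset({"I10", "E11"}),
    frozenset({"I10", "J45"}),
    frozenset({"E11"}),
    frozenset({"I10", "E11", "N18"}),
]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def true_count(records, itemset):
    """Number of records that contain every item in `itemset`."""
    return sum(1 for record in records if itemset <= record)


def noisy_count(records, itemset, epsilon, rng):
    """Count query released under the Laplace mechanism.

    Adding or removing one record changes the count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count(records, itemset) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
query = frozenset({"I10", "E11"})
print(true_count(RECORDS, query))
print(noisy_count(RECORDS, query, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy and a larger noise scale, which is the privacy and utility trade-off the abstract refers to. A partition-based scheme that issues many such counts would also have to divide the privacy budget among them; that accounting is not shown here.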