计算机科学 (Computer Science), 2018, Vol. 45, Issue 6A: 371-374.
李鑫,郭汉,张欣,胡方强,帅仁俊
LI Xin, GUO Han, ZHANG Xin, HU Fang-qiang, SHUAI Ren-jun
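Abstract: Detecting click fraud in online advertising, where clicks are generated in order to siphon off advertising fees, is an important application of machine learning. The support vector machine (SVM) is a strong machine learning algorithm for binary classification and regression, but when it is applied to fraudulent-click detection in online advertising its performance is severely limited by the extreme class imbalance of the data set. Starting from the real data set of the FDMA2012 competition on fraudulent publisher detection, this paper studies and compares three methods for handling imbalanced data in detail, selects the best-performing one, hybrid sampling, to process the raw data, and then applies the processed data to an SVM classifier. Experimental results show that the proposed method effectively identifies illegitimate publishers that engage in click fraud, with an accuracy of about 95%, which meets the requirements of click fraud detection in online advertising.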
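The abstract does not spell out how its hybrid sampling works, so the following is only a minimal sketch of one common reading: randomly undersample the majority class and grow the minority class with SMOTE-style interpolation until both reach the same size. The function names `smote_like_oversample` and `hybrid_sample`, the `target` and `k` parameters, and the toy two-dimensional data are all assumptions made for illustration and do not come from the paper.

```python
import random


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours.
    Assumes the minority class holds at least two samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic


def hybrid_sample(majority, minority, target, k=3, seed=0):
    """Hypothetical hybrid sampler: shrink the majority class to `target`
    samples and grow the minority class to `target` samples."""
    rng = random.Random(seed)
    kept_majority = rng.sample(majority, min(target, len(majority)))
    n_new = max(0, target - len(minority))
    grown_minority = list(minority) + smote_like_oversample(minority, n_new, k, rng)
    return kept_majority, grown_minority


# Toy imbalanced data: 95 "legitimate" points versus 5 "fraudulent" points.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(95)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(5)]

kept, grown = hybrid_sample(majority, minority, target=30)
print(len(kept), len(grown))
```

The balanced classes returned here would then serve as training data for an SVM classifier. Because every synthetic point lies on a segment between two real minority samples, oversampling never places points outside the region the minority class already occupies.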