From data quality to model quality: An exploratory study on deep learning

T He, S Yu, Z Wang, J Li, Z Chen
Proceedings of the 11th Asia-Pacific Symposium on Internetware, 2019 - dl.acm.org
In the field of deep learning, people strive to construct high-quality deep neural networks (DNNs) to improve prediction accuracy. It is well known that the quality of training data has a great impact on the quality of DNN models, since all DNN models are obtained by training on these data. However, no systematic study has been reported on how the quality of training data affects the quality of DNN models. To study the relationship between data quality and model quality, this paper considers four aspects of data quality: Skewed Classes, Sample Complexity, Label Quality, and Noisy Data. We design experiments on MNIST and CIFAR-10 to determine how these four aspects influence the quality of DNN models. The Pearson correlation coefficient and the Spearman correlation coefficient are used to evaluate these influences. Experimental results show that all four aspects of data quality have a significant impact on the quality of DNN models; that is, a decrease in data quality in these four aspects reduces the accuracy of the DNN models.
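As an illustration of how the two correlation coefficients named in the abstract can relate a data-quality factor to model accuracy, the following is a minimal sketch and not the authors' code. The `noise_ratio` and `accuracy` values are invented for demonstration and are not results from the paper; the helper names `pearson`, `ranks`, and `spearman` are hypothetical. Spearman is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical illustration only: fraction of noisy training samples
# versus test accuracy of a model trained on that data.
noise_ratio = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
accuracy = [0.99, 0.98, 0.96, 0.93, 0.85, 0.70]

print("Pearson: ", pearson(noise_ratio, accuracy))
print("Spearman:", spearman(noise_ratio, accuracy))
```

With these made-up values both coefficients are strongly negative, which is the pattern one would expect if lower data quality goes with lower accuracy. Pearson measures how linear the relationship is, while Spearman only measures whether it is monotonic, so reporting both gives a view of the trend that does not depend on linearity.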
ACM Digital Library