Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques

J Am Med Inform Assoc. 2006 Sep-Oct;13(5):516-25. doi: 10.1197/jamia.M2077. Epub 2006 Jun 23.

Authors

Serguei V S Pakhomov¹, James D Buntrock, Christopher G Chute

Affiliation

¹ Division of Biomedical Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA. pakhomov.serguei@mayo.edu

Abstract

Objective: Human classification of diagnoses is a labor-intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes.

Methods: We have developed an automated coding system designed to assign codes to clinical diagnoses. The system uses the notion of certainty to recommend subsequent processing. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries. These code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to examples that were previously coded only infrequently. The least certain codes are generated by a naïve Bayes classifier. The latter two types of codes are subsequently reviewed manually.
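
The three certainty levels described under Methods can be sketched as a simple lookup cascade. This is an illustrative sketch only, not the authors' implementation: the names `normalize`, `build_example_index` and `assign_code`, the frequency cut-off, the text normalization and the placeholder codes are all assumptions, and the naïve Bayes classifier is represented by a stub function passed in by the caller.

```python
from collections import Counter, defaultdict

# Hypothetical cut-off separating "frequent" from "infrequent" examples;
# the abstract does not state the value the authors used.
FREQUENT_THRESHOLD = 2


def normalize(text):
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def build_example_index(coded_entries):
    """Map each normalized diagnostic text to a count of the codes given to it."""
    index = defaultdict(Counter)
    for text, code in coded_entries:
        index[normalize(text)][code] += 1
    return index


def assign_code(text, index, fallback_classifier,
                frequent_threshold=FREQUENT_THRESHOLD):
    """Return (code, certainty, needs_review) for one diagnostic text."""
    counts = index.get(normalize(text))
    if counts:
        code, n = counts.most_common(1)[0]
        if n >= frequent_threshold:
            # Matches a frequent example: highest certainty, no manual review.
            return code, "high", False
        # Matches an infrequent example: lower certainty, manual review.
        return code, "medium", True
    # No example match: fall back to a classifier (naive Bayes in the paper,
    # a caller-supplied stub here); least certain, manual review.
    return fallback_classifier(text), "low", True


# Placeholder entries and codes, invented purely for illustration.
examples = [("Diabetes mellitus", "CODE_A")] * 3 + [("Rare finding", "CODE_B")]
index = build_example_index(examples)
stub_classifier = lambda text: "CODE_GUESS"

for entry in ["diabetes  MELLITUS", "rare finding", "something unseen"]:
    print(entry, "->", assign_code(entry, index, stub_classifier))
```

In this sketch, only the first branch bypasses manual review, mirroring the statement that the two lower-certainty code types are reviewed by a person.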

Measurements: Standard information retrieval accuracy measurements of precision, recall and f-measure were used. Micro- and macro-averaged results were computed.

Results: At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall and 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%.
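
The difference between the micro- and macro-averaged measures named under Measurements can be illustrated with a small sketch. It assumes exactly one gold code and one predicted code per entry, and it takes the macro-averaged f-score as the mean of the per-code f-scores; the abstract does not say which conventions the authors followed, and the function names and toy labels are invented for illustration.

```python
from collections import Counter


def per_code_counts(gold, predicted):
    """Count true positives, false positives and false negatives per code."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted code was wrongly assigned
            fn[g] += 1  # gold code was missed
    return tp, fp, fn


def prf(tp, fp, fn):
    """Precision, recall and f-measure from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


def micro_average(gold, predicted):
    """Pool the counts over all codes, then compute the measures once."""
    tp, fp, fn = per_code_counts(gold, predicted)
    return prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))


def macro_average(gold, predicted):
    """Compute the measures per code, then take the unweighted mean."""
    tp, fp, fn = per_code_counts(gold, predicted)
    codes = sorted(set(gold) | set(predicted))
    if not codes:
        return 0.0, 0.0, 0.0
    scores = [prf(tp[c], fp[c], fn[c]) for c in codes]
    return tuple(sum(s[i] for s in scores) / len(codes) for i in range(3))


# Toy labels, invented for illustration.
gold = ["A", "A", "A", "B"]
predicted = ["A", "A", "B", "B"]
print("micro:", micro_average(gold, predicted))
print("macro:", macro_average(gold, predicted))
```

Micro-averaging weights every entry equally, so frequent codes dominate the result; macro-averaging weights every code equally, so rare codes count as much as common ones.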

Conclusion: Over two-thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, which resulted in a reduction of staff engaged in manual coding from thirty-four coders to seven verifiers.

MeSH terms

Abstracting and Indexing / methods*
Artificial Intelligence*
Disease / classification*
Forms and Records Control / methods*
Humans
International Classification of Diseases
Medical Records Systems, Computerized
Natural Language Processing*
Pilot Projects
User-Computer Interface