To order through AMS, contact the AMS Customer Services Department, P.O. Box 6248, Providence, Rhode Island 02940-6248 USA. For Visa, Mastercard, Discover, and American Express orders, call 1-800-321-4AMS.
You may also visit the AMS Bookstore and order directly from there. DIMACS does not distribute or sell these books.
Faced with the question of the intended audience for a collection such as the one assembled here, we have been asking ourselves the following questions:
Dave Ozonoff provided us with the following quote by Rothman, which sheds some light on possible answers to the questions posed above:
In general terms, epidemiology deals with populations rather than individuals. One of its goals is to study the frequency of health-related events. It is concerned mainly, but not exclusively, with the causes and determinants of disease patterns in populations. The premise is that a systematic investigation of different populations can identify causal and preventive factors. Epidemiology is an observational rather than an experimental science. Sample questions take the form of:
We have observed that occurrence measures, causal inference and study designs play prominent roles in the daily endeavors of a typical epidemiologist. Descriptive and analytical epidemiology are two overlapping flavors of this discipline.
Descriptive epidemiology attempts to describe patterns of disease according to spatial and temporal information about the members of a population. These patterns are described by tabulations or summaries of surveys and polls, or by parametric or non-parametric population models. Models are, in general, global descriptions of the major part of a data set. Patterns, on the other hand, are local features of the data that can be described by association rules, modes or gaps in density functions, outliers, inflection points in regressions, symptom clusters, geographic hot spots, etc. Some epidemiologists appear more interested in local patterns than in global structure. This raises the question of how "realistic" certain patterns are.
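Association rules of the form A ⇒ B are commonly scored by their support (how often A and B occur together) and confidence (how often B occurs when A does). The following is a minimal sketch of those two measures, assuming each person is represented by a set of hypothetical symptom labels; the records are invented for illustration and do not come from any chapter in this volume.

```python
from typing import FrozenSet, List

# Each record is the set of items (here: hypothetical symptom labels) observed for one person.
Record = FrozenSet[str]

def support(records: List[Record], itemset: Record) -> float:
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)

def confidence(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """Estimate of P(consequent | antecedent) = support(A union B) / support(A)."""
    s_a = support(records, antecedent)
    if s_a == 0:
        return 0.0  # the rule never fires; this sketch reports confidence 0 in that case
    return support(records, antecedent | consequent) / s_a

# Toy data, invented for illustration only.
records = [
    frozenset({"fever", "cough"}),
    frozenset({"fever", "cough", "rash"}),
    frozenset({"fever"}),
    frozenset({"cough"}),
]
A, B = frozenset({"fever"}), frozenset({"cough"})
print(support(records, A | B))    # 0.5
print(confidence(records, A, B))  # about 0.667: 2 of the 3 fever records also list cough
```

A high-confidence rule found this way is still only a local pattern; whether it is "realistic" depends on sample size, chance, and how the records were collected.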
Analytical epidemiology attempts to explain and predict the state of a population's health. A typical goal is to summarize the relationship between exposure and disease incidence by comparing two measures of disease frequency, for example the risk of disease among exposed and among unexposed individuals. Such comparisons may be affected by chance, by bias, and by the presence or absence of a real effect. Since bias is a central preoccupation of epidemiologists, this naturally explains why statistical methods play a major role in the field. Bias means a systematic error that results in an incorrect or invalid estimate of the measure of association; it can create or mask associations. Selection bias and information bias are two of the main types. In particular, selection should be independent of exposure if the purpose of the study is to explain the relationship between exposure and disease occurrence. In summary, one of the central themes in analytical epidemiology is to understand the roles of bias, chance, and real effects in the study of population health.
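Comparing two measures of disease frequency is often done with a 2x2 table of exposure against disease. The sketch below computes the risk ratio (a relative comparison) and the risk difference (an absolute comparison), assuming cohort-style counts; the class name `TwoByTwo` and the counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    """Cohort counts: a/b = exposed with/without disease, c/d = unexposed with/without disease."""
    a: int
    b: int
    c: int
    d: int

    def risk_exposed(self) -> float:
        return self.a / (self.a + self.b)

    def risk_unexposed(self) -> float:
        return self.c / (self.c + self.d)

    def risk_ratio(self) -> float:
        # Relative comparison of the two disease frequencies.
        return self.risk_exposed() / self.risk_unexposed()

    def risk_difference(self) -> float:
        # Absolute comparison; readable as excess risk due to exposure
        # only if chance and bias can be ruled out.
        return self.risk_exposed() - self.risk_unexposed()

table = TwoByTwo(a=30, b=70, c=10, d=90)  # invented counts
print(round(table.risk_ratio(), 3))       # 3.0
print(round(table.risk_difference(), 3))  # 0.2
```

The arithmetic itself is the easy part: if, say, selection into the study depended on exposure, both numbers would be biased no matter how precisely they are computed.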
To evaluate the role of chance, statistical hypothesis testing and estimation appear to be the tools of choice. Generative models, on the other hand, offer a way to describe infectious disease dynamics. Since disease patterns are of primary interest, data mining algorithms and the detection of rules for pattern formation have a lot to offer. Classification and taxonomies are useful tools for developing predictive models. In general, we believe that some questions addressed by epidemiologists benefit from being viewed in a mathematical and algorithmic context. This volume is a first attempt to bridge the gap between the two communities. Its main emphasis is on discrete methods that have successfully addressed epidemiological questions. We begin with two introductory chapters: one on some of the key methods of discrete data mining, written by a selection of researchers in this area, and one on descriptive epidemiology by D. Schneider. These collect, in digested form, what we believe are among the most potentially useful concepts in data mining and epidemiology.
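As one concrete instance of testing and estimation on a 2x2 table, the sketch below computes a Pearson chi-square statistic (without continuity correction) with its 1-degree-of-freedom p-value, and a Wald-type 95% confidence interval for the risk ratio on the log scale. These are standard textbook choices assumed here for illustration, not methods taken from any chapter; the function names and counts are hypothetical, and all four cells must be nonzero.

```python
import math
from typing import Tuple

def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for a 2x2 table, with its 1-df p-value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi2 = Z^2, so P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def risk_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Risk ratio and a Wald-type confidence interval computed on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

stat, p = chi_square_2x2(30, 70, 10, 90)  # invented counts
rr, lo, hi = risk_ratio_ci(30, 70, 10, 90)
print(f"chi2={stat:.2f}, p={p:.4f}")  # chi2=12.50, p=0.0004
print(f"RR={rr:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

A small p-value and an interval excluding 1 speak only to chance; they say nothing about whether bias produced the association.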
Next there are two chapters reporting work in epidemiology that suggests a discrete, analytical approach: Shannon on challenges in molecular data analysis, and Hirschman and Damianos on a system for monitoring news wires for indications of disease outbreaks. The remainder of the volume further draws out some of the key areas in the intersection between epidemiology and discrete methods. The technique of formal concept analysis, and the remarkable depth of mathematical structure that arises from it, are explored in chapters by Ozonoff, Pogel and Hannan, and by Abello and Pogel. The dynamics of disease transmission can be modeled in a variety of ways, but modeling them often involves setting up systems of differential equations to capture the ebb and flow of infection, as demonstrated by Desai, Boily, Masse and Anderson, and by Vazquez, in the context of quite different problems. Eubank, Kumar, Marathe, Srinivasan and Wang study massive interaction graphs and obtain results through a combination of combinatorial methods and simulation; Abello and Capalbo focus on properties of graphs generated by an appropriate random model; and Hartke takes a combinatorial model of disease spread on tree graphs. Finally, there are two applications of Support Vector Machines to epidemiological data sets, from Li, Muchnik and Schneider (using breast cancer data from the SEER database) and from Fradkin, Muchnik, Hermans and Morgan (using data on disease in chickens). Other potential areas of interest that we have not touched on in this collection include patient confidentiality, coding and cryptography, and multiscale inference.
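To illustrate what "setting up systems of differential equations" can look like, here is a minimal sketch of the classic SIR (susceptible, infectious, recovered) model, dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI, integrated with a fixed-step forward Euler scheme. This is a generic textbook model assumed for illustration, not the model used in any chapter of this volume; the function name and all parameter values are hypothetical, and a real study would typically use an adaptive ODE solver.

```python
from typing import List, Tuple

def simulate_sir(beta: float, gamma: float, s0: float, i0: float, r0: float,
                 dt: float, steps: int) -> List[Tuple[float, float, float, float]]:
    """Forward-Euler integration of the SIR equations; returns (t, S, I, R) tuples."""
    n = s0 + i0 + r0  # population size, constant in this closed model
    s, i, r = s0, i0, r0
    trajectory = [(0.0, s, i, r)]
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt  # flow S -> I during this step
        new_recoveries = gamma * i * dt         # flow I -> R during this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory

# Hypothetical parameters: basic reproduction number beta / gamma = 3.
traj = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, dt=0.1, steps=2000)
peak_t, _, peak_i, _ = max(traj, key=lambda row: row[2])
print(f"peak of about {peak_i:.0f} infectious at t = {peak_t:.1f}")
print(f"recovered at end: {traj[-1][3]:.0f}")
```

Because each step moves individuals between compartments without creating or destroying any, S + I + R stays equal to N (up to rounding), which is a useful sanity check on any such discretization.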
We hope the volume helps to foster cooperation between epidemiologists, computer scientists and mathematicians, and we believe that such cooperation will help elucidate the main algorithmic and mathematical issues. In a relatively brief period of time we noticed a variety of interconnections between the disciplines, far richer than we had ever dreamed of. We trust that the papers included here are a good indicator of the possibilities that discrete mathematical thinking can offer for a variety of epidemiological questions.
James Abello
Graham Cormode
Piscataway, NJ, 2005
Contents
Foreword, vii
Preface, ix
Acknowledgments, xi
Selected Data Mining Concepts (J. Abello, G. Cormode, D. Fradkin, D. Madigan, O. Melnik, and I. Muchnik), 1
Descriptive Epidemiology: A Brief Introduction (D. Schneider), 41
Biostatistical Challenges in Molecular Data Analysis (W.D. Shannon), 63
Mining Online Media for Global Disease Outbreak Monitoring (L. Hirschman and L.E. Damianos), 73
Generalized Contingency Tables and Concept Lattices (D. Ozonoff, A. Pogel, and T. Hannan), 93
Graph Partitions and Concept Lattices (J. Abello and A. Pogel), 115
Using Transmission Dynamics Models to Validate Vaccine Efficacy Measures Prior to Conducting HIV Vaccine Efficacy Trials (K. Desai, M-C. Boily, B. Masse, and R.M. Anderson), 139
Causal Tree of Disease Transmission and The Spreading of Infectious Diseases (A. Vazquez), 163
Structure of Social Contact Networks and Their Impact on Epidemics (S. Eubank, V.S.A. Kumar, M.V. Marathe, A. Srinivasan, and N. Wang), 181
Random Graphs (and the Spread of Infections in a Social Network) (J. Abello and M. Capalbo), 215
Attempting to Narrow the Integrality Gap for the Firefighter Problem on Trees (S.G. Hartke), 225
Influences on Breast Cancer Survival via SVM Classification in the SEER Database (J. Li, I. Muchnik, and D. Schneider), 233
Validation of Epidemiological Models: Chicken Epidemiology in the UK (D. Fradkin, I. Muchnik, P. Hermans, and K. Morgan), 243
Index, 257