Growing public awareness of environmental hazards has led to increased demand for public health authorities to investigate geographical clustering of diseases. Although such cluster analysis is nearly always ineffective in identifying causes of disease, it often has to be used to address public concern about environmental hazards. Interpreting the resulting data is not straightforward, however, and this paper presents a guide for the non-specialist. The pitfalls include the fact that cluster analyses are usually done post hoc rather than to test a prior hypothesis. This is particularly true of investigations prompted by reported clusters, which carry the inherent danger of overestimating the disease rate through "boundary shrinkage" of the population from which the cases are assumed to have arisen. In disease surveillance, the problem of making multiple comparisons can be overcome by testing for clustering and autocorrelation. When rates of disease are illustrated in disease maps, undue focus on areas where random fluctuation is greatest can be minimised by smoothing techniques. Although cluster analyses rarely prove fruitful in identifying causation, they may, like single case reports, have the potential to generate new knowledge.
- Cluster Analysis
- Data Interpretation, Statistical
- Population Surveillance