The concept of `diversity' has been one of the main open issues in the field of multiple classifier systems. In this paper we address a facet of diversity related to its effectiveness for ensemble construction: the explicit use of diversity measures in ensemble pruning, a kind of overproduce-and-choose strategy. Such a strategy consists of selecting, out of an original, larger ensemble, the subset of classifiers that is (hopefully) the most accurate. Whereas several existing pruning methods use some combination of the individual classifiers' accuracy and diversity, it is still unclear whether such an evaluation function is better than the bare estimate of ensemble accuracy. We empirically investigate this issue by comparing two evaluation functions in the context of ensemble pruning: the estimate of ensemble accuracy, and its linear combination with several well-known diversity measures. The latter can also be viewed as using diversity as a regularizer, as suggested by some authors. To this end we use a pruning method based on forward selection, since it allows a direct comparison between different evaluation functions. Experiments on thirty-seven benchmark data sets, four diversity measures and three base classifiers provide evidence that using diversity measures for ensemble pruning can be advantageous over using only ensemble accuracy, and that diversity measures can act as regularizers in this context.
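The comparison between the two evaluation functions can be illustrated with a minimal sketch of forward-selection pruning. It assumes that each member's label predictions on a held-out validation set are available as lists, that the ensemble combines members by plurality voting, and it uses mean pairwise disagreement as one illustrative diversity measure. The weight `lam` and the fixed target size `k` are hypothetical parameters chosen for illustration, not settings taken from the experiments; setting `lam = 0` recovers the accuracy-only criterion.

```python
from itertools import combinations
from typing import List, Sequence

Preds = Sequence[Sequence[int]]  # one list of predicted labels per classifier


def majority_vote_accuracy(preds: Preds, y: Sequence[int]) -> float:
    """Validation accuracy of plurality voting over the given members."""
    correct = 0
    for i, label in enumerate(y):
        votes: dict = {}
        for p in preds:
            votes[p[i]] = votes.get(p[i], 0) + 1
        # Ties go to the smallest label (arbitrary but deterministic choice).
        winner = max(sorted(votes), key=lambda c: votes[c])
        correct += winner == label
    return correct / len(y)


def mean_disagreement(preds: Preds) -> float:
    """Average fraction of samples on which two members disagree (0 if < 2 members)."""
    if len(preds) < 2:
        return 0.0
    pairs = list(combinations(preds, 2))
    total = sum(sum(a != b for a, b in zip(p, q)) / len(p) for p, q in pairs)
    return total / len(pairs)


def forward_selection(preds: Preds, y: Sequence[int], k: int, lam: float = 0.0) -> List[int]:
    """Greedily add the member that maximizes accuracy + lam * diversity,
    until k members are selected (hypothetical stopping rule)."""
    if not 1 <= k <= len(preds):
        raise ValueError("k must be between 1 and the ensemble size")
    remaining = list(range(len(preds)))
    selected: List[int] = []
    while len(selected) < k:
        scored = []
        for j in remaining:
            cand = [preds[m] for m in selected + [j]]
            score = majority_vote_accuracy(cand, y) + lam * mean_disagreement(cand)
            scored.append((score, j))
        # max() keeps the first maximum, so ties go to the lowest index.
        _, best = max(scored, key=lambda t: t[0])
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    y = [0, 1, 0, 1, 0, 1]
    P = [
        [0, 1, 0, 1, 0, 0],
        [0, 1, 0, 1, 0, 0],  # exact duplicate of member 0
        [0, 1, 1, 1, 0, 1],
        [1, 1, 0, 1, 0, 1],
    ]
    for lam in (0.0, 0.5):
        sub = forward_selection(P, y, k=3, lam=lam)
        print(lam, sub, majority_vote_accuracy([P[i] for i in sub], y))
```

On this toy data every member has the same individual accuracy, so the accuracy-only criterion (`lam = 0`) resolves ties by index and keeps the redundant duplicate, giving subset `[0, 1, 2]` with validation accuracy 5/6. With `lam = 0.5` the disagreement term steers the search away from the duplicate, and the selected subset `[0, 2, 3]` votes correctly on all six samples. This is only a hand-built illustration of how a diversity term can act as a tie-breaker or regularizer; it says nothing about the empirical results reported here.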