We investigate the problem of predicting the average effect of a new training program using experiences with previous implementations. There are two principal complications in doing so. First, the population in which the new program will be implemented may differ from the population in which the old program was implemented. Second, the two programs may differ in the mix of their components. With sufficient detail on characteristics of the two populations and sufficient overlap in their distributions, one may be able to adjust for differences due to the first complication. Dealing with the second difficulty requires data on the exact treatments the individuals received. However, even in the presence of differences in the mix of components across training programs, comparisons of controls in both populations who were excluded from participating in any of the programs should not be affected. To investigate the empirical importance of these issues, we compare four job training programs implemented in the mid-eighties in different parts of the U.S. We find that adjusting for pre-training earnings and individual characteristics removes most of the differences between control units, but that even after such adjustments, post-training earnings for trainees are not comparable. We surmise that differences in treatment components across training programs are the likely cause, and that more details on the specific services provided by these programs are necessary to predict the effect of future programs. We also conclude that, because of effect heterogeneity, it is essential, even in experimental evaluations of training programs, to record pre-training earnings and individual characteristics in order to render the extrapolation of the results to different locations more credible.