Establishing causal relationships in social policy evaluation is important but difficult because of sample selection. To evaluate the performance of estimators designed to handle sample selection bias, we analyse data from a Norwegian rehabilitation project with a randomised experimental design. The data permit us to compare the performance of different nonexperimental estimators with the experimental results. In our case study, we find that nonexperimental evaluation based on sample selection estimators whose selection terms fail to meet conventional levels of statistical significance is highly unreliable. The difference-in-differences estimator and stratification on propensity scores perform better in our context.