Abstract:
Economists and other social scientists often face situations where they have access to two datasets that they can use but one set of data suffers from censoring or truncation. If the censored sample is much bigger than the uncensored sample, it is common for researchers to use the censored sample alone and attempt to deal with the problem of partial observation in some manner. Alternatively, they simply use only the uncensored sample and ignore the censored one so as to avoid biases. It is rarely the case that researchers use both datasets together, mainly because they lack guidance about how to combine them. In this paper, we develop a simple semiparametric framework for combining the censored and uncensored datasets so that the resulting estimators are consistent, asymptotically normal, and use all information optimally. No nonparametric smoothing is required to implement our estimators. To illustrate our results in an empirical setting, we show how to estimate the effect of changes in compulsory schooling laws on age at first marriage, a variable that is censored for younger individuals. We also demonstrate how refreshment samples for this application can be created by combining cohort information across census datasets. Results from a small simulation experiment suggest that the estimator proposed in this paper can work very well in finite samples.