Doubly Weighted Estimation Approach for Linear Regression Analysis with Two-stage Cluster Samples
Brajendra C. Sutradhar, Memorial University
Sankhya B: The Indian Journal of Statistics, 2024, vol. 86, issue 1, No 3, 55-90
Abstract:
In a two-stage cluster sampling (TSCS) setup, a sample of clusters is chosen at the first stage from a large number of clusters belonging to a finite population (FP), and at the second stage a random sample of individuals is chosen from each selected cluster. In this setup, it is of interest to collect responses, along with certain multi-dimensional fixed covariates, from all individuals selected in the second-stage clusters, and to examine the effects of these covariates on the responses. In some studies, the fixed covariates may be available from the so-called sampling frame consisting of all individuals in the first-stage clusters. Because the responses in a given cluster share a common random cluster effect, they are correlated. Thus, if the data from the first-stage clusters were all available, one could estimate the regression parameters (effects) by using the standard infinite-population-based generalized least squares (GLS) approach, which produces more efficient estimates than the simpler ordinary least squares (OLS) approach. In the present TSCS setup, however, the first-stage clustered data are not available, and hence the estimation has to be done using the second-stage clusters, where the responses can no longer be assumed to arise from the infinite population; rather, there is a sampling effect to consider in developing appropriate estimating equations for the regression parameters. Nevertheless, four decades of existing studies, including the pioneering work of Prasad and Rao (J. Am. Stat. Assoc., 85, 163–171, 1990), have used the same GLS estimation by treating the second-stage clusters as first-stage clusters following a super-population-model-based correlation structure.
In this paper, we revisit this important inference issue and find that, because the existing second-stage-cluster-based GLS approach is constructed ignoring the sampling effect (of the first-stage clusters), it produces biased and hence inconsistent estimates of the regression parameters and of other related subsequent effects, let alone any efficiency gain. As a remedy, on top of the sampling weights we introduce an inverse correlation weight for the second-stage clustered elements and provide a doubly weighted GLS (DWGLS) estimation approach, which produces unbiased and consistent estimates of the regression parameters. The correlation parameters are also consistently estimated. A numerical illustration using a hypothetical two-stage cluster sample is provided to help understand the estimation biases caused by sampling mis-specification; without loss of generality, it uses a simpler, specialized linear cluster model with no covariates. For the general regression case, the unbiasedness and consistency of the proposed estimator of the regression parameter, which is of main interest, are studied analytically in detail. The asymptotic normality of the regression estimator is also studied, for the construction of confidence intervals when needed.
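To make the OLS/GLS contrast in the abstract concrete, the following is a minimal sketch of the standard infinite-population GLS estimator for a linear model with a shared random cluster effect, next to pooled OLS. It is not the paper's DWGLS estimator, whose form the abstract does not give. The model y_ij = x_ij' beta + gamma_i + eps_ij, the known variance components `sigma2_gamma` and `sigma2_eps`, and all function names are assumptions made only for illustration.

```python
import numpy as np


def exchangeable_inverse(n, sigma2_gamma, sigma2_eps):
    """Closed-form inverse of V = sigma2_eps * I_n + sigma2_gamma * 1 1^T."""
    c = sigma2_gamma / (sigma2_eps + n * sigma2_gamma)
    return (np.eye(n) - c * np.ones((n, n))) / sigma2_eps


def ols_estimate(X_list, y_list):
    """OLS: pool all clusters and ignore the within-cluster correlation."""
    X = np.vstack(X_list)
    y = np.concatenate(y_list)
    return np.linalg.solve(X.T @ X, X.T @ y)


def gls_estimate(X_list, y_list, sigma2_gamma, sigma2_eps):
    """GLS: weight each cluster by the inverse of its covariance matrix."""
    p = X_list[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for X, y in zip(X_list, y_list):
        V_inv = exchangeable_inverse(len(y), sigma2_gamma, sigma2_eps)
        lhs += X.T @ V_inv @ X
        rhs += X.T @ V_inv @ y
    return np.linalg.solve(lhs, rhs)


def simulate_clusters(beta, sizes, sigma2_gamma, sigma2_eps, seed=0):
    """Draw y_ij = x_ij' beta + gamma_i + eps_ij (hypothetical toy model)."""
    rng = np.random.default_rng(seed)
    X_list, y_list = [], []
    for n in sizes:
        # Intercept plus one covariate, treated as fixed once drawn.
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        gamma = rng.normal(scale=np.sqrt(sigma2_gamma))  # shared cluster effect
        eps = rng.normal(scale=np.sqrt(sigma2_eps), size=n)
        X_list.append(X)
        y_list.append(X @ beta + gamma + eps)
    return X_list, y_list


if __name__ == "__main__":
    beta = np.array([1.0, 2.0])
    X_list, y_list = simulate_clusters(beta, sizes=[5, 8, 12] * 20,
                                       sigma2_gamma=1.0, sigma2_eps=0.5)
    print("OLS:", ols_estimate(X_list, y_list))
    print("GLS:", gls_estimate(X_list, y_list, 1.0, 0.5))
```

This sketch treats every cluster as fully observed from the infinite-population model, which is exactly the assumption the abstract argues fails for second-stage samples; it includes no sampling weights.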
Keywords: Design consistency; Generalized least square estimate; Linear mixed effects regression; Population total estimation approach; Survey weighted estimates; Primary 62F10; 62H20; Secondary 62F12
Date: 2024
Downloads:
http://link.springer.com/10.1007/s13571-023-00321-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:sankhb:v:86:y:2024:i:1:d:10.1007_s13571-023-00321-9
Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/13571
DOI: 10.1007/s13571-023-00321-9
Sankhya B: The Indian Journal of Statistics is currently edited by Dipak Dey