Missing Data Imputation in Balanced Construction for Incomplete Block Designs

Yu, Haiyan; Han, Bing; Rios, Nicholas; Chen, Jianbin

Haiyan Yu, Bing Han, Nicholas Rios and Jianbin Chen
Haiyan Yu: Center for Data and Decision Sciences, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Bing Han: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
Nicholas Rios: Department of Statistics, George Mason University, Fairfax, VA 22031, USA
Jianbin Chen: School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China

Mathematics, 2024, vol. 12, issue 21, 1-22

Abstract: Observational data with massive sample sizes are often distributed across many local machines. From an experimental design perspective, investigators often want to identify the effects of new treatments (including ML algorithms) on many blocks of experimental data. Under time or budget constraints, assigning every treatment to each block is not always feasible. This produces responses that are incomplete relative to a randomized complete block design (RCBD), and these incomplete responses are missing by design. However, whether they can be estimated with missing-data imputation methods is not well understood, which makes it challenging to correctly identify the treatment effects. To this end, this paper provides a method for imputing and analyzing responses with missing data. The proposed method consists of three steps: Reconstruction, Imputation, and ‘Complete’-data Analysis (RICA). The incomplete responses are imputed with the expectation-maximization (EM) algorithm, and the RCBD model is then fitted to the resulting dataset. The identifiability result suggests that the missingness may be nonignorable within each block, but that the data of an incomplete design as a whole are missing by design when the design is balanced. Theoretical results on relative efficiency also indicate when the missing responses of an incomplete design should be imputed, highlighting the role of balanced variance. Applications to real-world data verify the efficacy of this method.
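The Python sketch below illustrates the three RICA steps. It is not the authors' implementation and rests on three assumptions:

1. "Reconstruction" is taken to mean laying the incomplete-block data out on the full blocks × treatments grid of an RCBD, with NaN in cells the design did not assign.
2. The model is the normal additive RCBD model y_ij = mu + beta_i + tau_j + e_ij. Under this model each EM iteration reduces to replacing the missing cells with their fitted values and refitting by least squares.
3. "Balanced" is checked as equal pairwise treatment concurrence.

The function names `rica` and `is_balanced` and the toy data are hypothetical.

```python
import numpy as np


def is_balanced(observed):
    """Hypothetical helper: True if every pair of treatments shares the same
    number of blocks. `observed` is a boolean blocks x treatments table."""
    n = observed.astype(int).T              # treatments x blocks incidence matrix
    concurrence = n @ n.T                   # (j, j') entry = blocks containing both
    off_diag = concurrence[~np.eye(len(concurrence), dtype=bool)]
    return bool(np.unique(off_diag).size <= 1)


def rica(y, max_iter=1000, tol=1e-10):
    """Hypothetical sketch of RICA under a normal additive RCBD model.

    Reconstruction: `y` is the full blocks x treatments grid, NaN = missing by design.
    Imputation: EM, which for this model alternates least-squares fitting (M-step)
    with replacing missing cells by their fitted values (E-step).
    Analysis: treatment effects and error degrees of freedom from the completed table.
    """
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    filled = np.where(missing, np.nanmean(y), y)   # start from the observed grand mean
    for _ in range(max_iter):
        # M-step: additive least-squares fit on the completed (balanced) table
        grand = filled.mean()
        fitted = (filled.mean(axis=1, keepdims=True)
                  + filled.mean(axis=0, keepdims=True) - grand)
        # E-step: conditional expectation of each missing response = fitted value
        updated = np.where(missing, fitted, y)
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    b, t = y.shape
    tau = filled.mean(axis=0) - filled.mean()      # treatment effects (sum to zero)
    # each imputed cell removes one error degree of freedom
    df_error = (b - 1) * (t - 1) - int(missing.sum())
    return filled, tau, df_error


# Toy balanced incomplete block design: 3 treatments, 3 blocks of size 2.
nan = np.nan
y = np.array([[9.0, 10.0, nan],
              [10.0, nan, 12.0],
              [nan, 12.0, 13.0]])
filled, tau, df_error = rica(y)
print(is_balanced(~np.isnan(y)))   # True
print(np.round(filled, 6))
print(np.round(tau, 6))
print(df_error)                    # 1
```

In this toy design each pair of treatments appears together in exactly one block. The observed responses follow the additive model exactly, so the imputed cells recover the additive values (11 in each missing cell), and the estimated treatment effects are (-1, 0, 1). The error degrees of freedom are reduced by the number of imputed cells, because imputed values carry no independent information.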

Keywords: distributed data; expectation-maximization; incomplete block design; missing by design; machine learning
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/21/3419/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/21/3419/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:21:p:3419-:d:1511819

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.