Comparative Analysis and Application of Imputed Estimators for Population Mean under Stratified Unequal Probability Sampling
With the continuously increasing demand for accurate data, survey sampling designs have become increasingly complex, and unequal probability sampling methods are therefore increasingly used in sample surveys. Item nonresponse, however, is inevitable in survey practice, so obtaining unbiased estimates from imputed data in complex surveys is an important research issue. Previous studies have proposed imputed estimators for equal probability sampling with uniform response. It is therefore worthwhile to explore how imputed estimators perform under more complex conditions, such as unequal probability sampling designs or different missing data mechanisms. This study presents imputed estimators of the population mean for survey data imputed with an auxiliary variable under a stratified unequal probability sampling design, and compares their performance under different missing data mechanisms and different levels of correlation between the auxiliary variable and the variable of interest.
Taking nonresponse and imputation into account, this study derives three imputed estimators (weighted, unweighted, and bias-adjusted imputed estimators) and their corresponding variance estimators under stratified unequal probability sampling, where missing values are imputed by ratio imputation. A simulation study covering six cases under different conditions (missing data mechanism, population distribution, and sample allocation) compares the proposed imputed estimators in terms of relative bias and coefficient of variation, and the relative bias of the variance estimators is examined to compare the corresponding variance estimators. Finally, an application to real survey data illustrates how the imputed estimators derived in this study can be applied in practice.
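The abstract gives no formulas, so the following is only a minimal Python sketch, under stated assumptions, of how weighted and unweighted ratio imputation differ in a stratified unequal probability sample. It assumes that the ratio is estimated separately within each stratum from respondents only, that design weights are w = 1/π, and that the population mean is estimated as a Horvitz-Thompson-type weighted total divided by a known population size N. The names `Unit`, `stratum_ratios`, `ratio_impute`, and `imputed_mean` are hypothetical, and the study's own estimators (in particular the bias-adjusted one, which the abstract does not specify) may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    stratum: str         # stratum label h
    weight: float        # design weight w_hi = 1 / pi_hi (assumed known)
    x: float             # auxiliary variable, observed for every sampled unit
    y: Optional[float]   # variable of interest; None marks item nonresponse


def stratum_ratios(units: List[Unit], weighted: bool = True) -> dict:
    """Per-stratum ratio R_h estimated from respondents only.

    weighted=True:  R_h = sum(w * y) / sum(w * x)
    weighted=False: R_h = sum(y) / sum(x)   (design weights ignored)
    Assumes every stratum has at least one respondent.
    """
    num: dict = {}
    den: dict = {}
    for u in units:
        if u.y is None:
            continue  # nonrespondents do not contribute to the ratio
        w = u.weight if weighted else 1.0
        num[u.stratum] = num.get(u.stratum, 0.0) + w * u.y
        den[u.stratum] = den.get(u.stratum, 0.0) + w * u.x
    return {h: num[h] / den[h] for h in num}


def ratio_impute(units: List[Unit], weighted: bool = True) -> List[float]:
    """Completed y values: observed y, or R_h * x for nonrespondents."""
    ratios = stratum_ratios(units, weighted)
    return [u.y if u.y is not None else ratios[u.stratum] * u.x for u in units]


def imputed_mean(units: List[Unit], population_size: float,
                 weighted: bool = True) -> float:
    """Horvitz-Thompson-type mean of the completed data, sum(w * y*) / N.

    Only the imputation step switches between weighted and unweighted;
    the final estimator always uses the design weights.
    """
    completed = ratio_impute(units, weighted)
    return sum(u.weight * y for u, y in zip(units, completed)) / population_size


# Toy sample (made-up numbers): two strata, one missing y in each.
sample = [
    Unit("A", 10.0, 2.0, 4.0),
    Unit("A", 10.0, 3.0, None),
    Unit("A", 20.0, 5.0, 9.0),
    Unit("B", 5.0, 4.0, 8.0),
    Unit("B", 5.0, 6.0, None),
]
N = 50.0  # assumed population size (here equal to the sum of the weights)

print(round(imputed_mean(sample, N, weighted=True), 4))   # 7.5
print(round(imputed_mean(sample, N, weighted=False), 4))  # 7.5143
```

In this toy sample the weighted ratio in stratum A is 220/120 while the unweighted ratio is 13/7, so the imputed value for the missing unit, and hence the estimated mean, changes even though the final estimator formula is identical. Keeping the imputation step separate from the estimator makes that difference easy to isolate.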
As expected, the simulation results show that the performance of the estimators depends on the missing data mechanism, the population distribution, and the method of sample allocation. For all three imputed estimators, estimation precision increases as the correlation between the auxiliary variable and the variable of interest increases. The imputed estimators are also more stable under missing completely at random (MCAR) than under missing at random (MAR).
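To make the MCAR/MAR distinction concrete, the sketch below generates response indicators under each mechanism. Under MCAR the response probability is a constant; under MAR it depends only on the fully observed auxiliary variable x. The logistic response model and its parameters `a` and `b` are illustrative assumptions, not the settings used in the study's simulation, and the function names are hypothetical.

```python
import math
import random
from typing import Callable, List


def mcar_prob(x: float, rate: float = 0.7) -> float:
    """MCAR: response probability is constant, unrelated to x or y."""
    return rate


def mar_prob(x: float, a: float = -1.0, b: float = 0.5) -> float:
    """MAR: response probability depends only on the observed auxiliary x.

    A logistic model is assumed here purely for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def draw_response(xs: List[float], prob: Callable[[float], float],
                  seed: int = 0) -> List[bool]:
    """Return True for respondents; each unit is drawn independently."""
    rng = random.Random(seed)
    return [rng.random() < prob(x) for x in xs]


xs = [float(v) for v in range(1, 11)]
mcar = draw_response(xs, mcar_prob, seed=1)
mar = draw_response(xs, mar_prob, seed=1)
print(sum(mcar), sum(mar))  # number of respondents under each mechanism
```

Under this MAR setting, units with larger x are more likely to respond, so the respondents are not a random subset of the sample as they are under MCAR. This gives one intuitive reason why estimators may behave less stably under MAR, although the abstract does not state the study's exact response models.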
Among the three imputed estimators, when the correlation between the auxiliary variable and the variable of interest is high, the proposed bias-adjusted estimator works well under stratified unequal probability sampling, reducing both the estimation bias and the underestimation of the mean square error (MSE) caused by unweighted imputation. Moreover, the variance estimator of the bias-adjusted estimator has the smallest relative bias for estimating the MSE of the three. By contrast, the unadjusted imputed estimator with unweighted imputation may be biased, and its corresponding variance estimator may underestimate the MSE of the estimator. However, the simulation results do not show the bias-adjusted estimator outperforming the imputed estimator with weighted imputation, except when the correlation between the auxiliary variable and the variable of interest is high. In practice, an auxiliary variable that is highly correlated with the variable of interest is commonly used to impute missing values in order to increase estimation precision. Therefore, if survey weights are unavailable and unweighted ratio imputation is used to impute missing values, the proposed bias-adjusted estimator and its corresponding variance estimator are recommended for obtaining better estimates.
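The comparisons above rest on Monte Carlo performance measures whose formulas the abstract does not give. The sketch below therefore assumes common definitions: relative bias is the mean replicate estimate minus the true value, divided by the true value; the coefficient of variation is the square root of the empirical MSE divided by the true value; and the relative bias of a variance estimator is measured against the empirical MSE, so a negative value corresponds to the MSE underestimation discussed above. The function names and toy replicate values are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence


def relative_bias(estimates: Sequence[float], true_value: float) -> float:
    """RB = (mean of replicate estimates - true value) / true value."""
    return (fmean(estimates) - true_value) / true_value


def empirical_mse(estimates: Sequence[float], true_value: float) -> float:
    """Monte Carlo MSE: average squared error over replicates."""
    return fmean([(e - true_value) ** 2 for e in estimates])


def coefficient_of_variation(estimates: Sequence[float], true_value: float) -> float:
    """CV = sqrt(MSE) / true value (one common convention, assumed here)."""
    return math.sqrt(empirical_mse(estimates, true_value)) / true_value


def variance_relative_bias(variance_estimates: Sequence[float],
                           estimates: Sequence[float], true_value: float) -> float:
    """RB of a variance estimator relative to the empirical MSE.

    Negative values mean the variance estimator underestimates the MSE.
    """
    mse = empirical_mse(estimates, true_value)
    return (fmean(variance_estimates) - mse) / mse


# Toy replicates (made-up numbers); true population mean = 10.
est = [9.0, 10.0, 11.0, 12.0]
var_est = [1.2, 1.2, 1.2, 1.2]
print(relative_bias(est, 10.0))                              # 0.05
print(round(coefficient_of_variation(est, 10.0), 4))         # 0.1225
print(round(variance_relative_bias(var_est, est, 10.0), 4))  # -0.2
```

In this toy example the variance estimates average 1.2 against an empirical MSE of 1.5, a relative bias of -20%, which is the pattern of MSE underestimation that a bias adjustment would aim to reduce.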