Modeling Multiple Inflations in Survey Data
The zero-inflated Poisson (ZIP) regression model is used to analyze count data that follow a Poisson distribution with excess zeros. Although various models have been developed to fit zero-inflated data, many of them depend strongly on features unique to a particular data set. Specifically, a sizable group of respondents may endorse the same answers, which gives the data several modes. For example, some data have cyclical patterns with multiple inflated values, such as responses to survey questions that assess risk or health behaviors within a fixed length of time. Two examples are the question "During the past 30 days, on how many days did symptoms of asthma make it difficult for you to stay asleep?" and the question "During the past two weeks, on how many days did you text or e-mail while driving a car or other vehicle?"
In this study, we proposed a new multiple-inflated truncated Poisson (MITP) regression model for more than two inflated values. The model is a combination of multinomial logistic regression and truncated Poisson regression; the multinomial logistic regression models the occurrence of excessive values, and the truncated Poisson regression models data following a truncated Poisson distribution.
The performance of the proposed model was evaluated through a simulation study, in which we compared the truncated Poisson (TP), zero-inflated truncated Poisson (ZITP), zero- and K-inflated truncated Poisson (ZKITP) and MITP regression models under different simulation configurations. The factors considered include the model used to generate the data and the sample size. We generated 1000 replications for each configuration. A likelihood ratio test was used to select the best model, and the models were compared on the accuracy rate of model selection via the likelihood ratio test and on the mean absolute error (MAE).
In terms of MAE, when the hypothetical true model is TP, the mean MAEs of the four models do not differ substantially. When the hypothetical true model is ZITP, TP has the worst performance. When the hypothetical true model is ZKITP, TP and ZITP perform poorly, whereas ZKITP and MITP perform much better. When the hypothetical true model is MITP, MITP has the best performance. Overall, the results show that MITP is the best model when there are multiple inflated points. ZKITP and MITP fit well when there are zero and K inflated points, while ZITP, ZKITP and MITP fit the data well when there are only inflated zero counts. When the data are truncated Poisson distributed, all four models fit the data well. As the K-inflation rate increases, ZKITP and MITP perform better and remain stable.
With fixed sample sizes and parameters, when the true underlying model is truncated Poisson, MITP has the smallest MAE, followed by ZKITP, ZITP and TP.
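The mixture structure of the MITP model described above can be illustrated with a minimal sketch. It assumes a Poisson distribution right-truncated at an upper bound (for example 30 days) and uses the Poisson component as the reference category of the multinomial logit; the function names, the constant rate `lam`, and the constant `logits` are hypothetical stand-ins for quantities that would depend on covariates in the regression model, and this is not necessarily the exact parameterization used in the study.

```python
import math


def truncated_poisson_pmf(lam, upper):
    """PMF of a Poisson(lam) count conditioned on 0 <= Y <= upper (assumed right truncation)."""
    weights = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(upper + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(scores):
    """Numerically stable softmax, turning logits into mixture probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mitp_pmf(lam, upper, inflated_values, logits):
    """Mixture of point masses at the inflated values and a truncated Poisson.

    `logits` holds one multinomial-logit score per inflated value; the
    truncated Poisson component is the reference category with score 0
    (an assumption made for this sketch).
    """
    if len(inflated_values) != len(logits):
        raise ValueError("need exactly one logit per inflated value")
    if any(v < 0 or v > upper for v in inflated_values):
        raise ValueError("inflated values must lie in 0..upper")
    probs = softmax([0.0] + list(logits))
    base = truncated_poisson_pmf(lam, upper)
    # Weight the truncated Poisson by the probability of the reference category.
    pmf = [probs[0] * p for p in base]
    # Add the extra point mass at each inflated value.
    for value, p in zip(inflated_values, probs[1:]):
        pmf[value] += p
    return pmf


# Hypothetical illustration: counts on 0..30 with extra mass at 0 and 30.
example = mitp_pmf(lam=3.0, upper=30, inflated_values=[0, 30], logits=[0.5, -1.0])
print(round(sum(example), 6))
```

With no inflated values the mixture reduces to TP, a single inflated value at zero gives ZITP, and inflated values at zero and one other point K give ZKITP, which is why these models are nested in this sketch.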
In the empirical study, we analyzed the survey question "On how many of the PAST 30 DAYS did you smoke cigarettes?" from the U.S. National Adult Tobacco Survey (NATS). In addition to values typical of a Poisson distribution (1, 2, 3, and 4 days), the data also have inflated values at multiples of 5 and 7. The results indicate that the MITP model has smaller MAE and MSE as well as better model fit, and that it outperforms the competing models.
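As a small illustration of the inflation pattern described above, the candidate inflated values for a 30-day question can be enumerated as follows. The helper name is hypothetical, and whether zero or any particular multiple was actually treated as inflated in the NATS analysis is not stated here, so the list is only a set of candidates.

```python
def candidate_inflation_points(upper=30, bases=(5, 7)):
    """Days in 1..upper that are multiples of any base (zero is excluded by assumption)."""
    return [day for day in range(1, upper + 1) if any(day % b == 0 for b in bases)]


print(candidate_inflation_points())
# [5, 7, 10, 14, 15, 20, 21, 25, 28, 30]
```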