Published Date | October 2021 |
---|---|
Title | Modeling Multiple Inflations in Survey Data |
Type | Research Article |
Author | |
Pagination | 95-111 |
Abstract | The zero-inflated Poisson (ZIP) regression model is used to analyze Poisson-distributed count data with excess zeros. Although various models have been developed to fit zero-inflated data, many of them depend strongly on the unique features of each data set. In general, inflated values arise when a sizable group of respondents endorse the same answers, producing spikes in the data. For example, some data have cyclical patterns with multiple inflated values, as with survey questions that assess risk or health behaviors within a fixed length of time. Two examples are the question "During the past 30 days, on how many days did symptoms of asthma make it difficult for you to stay asleep?" and the question "During the past two weeks, on how many days did you text or e-mail while driving a car or other vehicle?" In this study, we propose a new multiple-inflated truncated Poisson (MITP) regression model for more than two inflated values. The model is a mixture of multinomial logistic regression and truncated Poisson regression: the multinomial logistic regression models the occurrence of the inflated values, and the truncated Poisson regression models the counts that follow a truncated Poisson distribution. The performance of the proposed model was evaluated in a simulation study that compared truncated Poisson (TP), zero-inflated truncated Poisson (ZITP), zero- and K-inflated truncated Poisson (ZKITP), and multiple-inflated truncated Poisson (MITP) regression models under different simulation configurations. The factors considered were the model used to generate the data and the sample size. A likelihood ratio test was used to select the best model, and 1000 replications were generated for each configuration. The models were compared on the accuracy rate of model selection via the likelihood ratio test and on the mean absolute error (MAE). In terms of MAE, when the hypothetical true model is TP, the mean MAEs of the four models do not differ substantially. When the hypothetical true model is ZITP, TP performs worst. When the hypothetical true model is ZKITP, TP and ZITP perform poorly, whereas ZKITP and MITP perform much better. When the hypothetical true model is MITP, MITP performs best. These results show that MITP is the best model when there are multiple inflated points. ZKITP and MITP fit well when the inflated points are zero and K, while ZITP, ZKITP, and MITP fit well when only zero counts are inflated. When the data follow a truncated Poisson distribution, all four models fit well. As the K-inflation rate increases, ZKITP and MITP perform better and remain stable. With fixed sample sizes and parameters, when the true underlying model is truncated Poisson, MITP has the smallest MAE, followed by ZKITP, ZITP, and TP. In the empirical study, we analyzed the survey question "On how many of the PAST 30 DAYS did you smoke cigarettes?" from the U.S. National Adult Tobacco Survey (NATS). In addition to the day counts typical of a Poisson distribution (1, 2, 3, 4 days, and so on), the data have clearly inflated values at multiples of 5 and 7 days. The results indicate that the MITP model has a smaller MAE and MSE as well as a better model fit, and that it outperforms the competing models. |
Keyword | |
Subject | |
Theme | |
DOI | |
Download |
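
The mixture structure described in the abstract can be illustrated with a minimal sketch of an MITP-style probability mass function. This is not the authors' implementation. It assumes, hypothetically, that the Poisson component is truncated to the range 0 to `upper` (for example 30 days), that the mixing probabilities come from a softmax over fixed logits rather than from a fitted multinomial logistic regression on covariates, and that all function and parameter names are illustrative.

```python
import math


def truncated_poisson_pmf(y, lam, upper):
    """P(Y = y) for a Poisson(lam) count conditioned on 0 <= Y <= upper.

    Assumption: truncation to [0, upper]; the abstract does not state
    which truncation the paper uses.
    """
    if y < 0 or y > upper:
        return 0.0
    weights = [math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(upper + 1)]
    return weights[y] / sum(weights)


def softmax(logits):
    """Turn multinomial-logit scores into mixing probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mitp_pmf(y, inflated_points, logits, lam, upper):
    """Mixture of point masses at inflated values and a truncated Poisson.

    logits holds one score per inflated point plus a final score for the
    truncated Poisson component (hypothetical parameterization).
    """
    if len(logits) != len(inflated_points) + 1:
        raise ValueError("need one logit per inflated point plus one")
    probs = softmax(logits)
    # Extra probability placed directly on y if it is an inflated value.
    point_mass = sum(p for k, p in zip(inflated_points, probs) if k == y)
    return point_mass + probs[-1] * truncated_poisson_pmf(y, lam, upper)
```

In a regression setting, both the logits and the Poisson rate `lam` would depend on respondent covariates; the sketch fixes them to keep the mixture itself visible.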