English abstract
Traditional linear regression models assume that the response variable is independently and normally distributed with constant variance. In many practical applications, however, the response variable is not continuous but categorical or a count, and such variables violate the basic assumptions of the ordinary linear regression model. Moreover, given the categorical nature of the dependent variable, the regression function cannot be linear, so count and categorical data cannot be analyzed with the ordinary regression model. Under suitable transformations, these problems can be overcome within a unified approach known as the Generalized Linear Model (GLM). The objective of this project is to provide an overview of modeling count and categorical data using GLMs, with applications to real data from the 2008 Oman National Reproductive Health Survey and Sultan Qaboos University student performance data.
The basic GLM for count data is the Poisson model with a log link. Count data are often over-dispersed (the variance of the response variable exceeds its mean), which invalidates the use of the Poisson distribution. In such cases, extensions of the Poisson model are commonly used to handle over-dispersion, including the Negative Binomial, Zero-Inflated Poisson (ZIP), and Zero-Inflated Negative Binomial (ZINB) models. The study empirically assessed the robustness of the Poisson model and its extensions under over-dispersion and found that the Zero-Inflated Poisson model performs better for analyzing over-dispersed count data on the number of children ever born to women. The study also provides an overview of modeling categorical data using logistic regression, with an application to empirical data on contraceptive method use from the 2008 Oman National Reproductive Health Survey (ONRHS). Binary logistic regression models were used to identify the significant predictors of contraceptive use versus non-use. Multinomial logistic regression models were used to analyze responses with more than two categories, measured on nominal and ordinal scales.
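As a rough illustration of the over-dispersion condition described above, the following minimal sketch compares the sample variance of a count variable with its mean and compares the observed share of zeros with the share a Poisson distribution with the same mean would predict. The function name `dispersion_summary` and the toy counts are hypothetical and chosen only for illustration; they are not taken from the ONRHS data or from the study's analysis.

```python
import math
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarize simple over-dispersion diagnostics for a list of counts.

    Hypothetical helper for illustration; not the study's procedure.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion ratio is undefined")
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero = sum(1 for c in counts if c == 0) / len(counts)
    # Under a Poisson model with mean m, P(Y = 0) = exp(-m).
    poisson_zero = math.exp(-m)
    return {
        "mean": m,
        "variance": v,
        "dispersion_ratio": v / m,  # values well above 1 suggest over-dispersion
        "observed_zero_share": observed_zero,
        "poisson_zero_share": poisson_zero,
    }


# Toy counts (made up), e.g. number of children per respondent.
toy_counts = [0, 0, 0, 0, 0, 1, 1, 2, 3, 4, 6, 9]
summary = dispersion_summary(toy_counts)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A dispersion ratio well above 1 suggests that a plain Poisson model may be inadequate, and an observed share of zeros well above the Poisson-implied share is one informal sign that a zero-inflated model may be worth considering. Formal model comparison would require fitting the candidate models.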