English Abstract
This study investigated the effect of item wording, the percentage distribution of positively and negatively worded items, and the number of Likert rating categories on the performance of grade 11 female students and on the psychometric properties of the scale. The instruments were administered to a sample of 450 female students, all in grade 11 (about 17 years old), drawn from Al-Seeb city in Muscat Governorate in 2011/2012. For data collection, an academic self-concept scale was used after 15 forms had been derived from it. The 15 forms have the same content and differ only in the number of Likert rating categories (3, 5, or 7) and in the distribution of negative and positive items (all items positive, 25% negative, 50% negative, 75% negative, or all items negative). A quasi-experimental design was used because the students were already divided among 15 classes, which were taken as they were to represent the 15 experimental groups. Each group consisted of 30 students and completed a different one of the 15 forms. The results revealed a large effect of the number of Likert rating categories, and a medium effect of both item wording (together with the distribution of negative and positive items) and the interaction between the two independent variables, on the reliability of the forms and on performance, while no effect was found on predictive validity. Regarding performance, all 15 means and standard deviations were converted into percentage means and standard deviations so that they could be compared regardless of the number of Likert rating categories (3, 5, or 7).
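The conversion to percentage means can be illustrated with a minimal sketch. It assumes that each item is scored from 1 up to the number of categories and that a percentage mean is the raw total-score mean divided by the maximum possible total; the abstract does not state the exact formula, and the item count and raw means below are hypothetical values chosen only for illustration.

```python
def percent_of_maximum(raw_mean: float, n_items: int, n_categories: int) -> float:
    """Express a total-score mean as a percentage of the maximum possible total.

    Assumption (not stated in the abstract): each item is scored
    1..n_categories, so the maximum total is n_items * n_categories.
    """
    max_total = n_items * n_categories
    return 100.0 * raw_mean / max_total


# Hypothetical example: a 20-item scale answered on 3, 5, and 7 categories.
N_ITEMS = 20  # illustrative item count, not taken from the study
hypothetical_raw_means = {3: 45.0, 5: 75.0, 7: 105.0}

percentage_means = {
    k: percent_of_maximum(mean, N_ITEMS, k)
    for k, mean in hypothetical_raw_means.items()
}
```

Under this assumed convention, raw means that look very different across the three formats map onto a common 0 to 100 range, which is what makes the 15 forms directly comparable.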
The 5-category form in which all items are positively worded showed the highest mean of all the forms; its mean was 21.66 points higher than the theoretical mean of the 5-point scale. The results also revealed differences in mean scores due to the interaction between the two independent variables that were statistically significant at the 0.001 level. The effect size of the interaction was medium according to Cohen's categories. The reliability coefficients were high, ranging from about 0.860 to 0.985. The statistical analysis found that the differences among the 15 reliability coefficients were significant. The post-hoc test therefore covered 105 pairwise comparisons, one of which was not significant. The comparisons showed that the best form, the one with the highest coefficient, was the 3-point Likert scale with 25% negative items (see index 12). On the other hand, no significant differences were found among the 15 predictive validity coefficients. This may be due to the small size of each group (n = 30), or it may be specific to predictive validity. Nevertheless, 5 of the coefficients were significant at a level below 0.05, while the rest were not. Finally, in light of the results of this study, a number of recommendations are suggested. They focus on the importance of the interaction between item wording and distribution and the number of Likert rating categories, and on its effect on performance and on the psychometric properties of the scale. This significant interaction points to several preferable forms, but the 5-point Likert form in which all items are positively worded may be the best, because it achieved good results on all three dependent variables: it had the highest performance mean and a high reliability coefficient (0.912), and its validity coefficient was one of the 5 that were significant at a level below 0.05.
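The counts reported in this abstract follow from the design itself, as the short sketch below shows. It enumerates the forms as the crossing of the three category counts with the five negative-item percentages, then counts the pairwise comparisons among them; the variable names are illustrative and not taken from the study.

```python
from itertools import combinations, product

# The two independent variables described in the abstract.
CATEGORY_COUNTS = (3, 5, 7)                       # Likert rating categories
NEGATIVE_ITEM_PERCENTAGES = (0, 25, 50, 75, 100)  # share of negatively worded items
STUDENTS_PER_GROUP = 30

# Each form is one (categories, negative-item percentage) combination.
forms = list(product(CATEGORY_COUNTS, NEGATIVE_ITEM_PERCENTAGES))

# A post-hoc test over all forms compares every unordered pair once.
pairwise_comparisons = list(combinations(forms, 2))

total_sample = len(forms) * STUDENTS_PER_GROUP

print(len(forms), len(pairwise_comparisons), total_sample)
```

The 3 x 5 crossing gives the 15 forms, 15 groups of 30 students give the sample of 450, and the 15 forms yield 15 x 14 / 2 = 105 pairwise comparisons.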