English abstract
This study aimed to analyze and evaluate the grade twelve pure mathematics examination questions of the last four academic years (2005/2006 to 2008/2009). Specifically, it sought to identify the types of questions, the distribution of the questions across cognitive levels, and the extent to which the questions meet the standards for writing test questions. In addition, the study examined the results of the end-of-first-semester exam of the 2008/2009 academic year in terms of score reliability, item difficulty, and item discrimination. The sample consisted of the mathematics exam questions from the four academic years (2005/2006 to 2008/2009), together with a random sample of 700 twelfth-grade students' answer sheets from the end-of-first-semester mathematics exam of the 2008/2009 academic year. The researcher used a two-part instrument: 1. A checklist to identify the types of questions included in the mathematics exams of 2005/2006 to 2008/2009 and the percentage of questions at each cognitive level. 2. A checklist to examine the extent to which the questions meet the standards for writing test questions. The validity of the instrument was established through review by 17 referees. Inter-rater reliability coefficients were 85% for the first part and 90% for the second part of the instrument. The results of the study indicated that: 1. The exam questions were of two types, essay questions and objective questions, with a heavy emphasis on multiple-choice questions. 2. The questions concentrated on the following cognitive levels: application (39.2%), comprehension (37.3%), knowledge (19.6%), and higher-order thinking (3.9%). 3. The questions met the minimum acceptable level (85%) of the standards for writing test questions. 4. The scores on the end-of-first-semester exam of the 2008/2009 academic year had an internal consistency coefficient of 0.75, as measured by Cronbach's alpha.
The item difficulty indices of the exam ranged from 9% to 82% (mean 45.5%), and the item discrimination indices ranged from -0.013 to 0.52. These results led to a number of recommendations and suggestions for research and practice, which are discussed in the study.
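The abstract does not state which formulas were used for the three item statistics. The following is only a minimal sketch of common textbook definitions, under these assumptions: difficulty is the mean item score divided by the item's maximum score (the proportion correct for items scored 0/1); discrimination is the corrected item-total correlation (the item against the total of the remaining items), an index that can fall below zero, as the reported lower bound of -0.013 does; and reliability is Cronbach's alpha, computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function names and the response matrix are hypothetical and are not the study's data.

```python
from statistics import mean, pstdev, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of examinee rows of item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def item_difficulty(scores, j, max_score=1):
    """Mean score on item j as a fraction of its maximum (proportion correct for 0/1 items)."""
    return mean(row[j] for row in scores) / max_score


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


def item_discrimination(scores, j):
    """Corrected item-total correlation: item j versus the sum of the other items."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)


# Hypothetical 0/1 responses: 6 examinees (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

if __name__ == "__main__":
    print("alpha:", round(cronbach_alpha(responses), 3))
    for j in range(len(responses[0])):
        print(f"item {j}: difficulty={item_difficulty(responses, j):.2f}, "
              f"discrimination={item_discrimination(responses, j):.3f}")
```

For essay items scored on a wider scale, the same difficulty function applies with `max_score` set to the item's full mark; an item that every examinee answers identically has no defined discrimination index and would need to be skipped.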