Comparison of 'Traditional' and 'Criterion-Referenced Assessment' of an Essay Question in a Neurological Physiotherapy Module
Rosemary Isles and Nancy Low Choy
Department of Physiotherapy, The University of Queensland
Introduction
Neurological Physiotherapy is taught predominantly in the third year of the Physiotherapy course. The modules are part of separate subjects, PT320 and PT313, and teaching is mainly shared by the authors. These modules provide the propositional and procedural knowledge, technical treatment skills and decision-making (reasoning) processes necessary for applied clinical practice in Neurological Rehabilitation in a fourth year placement. Teaching in practical and case-based problem-solving sessions is largely collaborative.
The aims of the Neurological Physiotherapy course are broadly that, at the completion of the course, the student should:
- Have knowledge of the neuroanatomy, neuropathology and symptomatology related to specified neurological conditions
- Have knowledge of techniques available to treat specified neurological conditions
- Understand the rationale behind the choice of treatment techniques
- Be able to integrate information to solve clinical problems, i.e. choose appropriate techniques to treat a particular patient case
The final aim of integrating information in clinical reasoning requires development of high-level cognitive skills which are consistent with Level 4 (Relational) in the SOLO Taxonomy and are considered characteristic of 'deep' learning (Biggs & Telfer, 1987; Entwhistle & Ramsden, 1983). The cognitive skills are considered by physiotherapist Joy Higgs to include "critical creative, reflective, logical and analytical thinking and metacognition" (Higgs & Jones, 1995, p. 14).
In order to facilitate student acquisition of these appropriate skills and abilities, effective teaching and assessment processes are necessary to promote optimal learning ( Ramsden, 1992). Assessment must test the integration of appropriate knowledge and skills in a meaningful way as it is well recognised that 'assessment drives learning'. Gibbs (1995) states:
"The way in which assessment operates has more effect on student learning than any other aspect of course design"
The design of the assessment must therefore effectively test the objectives of the course.
Assessment for the module in these subjects is usually by 'essay' questions and a practical exam that are applied to 'case-based' problems of variable size and complexity. Assessment is held in both semesters but weighted to second semester tasks. As such, it aims to test 'deep' integration of knowledge and application of techniques.
The authors proposed the project as part of a successful Quality Grant by the School of Rehabilitation Science aimed at developing and improving Criterion Referenced Assessment (CRA) procedures in the departments of the school.
Aims of the project
The aims of the project were:
- To develop suitable assessment criteria to mark an essay question in neurological physiotherapy
- To compare traditional and criterion referenced marking methods for the question
- To compare marking results between teachers of the subject, Rosemary Isles (examiner 1) and Nancy Low Choy (examiner 2)
Method
A forty-five minute essay question for PT320 was set to test stated objectives. All papers, once completed, were photocopied to remove names and thus ensure anonymity. Examiner 1 graded the question using 'traditional' methods for all 93 students in the year, and results were submitted on the basis of this marking.
Fifty (50) papers were photocopied a further three times. Examiner Two marked one set according to 'traditional' methods using a marking scheme established by Examiner 1.
Criteria and standards for marking the question using Criterion Referenced Assessment were established, and at a later time, each examiner marked the sample.
Scores and grades were compared between examiners and between methods using paired t-tests.
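As an illustration of the comparison described above, the following is a minimal sketch of the paired t statistic, computed from the differences between two sets of marks given to the same papers. The function name `paired_t` and the four pairs of marks are hypothetical and are not data from this study; the sketch returns only the t statistic and degrees of freedom, and the significance level would still need to be read from a t distribution.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for a paired t-test.

    `first` and `second` hold two marks for the same papers, in the same order.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical marks out of 45 for four papers (illustration only).
examiner_1 = [30, 28, 35, 40]
examiner_2 = [32, 29, 35, 43]
t_stat, dof = paired_t(examiner_1, examiner_2)
print(f"t = {t_stat:.3f} on {dof} degrees of freedom")
```

Because each paper is compared with itself, the test depends only on the per-paper differences, which is why it suits comparisons between examiners or between methods on the same sample.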
Traditional marking
A marking scheme was devised by Examiner 1 which assigned marks in the following way.
Table 1 - Marking scheme for 'Traditional' assessment of essay
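SECTION      CONDITION                   ITEM               MARKS
TECHNIQUES   Traumatic Brain Injury      Low Tone           4
TECHNIQUES   Traumatic Brain Injury      Truncal Ataxia     4
TECHNIQUES   Traumatic Brain Injury      Limb Ataxia        4
TECHNIQUES   Cerebrovascular Accident    Movt. Return       4
TECHNIQUES   Cerebrovascular Accident    Balance            4
TECHNIQUES   Cerebrovascular Accident    Proprioception     4
TECHNIQUES   Total                                          24
RATIONALE    Common                                         5
RATIONALE    TBI                         Motor Deficit      2
RATIONALE    TBI                         Techn.             6
RATIONALE    CVA                         Motor Deficit      2
RATIONALE    CVA                         Techn.             6
RATIONALE    Total                                          21
Notes on 'expected' content in each section of techniques were provided. Notes on rationale were less detailed but common understanding was assumed as each examiner was very familiar with the other's lecture content and knowledge.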
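The allocation in Table 1 can be written down as a small data structure to confirm that the two sections add up to the 45 marks available. This is only a sketch: the dictionary layout and the names `MARKING_SCHEME` and `section_total` are illustrative assumptions, while the mark values are those given in Table 1.

```python
# Marks per item, taken from Table 1. The nesting (section -> condition -> item)
# is an illustrative choice, not a structure used by the examiners.
MARKING_SCHEME = {
    "techniques": {
        "TBI": {"low tone": 4, "truncal ataxia": 4, "limb ataxia": 4},
        "CVA": {"movement return": 4, "balance": 4, "proprioception": 4},
    },
    "rationale": {
        "common": {"common": 5},
        "TBI": {"motor deficit": 2, "techniques": 6},
        "CVA": {"motor deficit": 2, "techniques": 6},
    },
}


def section_total(section):
    """Sum the marks of every item in one section of the scheme."""
    return sum(sum(items.values()) for items in section.values())


totals = {name: section_total(section) for name, section in MARKING_SCHEME.items()}
print(totals, "overall:", sum(totals.values()))
```

Laid out this way, it is easy to see how many sections carry only two to six marks each.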
Total marks (out of 45) were recorded and scores plotted on a histogram by each examiner. From this, grades were assigned according to 'normal' University distribution recommendations. As a result, the cutoff for a '5' and a '6' grade was two marks higher for Examiner 2 than for Examiner 1. This slightly affected the grade distribution as there were many scores around the grade cutoff mark, although numbers in each grade conformed to University guidelines.
Criterion referenced marking
Criteria were derived based on the objectives and the specific question. These were:
- Knowledge of choice of techniques to manage given patient conditions (i.e. Traumatic Brain Injury and Cerebrovascular Accident) and specific problems
- Rationale for choice of techniques for given conditions and problems
- Ability to compare / contrast rationale and techniques for given problems
Assessment forms described standards for all levels of performance from grade 0 to 7. Criteria were 'ticked' in the appropriate position in each box (once for TBI and once for CVA). An 'average' position resulted in each section. (See Form in Appendix.)
Scores were calculated for each grade level so that a comparison could be made with 'traditional' scores. Cutoffs for calculation of grades 4, 5, 6 and 7 were 50%, 65%, 75% and 85% respectively. The average position was converted to a score in each section. Overall grade level was decided from criteria 'ticks' and recorded. Scores were totalled and recorded.
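The grade cutoffs quoted above can be sketched as a simple lookup. This is an illustration under stated assumptions rather than the procedure used in the study: it assumes each cutoff is an inclusive lower bound applied to the percentage of the 45 marks available, and the names `GRADE_CUTOFFS` and `grade_from_score` are hypothetical. Cutoffs for grades below 4 are not given in the text, so the sketch returns `None` for scores under 50%.

```python
# (minimum percentage, grade), highest grade first. Values are from the text;
# treating them as inclusive lower bounds is an assumption.
GRADE_CUTOFFS = [(85.0, 7), (75.0, 6), (65.0, 5), (50.0, 4)]


def grade_from_score(score, max_score=45):
    """Map a raw score to a grade of 4 to 7, or None if below the grade-4 cutoff."""
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    percent = 100.0 * score / max_score
    for cutoff, grade in GRADE_CUTOFFS:
        if percent >= cutoff:
            return grade
    return None  # grades 0 to 3: cutoffs are not stated in the source


for raw in (22, 30, 34, 39):
    print(raw, "->", grade_from_score(raw))
```

With fixed cutoffs of this kind, a one-mark difference near a boundary changes the grade, which is relevant to the observation that many scores sat close to the grade cutoff marks.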
Results and discussion
There was no significant difference between methods for the whole group for scores or grades. (See Table 2)
Table 2 - Analyses of overall grades and scores using traditional and criterion based assessment
MEASURE   MARKING SCHEME   MEAN     STAND. DEV.   2-TAIL SIGNIF.
GRADES    TRADITIONAL      4.77     1.19
GRADES    CRITERION        4.79     1.09          .765
SCORES    TRADITIONAL      30.27    6.39
SCORES    CRITERION        30.02    6.08          .419
N = 100
The profile for each examiner, including mean and SD, appears below in Table 3. This demonstrates that Examiner 2 had higher mean scores than Examiner 1.
Table 3 - Mean and SD for scores of each examiner
MARKING SCHEME   EXAMINER 1              EXAMINER 2
                 Mean     Stand. Dev.    Mean     Stand. Dev.
TRADITIONAL      29.00    6.88           31.5     5.65
CRITERION        29.91    6.88           30.14    5.21
Table 4 demonstrates that there was no significant difference between examiners for grades using each method but there was a significant difference between examiners for scores by 'traditional' method.
Table 4 - Comparison between examiners according to grade / scores using traditional / criterion based marking schemes
EXAMINERS 1 AND 2
MARKING SCHEME   GRADES                          SCORES
                 2-Tailed Significance, N = 50   2-Tailed Significance, N = 50
TRADITIONAL      .705                            .001*
CRITERION        .301                            .759
Because grades were set traditionally using percentage of students, the lack of difference between grade results was an expected finding. The use of clear criteria and standards for each grade assisted examiners to demonstrate similar grades using the 'criterion' method.
Even though the differences between grades were not statistically significant, there were enough differences between examiners to be of concern. This tended to occur between grades of 5 and 6.
The significant difference between scores for traditional method suggests that the 'marking guide' may not have identified expectations clearly enough. This may have had a greater impact on Examiner 2 as Examiner 1 marked another 43 papers by this method and therefore was more familiar with the marking scheme. Examiners were thus both marking according to their experience of standards in the past and their interpretation of the question. Marking with few marks in several sections may also have made traditional marking more difficult and less reliable.
Table 5 demonstrates that there was no significant difference between methods for grades by each examiner. This would suggest that the identification of standards, one stated and the other internalised, allowed each examiner to produce similar grade results. Even though grades were set in the 'traditional' method according to a set percentage, it is probable that the examiners have used this system for so long that their innate 'standards' for each level have been reflected in the stated 'criteria' standards. The criteria explicitly identified 'the ability to compare and contrast', whereas this was not clear in the 'traditional' method. This might have been expected to cause a change between methods, which did not eventuate. It could, however, have contributed to score differences.
Table 5 - Comparison between grades and scores given according to the examiner
METHOD OF MARKING                           EXAMINER 1                      EXAMINER 2
                                            2-tailed Significance, N = 50   2-tailed Significance, N = 50
TRADITIONAL VERSUS CRITERION BASED GRADES   .204                            .399
TRADITIONAL VERSUS CRITERION BASED SCORES   .033*                           .001*
There was a significant difference between methods for scores by each examiner. Scores differed where grades did not because a score is a more sensitive measure, and producing an exact score is therefore more prone to error. The 'compare and contrast' criterion discussed above may also have contributed to a more variable result. It is questionable whether one examiner could reproduce the same scores when re-marking an essay using a complex 'traditional' marking scheme (i.e. the intratester reliability would probably not be high).
Summary of findings
- Criterion Referenced Assessment was easier and quicker to use than traditional assessment. Only the conversion to a score increased the time required in using CRA in this study.
- The differences between examiners, and within each examiner's results, were more often significant for scores than for grades, which is important in relation to the use of CRA. It is evidently easier to produce reliable grades, and CRA in its pure form relies on the use of grades only. It is a disadvantage to have to convert back to a score when aggregation rules are not effective.
- Despite the statistical findings, the variation in scores and grades between examiners, especially using traditional marking, would have been unacceptable had the actual marking for this question been shared between examiners in the examination.
- It is difficult to set standards for criteria to be used by more than one marker when questions are complex. The chosen question was probably not a good one to use in a formal study. Both criteria and standards must be clear and mean the same thing for examiners and students alike. This suggests that in setting problem-based questions, the complexity should not be so great as to make setting of criteria and standards too difficult. Choice of words in describing standards is vital as they must be specific and not too generalised.
- There is no doubt that experience in marking assists examiners to set their own internalised standards which are then easier to describe. It is difficult for inexperienced markers to establish the standards required for various criteria.
- At times, particular criteria may be 'generic' for a particular form of assessment and useful across departments as a starting point in defining criteria.
- Students were not given 'criteria' and standards for this test. In the future, having criteria will make it easier for students to understand what is required in various assessments in this field.
Conclusion
The study gave useful experience to the examiners in developing and using criteria. It demonstrated that the use of CRA can save time and effort if criteria and standards are established well initially. Assessment, if well designed, does have the ability to test for desired outcomes and as such encourages appropriate 'deep' learning.
Reference List
Biggs J. & Telfer R. (1987) The Process of Learning, Sydney: Prentice-Hall.
Entwhistle N. & Ramsden P. (1983) Understanding Student Learning, London: Croom Helm.
Gibbs G. (1995) Assessing Student Centred Courses, Oxford: Oxford Centre for Staff Development.
Higgs J. & Jones M. (1995) Clinical Reasoning in the Health Professions, Oxford: Butterworth-Heinemann.
Ramsden P. (1992) Learning to Teach in Higher Education, London: Routledge.
Last modified 20/1/99; 10:14:21 AM.