Comparison of 'Traditional' and 'Criterion-Referenced Assessment' of an Essay Question in a Neurological Physiotherapy Module

by

Rosemary Isles and
Nancy Low Choy

Department of Physiotherapy, The University of Queensland

 

Introduction

Neurological Physiotherapy is taught predominantly in the third year of the Physiotherapy course. The modules are part of two separate subjects, PT320 and PT313, and teaching is mainly shared by the authors. These modules provide the propositional and procedural knowledge, technical treatment skills and decision-making (reasoning) processes necessary for applied clinical practice in Neurological Rehabilitation during a fourth-year placement. Teaching in practical and case-based problem-solving sessions is largely collaborative.

The aims of the Neurological Physiotherapy course are broadly that, at the completion of the course, the student should:

The final aim, integrating information in clinical reasoning, requires the development of high-level cognitive skills that are consistent with Level 4 (Relational) of the SOLO Taxonomy and are considered characteristic of 'deep' learning (Biggs & Telfer, 1987; Entwistle & Ramsden, 1983). Physiotherapist Joy Higgs considers these cognitive skills to include "critical creative, reflective, logical and analytical thinking and metacognition" (Higgs & Jones, 1995, p. 14).

Effective teaching and assessment processes are necessary to help students acquire these skills and abilities and to promote optimal learning (Ramsden, 1992). Assessment must test the integration of appropriate knowledge and skills in a meaningful way, since it is well recognised that 'assessment drives learning'. Gibbs (1995) states:

"The way in which assessment operates has more effect on student learning than any other aspect of course design"

The design of the assessment must therefore effectively test the objectives of the course.

Assessment for the modules in these subjects usually consists of essay questions and a practical exam, both applied to case-based problems of varying size and complexity. Assessment is held in both semesters but is weighted towards second-semester tasks. As such, it aims to test the 'deep' integration of knowledge and the application of techniques.

The authors proposed the project as part of a successful School of Rehabilitation Science Quality Grant aimed at developing and improving Criterion-Referenced Assessment (CRA) procedures in the School's departments.

 

Aims of the project

The aims of the project were:

 

Method

A forty-five-minute essay question for PT320 was set to test stated objectives. Once completed, all papers were photocopied to remove names and thus ensure anonymity. Examiner 1 graded the papers of all 93 students in the year using 'traditional' methods, and the submitted results were based on this marking.

Fifty (50) papers were photocopied a further three times. Examiner 2 marked one set by 'traditional' methods, using a marking scheme established by Examiner 1.

Criteria and standards for marking the question by CRA were then established, and at a later time each examiner marked the sample of 50 papers using them.

Scores and grades were compared between examiners and between methods using paired t-tests.
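The paper does not say which software performed the paired t-tests. The standard-library Python below is a minimal sketch of what such a test computes, assuming two lists of marks for the same papers in the same order. It returns the t statistic, the degrees of freedom and a two-tailed p-value. The function name `paired_t_test` is hypothetical. The p-value comes from numerically integrating the Student t density with Simpson's rule rather than from a statistics package. `scipy.stats.ttest_rel` is a library alternative.

```python
import math
from statistics import mean, stdev


def paired_t_test(a, b, steps=10_000):
    """Two-tailed paired t-test on matched samples a and b (hypothetical helper).

    Returns (t statistic, degrees of freedom, two-tailed p-value).
    The p-value integrates the Student t density numerically with
    Simpson's rule, so only the standard library is needed.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples of length >= 2")
    diffs = [x - y for x, y in zip(a, b)]  # per-paper differences
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    df = n - 1

    # Log of the Student t normalising constant (lgamma avoids overflow).
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule for the area under the density between 0 and |t|
    # (steps must be even).
    upper = abs(t)
    h = upper / steps
    area = pdf(0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3

    # The density is symmetric about 0, so P(|T| >= |t|) = 1 - 2 * area.
    p = max(0.0, 1.0 - 2.0 * area)
    return t, df, p
```

Consider a comparison such as Examiner 1 versus Examiner 2 'traditional' scores. Here `a` and `b` would each hold the 50 scores aligned by paper, which gives 49 degrees of freedom.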

 

Traditional marking

Examiner 1 devised a marking scheme that allocated marks as shown in Table 1.

Table 1 - Marking scheme for 'Traditional' assessment of essay

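Section       Condition   Component          Marks
Techniques    TBI         Low tone               4
Techniques    TBI         Truncal ataxia         4
Techniques    TBI         Limb ataxia            4
Techniques    CVA         Movement return        4
Techniques    CVA         Balance                4
Techniques    CVA         Proprioception         4
Techniques    Total                             24
Rationale     Common                             5
Rationale     TBI         Motor deficit          2
Rationale     TBI         Techniques             6
Rationale     CVA         Motor deficit          2
Rationale     CVA         Techniques             6
Rationale     Total                             21
Grand total                                     45

TBI = Traumatic Brain Injury; CVA = Cerebrovascular Accident.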

Notes on the 'expected' content of each techniques section were provided. Notes on the rationale were less detailed, but a common understanding was assumed because each examiner was very familiar with the other's lecture content and knowledge.

Each examiner recorded total marks (out of 45) and plotted the scores on a histogram. Grades were then assigned from the histogram according to the University's recommendations for a 'normal' distribution. As a result, Examiner 2's cutoffs for grades 5 and 6 were two marks higher than Examiner 1's. Because many scores lay around the grade cutoffs, this slightly affected the grade distribution, although the numbers in each grade conformed to University guidelines.
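Grading against a recommended distribution can be sketched as choosing cutoff marks from the ranked scores so that grade counts approach target shares. The paper does not give the University's recommended distribution, so the cumulative percentages below are hypothetical placeholders. The function names are also hypothetical. In this sketch, students tied at a cutoff all receive the higher grade. This is one way scores clustered around a cutoff can shift the grade distribution. Two examiners with different score distributions also obtain different cutoff marks from the same targets.

```python
def norm_referenced_cutoffs(scores, quotas):
    """Choose a minimum mark for each grade from the ranked scores.

    scores: total marks for the class (any order).
    quotas: (grade, cumulative_percent) pairs from the top grade down;
            HYPOTHETICAL targets, e.g. (7, 10) = roughly the top 10% get a 7.
    Returns {grade: cutoff mark}.
    """
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    cutoffs = {}
    for grade, cum_pct in quotas:
        # Number of students who should reach at least this grade, rounded
        # up with integer arithmetic to avoid floating-point error.
        k = max(1, min(n, (cum_pct * n + 99) // 100))
        cutoffs[grade] = ranked[k - 1]
    return cutoffs


def grade_for(score, cutoffs, lowest_grade):
    """Return the highest grade whose cutoff the score meets."""
    for grade in sorted(cutoffs, reverse=True):
        if score >= cutoffs[grade]:
            return grade
    return lowest_grade


# Hypothetical targets: 10% receive a 7, 30% a 6 or better, 60% a 5 or better,
# 85% a 4 or better; everyone else receives the lowest grade passed in.
QUOTAS = [(7, 10), (6, 30), (5, 60), (4, 85)]
```

Tied scores such as `[30, 30, 30, 25, 20]` with a 20% target for the top grade give a cutoff of 30. All three tied students therefore receive that grade, so the top grade holds 60% of the group rather than 20%.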

 

Criterion referenced marking

Criteria were derived from the objectives and the specific question. These were:

Assessment forms described standards for all levels of performance, from grade 0 to 7. Criteria were 'ticked' in the appropriate position in each box (once for TBI and once for CVA), giving an 'average' position for each section (see Form in Appendix).

Scores were calculated for each grade level so that they could be compared with 'traditional' scores. The cutoffs for grades 4, 5, 6 and 7 were 50%, 65%, 75% and 85% respectively. The average position in each section was converted to a score, and the section scores were totalled and recorded. The overall grade was decided from the criteria 'ticks' and recorded.
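The percentage cutoffs above can be applied directly to totals. This minimal sketch assumes the criterion-based totals were expressed out of the same 45 marks as the traditional scheme. The paper implies this by comparing the two sets of scores, but it does not state how tick positions were converted to marks. The names `CUTOFFS_PCT`, `criterion_grade` and `minimum_scores` are hypothetical. On a 45-mark paper the cutoffs correspond to minimum scores of 22.5, 29.25, 33.75 and 38.25 for grades 4 to 7.

```python
# Cutoffs stated in the paper: minimum percentage of the maximum score per grade.
CUTOFFS_PCT = [(7, 85), (6, 75), (5, 65), (4, 50)]


def criterion_grade(score, max_score=45):
    """Map a total score to a grade using fixed percentage cutoffs.

    Compares score * 100 with pct * max_score so that boundary scores such
    as 38.25/45 (exactly 85%) are not lost to floating-point division error.
    Returns None below 50%, because cutoffs for grades below 4 are not given.
    """
    for grade, pct in CUTOFFS_PCT:
        if score * 100 >= pct * max_score:
            return grade
    return None


def minimum_scores(max_score=45):
    """Lowest total that earns each grade (assumes a 45-mark total by default)."""
    return {grade: pct * max_score / 100 for grade, pct in CUTOFFS_PCT}
```

For example, `criterion_grade(29)` gives 4 while `criterion_grade(29.25)` gives 5. A quarter-mark difference in score changes the grade, which shows why an exact score is harder to reproduce than a grade.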

 

Results and discussion

For the whole group, there was no significant difference between methods for either scores or grades (see Table 2).

Table 2 - Analyses of overall grades and scores using traditional and criterion-based assessment

Measure   Marking scheme   Mean    Stand. dev.   2-tailed signif.
Grades    Traditional       4.77      1.19            .765
          Criterion         4.79      1.09
Scores    Traditional      30.27      6.39            .419
          Criterion        30.02      6.08

N = 100

Table 3 shows the profile of each examiner, including mean and SD. Examiner 2 had higher mean scores than Examiner 1.

Table 3 - Mean and SD for scores of each examiner

Marking scheme   Examiner 1 mean   Examiner 1 SD   Examiner 2 mean   Examiner 2 SD
Traditional           29.00             6.88            31.5              5.65
Criterion             29.91             6.88            30.14             5.21

Table 4 demonstrates that there was no significant difference between examiners for grades under either method, but there was a significant difference between examiners for scores under the 'traditional' method.

Table 4 - Comparison between Examiners 1 and 2 for grades and scores under traditional and criterion-based marking schemes

Marking scheme   Grades: 2-tailed signif. (N = 50)   Scores: 2-tailed signif. (N = 50)
Traditional                   .705                                .001*
Criterion                     .301                                .759

* Significant difference between examiners.

Because 'traditional' grades were set by allocating a set percentage of students to each grade, the lack of difference between the examiners' grades was expected. Under the 'criterion' method, clear criteria and standards for each grade helped the examiners to award similar grades.

Although the differences between the examiners' grades were not statistically significant, there were enough differences to be of concern, and they tended to occur between grades 5 and 6.
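A paired t-test asks whether the mean difference between examiners is zero. Disagreements in opposite directions therefore cancel out. For example, one paper might be graded 5 by Examiner 1 and 6 by Examiner 2, and another paper the reverse. Tabulating the grade pairs directly shows where examiners disagree. The sketch below is minimal and uses hypothetical grade lists, because the paper does not report the individual pairs. The name `disagreement_table` is also hypothetical.

```python
from collections import Counter


def disagreement_table(grades_1, grades_2):
    """Summarise how two examiners' grades for the same papers differ.

    grades_1, grades_2: grades listed in the same paper order.
    Returns (proportion of exact agreement, Counter of (grade_1, grade_2)
    pairs for the papers where the examiners disagreed).
    """
    if len(grades_1) != len(grades_2) or not grades_1:
        raise ValueError("need two non-empty grade lists of equal length")
    pairs = list(zip(grades_1, grades_2))
    agree = sum(g1 == g2 for g1, g2 in pairs)
    disagreements = Counter((g1, g2) for g1, g2 in pairs if g1 != g2)
    return agree / len(pairs), disagreements
```

Applied to the 50 papers marked by both examiners, the counter would show whether disagreements cluster at particular grade pairs such as 5 versus 6.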

The significant difference between examiners' scores under the traditional method suggests that the 'marking guide' may not have identified expectations clearly enough. This may have had a greater impact on Examiner 2, because Examiner 1 had marked another 43 papers by this method and was therefore more familiar with the marking scheme. Both examiners were thus marking partly according to their past experience of standards and their own interpretation of the question. Allocating only a few marks to each of several sections may also have made traditional marking more difficult and less reliable.

Table 5 demonstrates that there was no significant difference between methods for grades awarded by each examiner. This suggests that identifying standards, whether stated (criterion method) or internalised (traditional method), allowed each examiner to produce similar grade results. Although 'traditional' grades were set according to a set percentage, the examiners have probably used this system for so long that their innate 'standards' for each level are reflected in the stated criteria. The criteria explicitly identified 'the ability to compare and contrast', whereas the 'traditional' method did not make this clear. This might have been expected to change grades between methods, but it did not; it may, however, have contributed to the differences in scores.

Table 5 - Comparison of traditional and criterion-based grades and scores for each examiner

Method of marking                            Examiner 1: 2-tailed signif. (N = 50)   Examiner 2: 2-tailed signif. (N = 50)
Traditional versus criterion-based grades                 .204                                     .399
Traditional versus criterion-based scores                 .033*                                    .001*

* Significant difference between methods.

For each examiner, there was a significant difference between methods for scores. Scores differed where grades did not because a score is a more sensitive measure than a grade and is therefore more prone to error when an exact value must be produced. The 'compare and contrast' criterion discussed above may also have contributed to more variable scores. It is questionable whether one examiner re-marking an essay with a complex 'traditional' marking scheme could reproduce the same score (i.e. intratester reliability would probably not be high).

 

Summary of findings

 

Conclusion

The study gave the examiners useful experience in developing and using criteria. It demonstrated that CRA can save time and effort if criteria and standards are well established at the outset. Well-designed assessment can test for the desired outcomes and thereby encourage appropriate 'deep' learning.

 

Reference List

Biggs J. & Telfer R. (1987) The Process of Learning. Sydney: Prentice-Hall.

Entwistle N. & Ramsden P. (1983) Understanding Student Learning. London: Croom Helm.

Gibbs G. (1995) Assessing Student Centred Courses. Oxford: Oxford Centre for Staff Development.

Higgs J. & Jones M. (1995) Clinical Reasoning in the Health Professions. Oxford: Butterworth-Heinemann.

Ramsden P. (1992) Learning to Teach in Higher Education. London: Routledge.

 



Last modified 20/1/99; 10:14:21 AM.