An American Teacher's View Of British Assessment Practice
There are many calls today for a new national assessment and evaluation system in the United States. Some of the resulting proposals incorporate aspects of the British model. For example, the proposal advanced by the National Center on Education and the Economy includes a place for "calibration" (or "moderation," as it is called in the British scheme). Similarly, the proposal advanced by the National Goals Panel of the National Governors Association calls for curriculum frameworks with the opportunity for local interpretation. I experienced the new British system during a year spent teaching in England.
I offer the following observations and insights in the hope that they will add to the national debate over the desirability of similar reforms for American schools.
During the 1988-89 academic year, I participated in a Fulbright teacher exchange in the United Kingdom. My British comprehensive school (a public school, in the American sense) was in a small town in rural Gloucestershire and could not have been more different from Newton South High School, where I have taught history and social studies for the past nineteen years. Newton is a Boston suburb famous for its commitment to quality education, while England's West Country is known to have a rather backward attitude toward education.
The British education system was in the midst of major reforms in 1988. My British colleagues and I were caught up in the changeover wrought by the Education Act, which mandated a new national curriculum as well as new methods of assessment and evaluation for every state-supported school in England and Wales.
The Basis of Reform
Under the old assessment system, British schools used no grades as we know them, but sent home reports evaluating students' work. My school, for example, sent them to parents once a year (twice for eleven-year-olds who were new to the school). These reports contained a collection of very brief, qualitative statements from each teacher about the student's progress and a simple trifurcated evaluation, placing the student in the high, middle, or low achievement sector of the class. Very strict quotas applied here, with 25 percent being assigned to the high, 25 percent to the low, and the remaining 50 percent to the middle group. This evaluation was completely subjective on the part of the teacher, based on classwork and homework, and its only reference point was the other students in the class (not the entire grade-level). If students did poorly they were not retained, but went on with their classmates until old enough to leave school at age sixteen; such students were (and still are) called "school-leavers."
British schools gave no diplomas or other certificates of attendance. Instead, students took one or two "public examinations." At sixteen, students would "sit" the ordinary level (O-level) exam, after which they could leave school, which approximately 80 percent of them chose to do. If they chose to stay on, they could, at age eighteen, take the advanced (A-level) exam. Acceptance at British universities was dependent on how many O-level and A-level "passes" one had. Most universities required two or three A-levels, but more selective ones demanded not merely passes, but high scores. For example, it would not be unusual for an Oxford college to demand two "A"s and a "B" or better on the A-levels.
This traditional system was roundly criticized in many sectors of society for stressing recall rather than the application of knowledge. Even conservatives accused it of catering to the professional sector of the economy and ignoring the industrial and commercial sectors. Therefore, the Conservative government introduced the first stage of a series of educational reforms in the autumn of 1986. As a package, these reforms were referred to as the National Curriculum.
This National Curriculum mandated that British students take (besides religious education) classes in the following ten areas: English, mathematics, science, technology, history, geography, art, music, physical education, and a modern foreign language. The first three subjects are "core subjects," while the others are "foundation subjects." In addition, a new assessment system replaced the old reports, and a new exam supplanted the O-level: the General Certificate of Secondary Education, or GCSE.
Instead of the traditional high, medium, and low designations used on the old reports, a system of "profiling" was phased in. At the national level, various "assessment objectives" were defined by the Education Ministry. Then each department within a school drew up lists of specific skills for assessment, a sort of hierarchy of student understanding and achievement for each of the nationally determined objectives. Huge amounts of teacher time went into the development of each department's profile. As students "attained" the particular level of understanding, as shown by homework and work in class, the appropriate box was checked by the teacher with the date on which this performance was demonstrated. For example, on my department's profile under the category "Using Information," there was the following range of achievement levels with a space for check-off and date:
- can collect basic information
- can extract and use simple, relevant information to create a description
- can extract and use relevant information to construct a simple explanation
- can select and use relevant information thoroughly and accurately for a specified purpose.
This assessment system was called "formative" rather than "summative." Its purpose was to be diagnostic, and it was to be "criterion-referenced" rather than "norm-referenced," designed to avoid competition among students. However, since the entire National Curriculum was part of a highly charged political program, one must be cautious when hearing British evaluations of the reforms. Tories tend to inflate the system's successes, while the Opposition tends to criticize the reforms, sometimes unfairly and prematurely.
The Thatcher government, much enthralled by things American during the Reagan years, actually billed the new reforms as bringing British education closer to the American model. Market forces and parent choice were to be primary components of the new system. Parents would choose their children's schools, and money would follow the pupil. Good schools would survive, so the planners argued, while poor schools would wither away. The new assessment system, both the profiles and the GCSE exam, would be skills-oriented rather than content-oriented. They would represent, in the words of former education minister Kenneth Baker, a "can-do" system, testing children's competence rather than probing to find what they were not capable of doing. The stigma attached to failure would be removed by giving students grades that ranged from "A" to "G." In all cases, however, a certificate would be granted to students taking the exam. In my school, even teachers who were critical of various aspects of the new exam felt that it was generally an improvement over the elitism of the old O-level, which gave school leavers nothing.
Part of the GCSE grade was determined by the student's classwork. Course-work, as it was called, was evaluated by the student's own teachers. Since the British have a strong bias against multiple-choice tests and a great predilection for written responses, machine-graded course-work was not acceptable. The work had to be in essay form, and this presented problems: essay evaluation is highly labor-intensive and difficult to standardize. Since the government did not wish to spend much money on this aspect of their program, they developed an ingenious way to get the task done cheaply. Departments within schools were asked to develop their own course-work assignment on their own time (after school in meetings that lasted until early evening). Several "inset" (release) days were given to get the project off the ground, but not enough to complete it. Course-work marks were phased in over several years, becoming a larger and larger part of the GCSE score as teachers learned how to use the new assessment criteria. Ultimately, they counted as much as one-third of the final exam result.
Course-work was then submitted to centrally designated "moderator" teachers who judged its acceptability. Presumably, the Education Ministry had some basis for deciding which teachers understood the national objectives well enough to serve in this capacity. They were awarded a modest stipend for their services, though rank-and-file teachers were paid nothing. Along with each assignment, departments submitted evaluation criteria, or "mark schemes," that tailored the scoring of the specific assignment to the overall measurement of the national assessment criteria. This would show how the scoring of the particular assignment fit into the overall scheme of national assessment objectives. If, for example, four levels of understanding were delineated for a particular question and if ten points were possible as a high score, the following might be the mark scheme on a particular question testing the national assessment objective of dealing with historical evidence:
LEVEL 1: Comprehension of source (1-2 points)
For example, student can extract plausible if only partly relevant information from source, or can extract specific related information from more than one source.
LEVEL 2: Simple interpretation and evaluation (3-4 points)
For example, student can classify type of source, comment on nature and tone of information provided, or can comment on plausibility of source.
LEVEL 3: Supported interpretation/evaluation of source (5-7 points)
For example, student can evaluate source by general sense of the period, in terms of author's situation or purpose, or by process of cross-referencing; can use source as evidence, not as information.
LEVEL 4: Full appreciation of the nature of historical evidence (8-10 points)
For example, student can recognize that accounts may vary because of the nature of the sources upon which they draw; can identify the relationship of the historian to the object of study; can recognize the nature of historical proof.
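For readers who find it easier to see such a scheme laid out mechanically, the four levels above can be restated as a small lookup table. What follows is only an illustrative sketch in Python, not anything the exam boards or my department actually used. The names MARK_SCHEME and level_for_points are hypothetical; the point bands are simply copied from the sample scheme above.

```python
# Point bands copied from the sample mark scheme above.
# Each level maps to an inclusive (lowest, highest) range of points.
# The names used here are hypothetical, chosen for illustration only.
MARK_SCHEME = {
    1: (1, 2),    # Comprehension of source
    2: (3, 4),    # Simple interpretation and evaluation
    3: (5, 7),    # Supported interpretation/evaluation of source
    4: (8, 10),   # Full appreciation of the nature of historical evidence
}


def level_for_points(points: int) -> int:
    """Return the level whose band contains the given number of points."""
    for level, (lowest, highest) in MARK_SCHEME.items():
        if lowest <= points <= highest:
            return level
    raise ValueError(f"{points} points falls outside the 1-10 scale")


def is_consistent(level: int, points: int) -> bool:
    """Check that a mark awarded at a given level stays inside that level's band."""
    lowest, highest = MARK_SCHEME[level]
    return lowest <= points <= highest
```

A check like is_consistent mirrors what a second reader does informally: if an essay is annotated as reaching Level 3, the mark awarded should fall between five and seven points.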
Once the course-work with its mark scheme was approved by the moderator and given to students, it was evaluated by the individual teacher. Upon reading an essay, one was encouraged to annotate the work, commenting on where the thresholds for the various levels had been crossed. Another member of the department also read the essays. The department then sat down for an entire day to collectively moderate marks. We tried to iron out discrepancies, citing disagreements with annotations about the various levels achieved by the students and comparing aspects of one student's essay against those of another, all the time keeping the mark scheme firmly in both hand and mind.
After this arduous but worthwhile process, samples of student essays for each mark level were submitted to the moderator, who determined whether our marks were accurate, too high, or too low. I believe that the moderator had the power to ask for all the exams if necessary. He or she also had the power to add or subtract from all our scores or selected groups of them (e.g., "You were too hard on the low end: add one point each to all scores below six"). Nonetheless, our course-work marks stood as we had submitted them, causing me to wonder how frequently, if at all, they were changed in other schools. Being unfamiliar with the system, we tended to be rather tentative and conservative, wary of inflating our students' scores. Because we all wished to avoid intervention by the moderator, my department marked the work scrupulously and earnestly.
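The moderator's hypothetical instruction quoted above ("add one point each to all scores below six") amounts to a simple rule applied across a set of marks. A minimal sketch in Python follows, assuming a ten-point scale. The function name moderate, its parameters, and the clamping of results to the scale are my own assumptions for illustration, not part of the official British procedure.

```python
def moderate(scores, threshold=6, adjustment=1, max_score=10):
    """Apply a moderator's blanket adjustment to every score below a threshold.

    Scores at or above the threshold are left unchanged. Adjusted scores are
    clamped to the range 0..max_score (an assumption made for this sketch).
    """
    moderated = []
    for score in scores:
        if score < threshold:
            score = max(0, min(max_score, score + adjustment))
        moderated.append(score)
    return moderated


# Example: the low end is raised by one point; the rest stand as submitted.
department_marks = [2, 5, 6, 9]
print(moderate(department_marks))  # prints [3, 6, 6, 9]
```

Note that a blanket rule of this kind can erase distinctions near the threshold: here a five and a six both end up as six.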
It is possible that in time, as teachers become more familiar with the system, more gamesmanship will develop in marking, especially if GCSE scores are tied to evaluations of teacher effectiveness. Still, I found the effort of collegially determining both appropriate level-of-understanding performances and the degree of student understanding to be highly worthwhile exercises. It would be beneficial to engage in such a process in the United States if either pay or compensatory time were forthcoming.
Skills vs. Coverage
A disturbing aspect of high school teaching in the United States is the pressure for "coverage." American syllabi put almost everything in terms of the material covered. It is ironic that the closest thing we have to a national evaluation system, the CEEB achievement tests, helps to promote this mentality. Coverage becomes necessary in order to prepare students properly for these tests.
It is admirable that Britain's new National Curriculum in general, and the GCSE in particular, are oriented toward the teaching of skills and have made concessions on coverage in order to accomplish this. The examination for which I helped prepare students was "British Social and Economic History since 1750." Though the course included an odd list of topics and themes to be covered, which at first glance seemed intimidating, everyone knew that there would be a wide degree of choice on the examination itself. Much like the classic Chinese menu that asks for two choices from column A and three from column B, the topics would be banded in such a way that only a few needed to be covered. For example, while all students would be required to answer a question on the Industrial Revolution, they could choose the particular industry (e.g., textiles, iron, mining, automobile) they wished to treat.
At my school (and I understand that this was a typical approach throughout Britain), the school made the choice about which questions the students would answer by deciding which topics to teach. Sometimes these choices were based on the teacher's decision to sacrifice content for skills development, but often it was more a question of the books and other resources available to the school. There was a feeling that the students were incapable of covering many of the topics in the two years in which they had to learn the material, and thus their teachers took them over and over the same topics. The result was that the ownership that might have come from students choosing which question to answer was sacrificed so that fewer topics might be covered. I was told something like this: "When our students get to this section of the exam, they will answer the question on the Poor Laws because they haven't learned anything about the Corn Laws or the Chartists." Several of my fifteen-year-olds on their mock exams chose to answer questions on topics they had never studied because "they looked to be interesting," paying a rather high price to restore their right to choose.
While the mindless pursuit of coverage to the exclusion of all else is not good pedagogical practice, neither is spending excessive amounts of time on too narrow a band of selected topics. Are teachers ever convinced that enough time has been devoted to particular favorite topics if coverage pressure disappears? How many American history classes never got beyond the Second World War because the teacher couldn't sacrifice the crucial details of the XYZ affair? Coverage pressure should certainly be reduced, but I fear a situation in which it is totally eliminated. In my British school, the consequence of inadequate coverage was the elimination of student choice of topics on the examination, not to mention the deadening effects of endless repetition.
The other major danger in stressing skills at the expense of coverage is that "skills" can be viewed by some teachers as simply more material to be covered. Teaching skills by "pouring them into students' heads" offers no improvement over imparting coverage in the same manner. Skills must be coached, necessitating the active participation of the learner who must internalize many of the criteria of evaluation. I tried to do this with my GCSE classes, showing them the mark scheme and encouraging them to grade one another's papers, though all but the top few students stayed heavily committed to their passivity. They resisted my efforts to help them evaluate because they were so used to the teacher doing all the evaluation: "This isn't proper work, sir," they whined. In Newton I always work with Advanced Placement students on how to evaluate their own document-based essays, and they generally feel empowered by the experience of reading and utilizing the instructions to the AP graders.
To bolster this point, I should add that since the advent of GCSE, a number of books have come out purporting to teach the various skills required by the exam. These books contain exercises as well as cookbook-like lists of procedures to be used with each kind of GCSE question. My students insisted on copying down and memorizing the lists for the exam. Faced with intense student insistence, I grudgingly acquiesced on occasion. While these books aren't so bad, they remind me of the review books that help students raise their SAT scores. Such raising is obviously not related to authentic skill.
Along these same lines, I noticed my own tendency to "beat the mark scheme." Any exam with this level of standardization must delineate a hierarchical range of skills tied to a range of scores (see my examples for marking course-work). These "mark schemes," or "rubrics" as we would call them here, are often touted by American reformers as having a beneficial impact on curriculum. They assume that skills can be ordinally arranged and that the attainment of a higher skill presupposes the mastery of all lower ones. I disagree. I do not believe that skills are either mastered or not. Instead, they are learned more or less well, with much backsliding. Yet, by teaching students to hint at the attainment of a higher-level skill without having mastered it, you can help students greatly improve grades without much of an increase in understanding.
For example, one of the most commonly tested skills on the history GCSE is the student's ability to use and evaluate historical evidence. In all cases, the highest-level skill involves an understanding of the "limitations of evidence." If students exhibit any degree of skepticism about the value or credibility of evidence, they are automatically placed in the highest level of response, so I drilled students until we were all blue in the face. My students could be skeptical about evidence even when they could not fully choose, explain, or evaluate it. I even made up a one-page memorization sheet, filled with mnemonics and catchy phrases that might help them do the few things that would vault them into higher categories than they might have deserved, given their level of understanding. I entitled it "Your Life Raft," and the students seemed to appreciate it. What is worse, we spent a good deal of time on it (time that might have been more productively utilized), and worse still, I think that it helped many of the students improve their exam results. Were I teaching full time in the British system, it would be difficult to resist the urge to continue to use and expand upon such "successful" practices.
Is teaching to a skills-oriented exam better than teaching to a content-based exam? Perhaps it is, but only marginally so. Teaching to an exam is itself part of the problem. It is no better than teaching to achieve high student grades. Teaching for learning is what is required. The exams and the grades must take care of themselves.
It is worthwhile that GCSE forces schools to be deliberate about the teaching of skills. It is also beneficial that skills are not taught in an abstract way, but in conjunction with the coverage of specified material. However, a new exam, without a change in the philosophy of education, is insufficient. A philosophy that demands an active approach to learning is not automatically ensured by a new assessment and evaluation system.
The most glaring discrepancy between the stated goals of the new reform system and actual practice lies in the A-level exam. Totally summative in nature, its primary purpose is to select candidates for higher education. It makes no pretense of being criteria-based, is completely norm-referenced, and is as much concerned with "can't do" as it is with "can do." The A-level in history gives students two years to learn in great detail a narrow segment of less than two hundred years of history. It greatly stresses content, though in order to pass with distinction, some understanding is required. Most students commit to memory lists of facts which they can regurgitate on cue. Only rarely do questions ask students to compare lists or to use a list in the service of a more general comparison or evaluation. For example, while a question might inquire about French foreign policy goals in a particular period and another might ask about Spanish foreign policy goals during the same period, there would be no question asking for a comparison of the two countries' policies. The most curious aspect of all this is that the same questions are used year after year. Much of being a good A-level teacher involves being able to predict which of the old questions will be wheeled out for this year's exam. When I asked my department head why students did not need to be able to handle such comparative questions as the French and Spanish question above, he told me that "they just wouldn't ask such a question."
The A-level history examination is document-based: students read and evaluate assigned documents, answering questions about meaning and significance. As with the essay questions, the documents are always on the same topic. My students studied the writings of Martin Luther and the reign of Henry VII. All document questions on the exam would come from those two sources. My students carried around anthologies of Luther documents like little Bibles, memorizing pearls of wisdom from his most famous sermons and polemics. I wondered, however, how well they might understand a historical document that they had not studied in great depth.
Comparisons with the CEEB Advanced Placement exam in American history, which I have graded in the past, are telling. On the CEEB test, the questions are never known in advance. Coverage of most of the course material is important (since the hundred multiple-choice questions can test a range of material), though there is some student choice on essays. Document questions are included, but they are always documents obscure enough that the student has never seen them before. Comparative questions are often asked, forcing students to handle material in an assortment of unusual contexts. Though the memorization of material is a necessary condition for doing well, it is not a sufficient one. Profound and thorough understanding is needed in order to earn high grades. Transfer on the part of the student is required. Both breadth of knowledge and depth of understanding are needed. In my opinion, it is a better exam.
In 1989, the British education minister did express an interest in reforming the A-level test, too, but declined to do so until another day. That day has yet to arrive. The reason he gave for not doing so was that British teachers were already reeling from all the change. Presumably out of compassion for poor, overburdened teachers, a short hiatus was granted before the next dizzying round of reforms. I do wonder, however, how sincere the reforming zeal in Britain is without a thorough change in this antiquated exam, the flagship of the British exam system. The reason, I feel, is that this is the exam that "counts." It's at this point that the sons and daughters of all Britain's families gain entrance to university. Reform at the lower levels of education is one thing, but would the government monkey with young Percival's chances of gaining admission to Cambridge? If the British are serious about educational reform, they must show good faith by reforming this most important exam.
British Criticism of GCSE and the National Curriculum
When these reforms were introduced, British employers and parents were unsure what to make of them. There were legitimate questions as to how to interpret the test results and the assessment terminology. Especially difficult to grasp for those used to the old system was how to interpret the "no fail" feature of the exam.
Some argued that the reforms lacked rigor. They argued that no one would be fooled into thinking that grades of "F," "G," or "U" were meaningful "passes." These lower marks had been included originally to remove the stigma of failure. They argued that failure was a natural part of any system. They also argued that there was insufficient room for really outstanding achievement and that the highest-level skills as defined by maximum marks on the GCSE were not high enough to measure true excellence.
In the past year, two important changes in GCSE were introduced to placate these critics. Not every student will be given a certificate after all; the bottom students will be given failing grades (as many as 20 percent according to some estimates). Also, two additional levels were introduced to measure the achievements of the truly high-level students. These two new levels will allow the students from Britain's prestigious private schools to differentiate themselves from all others. Both of these changes will function to reintroduce some of the elitism of the old system. When faced with the irreconcilable demands of those arguing for inclusiveness on the one hand and those arguing for the finer discrimination of abilities on the other, the Tory government showed its priorities.
Even before the introduction of these two changes, issues of equity had cropped up. Private school students, who are used to dealing with open-ended questions and abstract ideas, had a natural advantage even if they didn't fully understand or hadn't covered the exam material. My Gloucestershire students, on the other hand, lacked the intellectual confidence to respond to open-ended questions even if they understood and had studied the material. They required the proper cues in order to activate the knowledge they had mastered. If they felt the slightest bit uncertain about what the question was driving at, they would not respond at all, thus earning the lowest possible marks. My colleagues complained about this from the beginning of the introduction of GCSE.
Other criticisms of GCSE centered on the method of grading. On any exam with a standardized grading system, there is the danger that highly original or unusual work may be unrewarded or even penalized: mark schemes involve setting parameters that often fail to encompass creative responses. Would GCSE pull the gifted down to some mundane norm? If one is tied to the criteria of a published scheme, what is done with responses that don't fit in neatly? On the AP exam, essays are read at a table with readers collaborating on unusual cases, so that consensus can be achieved to bend the grading plan when necessary. Would the British exam have adequate resources to ensure this level of individualization?
Lack of resources to do the job was the most commonly heard criticism made by teachers. Not enough support was given to the training of teachers in the new system. Not enough time was given for implementation. Teachers felt that they were subsidizing change that they had very little part in formulating. They were angry and bewildered, and many were leaving the profession. In spite of serious unemployment in Britain, there is a teacher shortage in many areas.
The answer to the education dilemma in both Britain and the United States is easier to formulate than to implement. It is to recruit excellent teachers and then to give them the resources and the space to work together to improve their schools. Prime Minister Thatcher wanted to fix the archaic British education system without spending much money. The National Curriculum was the result. On the whole, the new system will probably end up being less unfair than the system it replaced, but it is not, in itself, sufficient to improve things substantially. It has significant failings. It has been built on the backs of already overburdened British teachers. Reform of assessment and evaluation is only a small part of what is needed in Britain.
My British students were intellectually passive in the extreme. Such passivity will not be remedied by changing assessment and evaluation methods alone. Massive cultural change is required for that. Children need to be listened to, and their opinions elicited. The peer culture of students must be changed to eliminate bullying and putting others down. Teachers must stress techniques that promote deep thought and the formation of opinions. Nothing in the National Curriculum addresses this passivity or its causes.
Involving teachers in the process of assessment and in the evaluation of course-work might have served to empower teachers, but it did not. Before introducing educational reform, the Thatcher government chose to break the power of teaching unions and unilaterally remove collective-bargaining rights. Government ministers could not break themselves of the habit of publicly castigating teachers, earning the profession's hatred and contempt. In light of this, too much ill will existed to enlist teachers in the process of reform. Still, the teachers I worked with struggled in earnest to implement the reforms. However, they should have been paid for their time and effort. Instead, the government simply added days to the school year ("Baker Days") during which teachers were enlisted to do extra work without compensation.
Would such assessment and evaluation procedures work in the United States? The construction of criterion-referenced rather than norm-referenced exams with collegial consultation about the evaluation of level-of-understanding performances would, I believe, help American students increase their understanding. I must add, however, that such a collective effort by American teachers would be very labor-intensive and very expensive. Are conservative reformers prepared to spend new money on such an endeavor, rather than simply redirecting money from other badly needed areas into assessment?
The British assessment and evaluation system is not "the answer." As I have stated, just as students study for the SAT, they would probably cram to boost their scores on any skills-based test that we might devise. With traditional teaching methods, skills might become the new "stuff" that teachers poured into students' heads. What's worse, a belief that such a reform in the assessment system is sufficient to solve the nation's education problems would serve to distract us from the difficult, gritty work we must still do in this nation's schools and classrooms. We must avoid panaceas. Among the things we need are the same things that Britain needs: good teachers working collegially, backed by good resources. Efforts to improve education on the cheap will be a waste of limited money and time and will simply delay the day when we begin that long and upward journey that has no shortcut.
What is more serious, and too often neglected by those engaged in the educational debate, is the shape of the society for which young people are being educated. Newton works better than Gloucestershire, not because of its system of assessment and evaluation, but because the "life chances" of most students attending school here are significantly better than for students there. However, if inner-city U.S. students, like their working-class British counterparts, face little prospect for meaningful jobs and a decent life, then no matter how humane the assessment system, they will examine their options and in their own way say, "No, thanks."
The Coalition of Essential Schools gratefully acknowledges the IBM Corporation and the UPS Foundation for their support of its research on exhibitions.