Assessment Terminology: Key Concepts for Shared Understanding
Habits of Mind
Habits of mind include such things as knowing where to find more information, asking original questions, reflecting on and learning from experience, understanding how to collaborate, and seeking out multiple points of view. These kinds of habits are at the heart of education, but are not easily demonstrated through testing.
High-Stakes Tests
High-stakes tests are tests whose results largely or entirely determine significant consequences. Tests are high-stakes for students when promotion from one grade to the next, or graduation, depends on individual results; tests are high-stakes for schools when teacher pay, funding or control depends on aggregate results.
Norm-Referenced Tests
Norm-referenced tests rank an individual score against the scores of a group of students who have taken the same test previously and assign each individual a percentile. Often designed to assess students across a wide range of places and ages, norm-referenced tests don't measure school-specific curricula or skills. Norm-referenced tests are handy for broad comparison purposes but reveal little about a particular student's skills, access to knowledge or habits of mind.
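To make the idea of a percentile concrete, here is a minimal sketch in Python. It assumes one common convention for percentile rank (scores below, plus half of any tied scores, divided by the size of the norm group); test publishers may use other conventions. The function name and the sample scores are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_group_scores):
    """Return the percentile rank (0-100) of `score` within a norm group.

    Assumed convention: count the scores strictly below `score`, add half
    of the scores equal to it, and divide by the size of the norm group.
    """
    if not norm_group_scores:
        raise ValueError("norm group must contain at least one score")
    ordered = sorted(norm_group_scores)
    below = bisect_left(ordered, score)           # scores strictly lower
    equal = bisect_right(ordered, score) - below  # scores tied with `score`
    return 100.0 * (below + 0.5 * equal) / len(ordered)


# Hypothetical norm group: earlier test-takers' scores.
norm_group = [50, 60, 70, 80, 90]
print(percentile_rank(70, norm_group))
```

Note that the result depends only on how a score compares with the norm group, not on what the student has or has not learned, which is why a percentile says little about specific skills.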
Criterion-Referenced Tests
Criterion-referenced tests are intended to measure how well a student has learned a specific body of knowledge and set of skills. Criterion-referenced tests tend to score students as “proficient” or “not yet proficient” rather than in percentile rankings. They are designed for relatively rapid assessment via easily scored means (pen and paper, most often).
Performance-Based Assessments
Performance-based assessments are exhibitions of mastery and skill that require students to construct responses and demonstrate those responses in a variety of ways (through writing, speaking, collaboration, construction, movement, and so on). Because performance-based assessments are complex, they are scored using rubrics indicating levels of performance on a variety of parameters. Evaluation is narrative and not easily reduced to a number or letter grade.
Validity
Validity indicates the extent to which evaluators can make accurate judgments about what students know based on their performance on an assessment. A valid assessment measures what students have been charged with knowing; an invalid assessment measures other skills and information and therefore should not be used as the basis for judgment about students' performance.
Reliability / Inter-Rater Reliability
Reliability describes scoring consistency from one assessment to the next. Inter-rater reliability describes scoring consistency from one scorer to the next; that is, the degree to which all scorers understand and agree on what constitutes various levels of performance when looking at an essay, a dance, a debate or other exhibition of skill and mastery.
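Agreement between two scorers can be put into numbers. The sketch below, in Python, shows two widely used measures: simple percent agreement and Cohen's kappa, which adjusts for the agreement two scorers would reach by chance. It is an illustration only; the function names and the rubric scores are hypothetical, and other statistics may suit a given scoring situation better.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of work samples on which two scorers gave the same rubric level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of work")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same level independently.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical level throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical rubric levels (1-4) given by two scorers to the same six essays.
scorer_1 = [4, 3, 3, 2, 4, 1]
scorer_2 = [4, 3, 2, 2, 4, 1]
print(percent_agreement(scorer_1, scorer_2))
print(cohens_kappa(scorer_1, scorer_2))
```

A kappa near 1 suggests the scorers share an understanding of the rubric levels; a kappa near 0 suggests their agreement is no better than chance.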